PHP - Domdocument -> Unable To Pull Results From Href
I have been able to pull the HTML of a given node, but I have not been able to pull the results out of its href.
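A minimal sketch of the usual approach with DOMDocument and DOMXPath, assuming the links in question are ordinary <a> elements (the markup and URL below are placeholders, not the poster's actual page):

<?php
// Placeholder markup standing in for the node whose links we want.
$html = '<div class="item"><a href="http://www.example.com/page.html">Example</a></div>';

$dom = new DOMDocument();
@$dom->loadHTML($html);              // @ silences warnings about imperfect HTML

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a') as $a) {
    echo $a->getAttribute('href'), PHP_EOL;   // the href value, not the node's HTML
}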
Similar Tutorials

Good evening dear PHPFreaks - hello to everybody. I want to create a link parser, and I have chosen to do it with cURL. I have some lines together now and would love to hear your review. Since I am new to programming I would love to get some hints from experienced devs. Here are some details: we have several hundred result pages derived from this one:

http://www.educa.ch/dyn/79362.asp?action=search

Note: I want to iterate over the result pages with a loop, e.g.:

http://www.educa.ch/dyn/79376.asp?id=1568
http://www.educa.ch/dyn/79376.asp?id=2149

I take this loop:

for ($i = 1; $i <= $match[1]; $i++) {
    $url = "http://www.example.com/page?page={$i}";
    // access new sub-page, extract necessary data
}

Dear PHP-Freaks, what do you think? What about the loop over the target URLs? BTW, as you can see, some pages will be empty. The empty pages should be thrown away - I do not want to store "empty" stuff. Well, this is what I want to do, and now I need a good parser script. Note: this is a three-part job:

1. fetching the sub-pages
2. parsing them
3. storing the data in a MySQL db

The problem: some of the above-mentioned pages are empty, so I need to find a way to leave them aside, since I do not want to populate my MySQL db with too much info. Btw, parsing should be a part that can be done with DOMDocument - what do you think? I need to combine the first part with the second - can you give me some starting points and hints to get this? The fetching job should be done with cURL, and the data should then be processed in a DOMDocument parsing job. Note: I've taken the script from this place: http://www.merchantos.com/makebeta/php/scraping-links-with-php/

function storeLink($url, $gathered_from)
{
    $query = "INSERT INTO links (url, gathered_from) VALUES ('$url', '$gathered_from')";
    mysql_query($query) or die('Error, insert query failed');
}

for ($i = 1; $i <= 10000; $i++) {
    $target_url = "http://www.educa.ch/dyn/79376.asp?id={$i}";
}
// access new sub-page, extract necessary data

$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);

if (!$html) {
    echo "<br />cURL error number:" . curl_errno($ch);
    echo "<br />cURL error:" . curl_error($ch);
    exit;
}

// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);

// grab all the links on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");

for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url  = $href->getAttribute('href');
    storeLink($url, $target_url);
    echo "<br />Link stored: $url";
}

Dear PHP-Freaks, what do you think? What about the loop over the target URLs? Love to hear from you!
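Not the poster's final script, just a rough sketch of how the three steps (fetch, parse, store) could be wired together in one loop. The emptiness check, the mysqli credentials and the id range are assumptions, and it swaps the deprecated mysql_query() for a mysqli prepared statement; a page is treated as "empty" when no anchors are found on it.

<?php
// Sketch only: iterate over the detail pages, skip "empty" ones, store links.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$stmt   = $mysqli->prepare('INSERT INTO links (url, gathered_from) VALUES (?, ?)');

$ch = curl_init();
curl_setopt_array($ch, [
    CURLOPT_USERAGENT      => 'Googlebot/2.1 (http://www.googlebot.com/bot.html)',
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_TIMEOUT        => 10,
]);

for ($i = 1; $i <= 10000; $i++) {
    $target_url = "http://www.educa.ch/dyn/79376.asp?id={$i}";
    curl_setopt($ch, CURLOPT_URL, $target_url);
    $html = curl_exec($ch);
    if ($html === false) {
        continue;                       // request failed, move on to the next id
    }

    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $hrefs = $xpath->query('/html/body//a');

    if ($hrefs->length === 0) {
        continue;                       // treat pages without links as "empty"
    }

    foreach ($hrefs as $href) {
        $url = $href->getAttribute('href');
        $stmt->bind_param('ss', $url, $target_url);
        $stmt->execute();
    }
}
curl_close($ch);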
I am trying to implement lazy loading into a project that already has pagination set up. If I go to my website and, after the .com, type in "api_courselist.php?page=1", I receive the first 20 results of my query. If I change that to page=2, it retrieves the next 20, and so on.
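For reference, an endpoint that behaves like that is usually just a LIMIT/OFFSET query. This is only a guess at what api_courselist.php might look like (the table and column names are invented); the useful property is that it prints nothing once the pages run out, which the scroll handler further down can treat as "end of content".

<?php
// Hypothetical api_courselist.php: 20 rows per page, empty output when exhausted.
$page    = max(1, (int) ($_GET['page'] ?? 1));
$perPage = 20;
$offset  = ($page - 1) * $perPage;

$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$stmt   = $mysqli->prepare('SELECT id, title FROM courses ORDER BY id LIMIT ? OFFSET ?');
$stmt->bind_param('ii', $perPage, $offset);
$stmt->execute();
$result = $stmt->get_result();

while ($row = $result->fetch_assoc()) {
    echo '<div class="course">' . htmlspecialchars($row['title']) . '</div>';
}
// No rows left => empty response, which the JavaScript can treat as the end.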
My issue is that I have no idea how to implement that into my JavaScript/AJAX file. I have it set up so that when the user scrolls to the bottom of the page, a div makes itself visible with the new page populated inside of it, and it should keep pulling pages in until there is no more content.
jQuery(function ($) {
    jQuery(document).ready(function () {
        var is_loaded = true;
        jQuery(window).scroll(function () {
            if (jQuery(window).scrollTop() == jQuery(document).height() - jQuery(window).height()) {
                jQuery('div#loadMore').show();
                if (is_loaded) {
                    is_loaded = false;
                    jQuery.ajax({
                        url: "api_courselist.php?page=1",
                        success: function (html) {
                            is_loaded = true;
                            if (html) {
                                jQuery("#infiscroll").append(html);
                                jQuery('div#loadMore').hide();
                            } else {
                                jQuery('div#loadMore').replaceWith("<center><h1 style='color:red'>End of Content !!!!!!!</h1></center>");
                            }
                        }
                    });
                }
            }
        });
    });
});

First, thanks for any help that you can provide. I am somewhat new to PHP and this is the first "web service" I have had to create.

THE GOAL: I need to pull XML data from another server. This company's API is set up so that you must give them an IP, so you can only pull data server to server instead of client to server. The data is pulled from the API using HTTP requests - very similar to YQL (in essence, the structured query is located in the URL). This API also requires that my server only pings their server every 10-15 minutes, in order to cut down on server requests. The logical thought in my head was to set up a cron job to run a PHP script every 10 minutes. The PHP script would then do the following:

1. Make the HTTP request
2. Open an existing file or create one (on my server)
3. Take the returned XML data from the API and write it to the newly opened file
4. Convert that XML to JSON
5. Save the JSON
6. Cache the JSON file
7. STOP

My thought was to use cURL and fopen for the first 3 steps. I found a basic script that will do this on PHP.net (as shown below). After that I am pretty much lost on how to proceed.

<?php
$ch = curl_init("http://www.example.com/");
$fp = fopen("example_homepage.txt", "w");
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
fclose($fp);
?>

I would REALLY appreciate any help on this. Also, if you have the time, please comment and explain what you are doing in any code examples and why - that would help out a lot. I truly want to learn, not just grab a snippet and run, so your comments are vital for me. Thanks!!!
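Picking up roughly where that PHP.net snippet leaves off, steps 3-6 could look something like this. It is only a sketch: the API URL and file names are placeholders, and treating the saved JSON file itself as the "cache" is an assumption.

<?php
// Sketch: fetch the XML, keep a raw copy, convert to JSON, save the JSON.
// Intended to be run from cron every 10-15 minutes.
$apiUrl = 'http://api.example.com/feed?query=something';   // placeholder URL

$ch = curl_init($apiUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
$xml = curl_exec($ch);
curl_close($ch);

if ($xml === false) {
    exit(1);                                   // request failed, try again next run
}

file_put_contents('feed.xml', $xml);           // steps 2-3: raw copy on our server

$data = simplexml_load_string($xml);           // step 4: parse the XML
if ($data === false) {
    exit(1);                                   // malformed response
}

$json = json_encode($data);                    // SimpleXML objects encode to JSON
file_put_contents('feed.json', $json);         // steps 5-6: the saved file is the cache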
I have an admin page that lists all the rows from a table. One of the fields is generated using FCKeditor, so the field has HTML code in it. I would like to show only the first 100 or so characters of the field, but if the 100 characters end in the middle of HTML code - specifically, in this case, an <a href> - it cuts off and creates "havoc" in my page. Is there a way to avoid this by detecting that the 100th character is in the middle of an <a href> tag (or any tag, for that matter)? Here is what the generated HTML can look like:

<td class="tdresults">We are very excited to offer three great events at the <a href="http://www.mylongwebsite.c</td>
<td class="tdresults">next field data</td>

Thank you in advance. Pete
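One common way around this, sketched here rather than prescribed: drop the markup before truncating, so there is no tag left to cut in half. The 100-character limit comes from the question; the helper name and sample string are made up.

<?php
// Sketch: strip the HTML first, then truncate, so no tag can be cut in half.
function makeTeaser(string $html, int $limit = 100): string
{
    $text = trim(strip_tags($html));                 // remove <a href> and friends
    if (mb_strlen($text) <= $limit) {
        return $text;
    }
    return mb_substr($text, 0, $limit) . '...';      // plain-text excerpt
}

echo makeTeaser('We are very excited to offer three great events at <a href="http://www.example.com/">our site</a>.');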
Hi guys, I've managed to find my way into something of a maze.

<td>
    <a href="'Details/21/index.php'"?id= . $id .'>" . $row['id'] . "</a>
</td>

I'm looking at this line in the code. I realise it's currently in the wrong syntax, but I'm just trying multiple different variations. At the moment I'm actually just trying to make it go to a static link, but my real goal is to get it to pick up the id number of the row I click and concatenate that with the rest of my address string, then use that as the href. Something like this: 'Details/' + $id + '/index.php'. It's really confusing me inside this loop, and for some reason the id number is being picked up as an int, by the looks of things. It's getting above my level of understanding. Any chance one of you masters would throw a n00b a lifeline? This is my code below:
<!DOCTYPE html>
<html>
<head>
<title>LifeSaver DB</title>
<h1> LifeSaver Database </h1>
</head>
<body>
<table>
<tr>
<th>Id</th>
<th>Location</th>
<th>Initials</th>
<th>TimeStamp</th>
<th>Notes</th>
</tr>
<?php
$conn = mysqli_connect("localhost", "meh", "pas", "DB");
// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}

$sql = "SELECT * FROM LifeSaver1 ORDER BY id DESC";
$result = $conn->query($sql);

if ($result->num_rows > 0) {
    // output data of each row
    while ($row = $result->fetch_assoc()) {
        // for href row
        $id = $row['id'];
        $Footage = ['Footage'];
        echo "<tr>
            <td> <a href="'Details/21/index.php'"?id= . $id .'>" . $row['id'] . "</a> </td>
            <td>" . $row["Location"] . "</td>
            <td>" . $row["Initials"]. "</td>
            <td>" . $row["TimeStamp"]. "</td>
            <td>" . $row["Notes"] . "</td>
            </tr>";
    }
    // show table
    echo "</table>";
} else {
    echo "0 results";
}
$conn->close();
?>
</table>
</body>
<style>
table, td, th {
    border: 1px solid black;
    margin: auto;
}
table {
    border-collapse: collapse;
    color: #000; <!--font colour -->
    font-family: monospace;
    font-size: 18px;
    text-align: center;
}
th {
    background-color: #337AFF;
    color: white;
    font-weight: bold;
}
tr:nth-child(odd) { background-color: #add8e6 }
</style>
</html>
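Not the poster's code, just one way that echo could be rewritten so the id from the current row ends up inside the href. The Details/<id>/index.php URL scheme is taken from the question; the sample $row values are invented stand-ins for one fetch_assoc() result.

<?php
// Sketch only: one row as it would come out of the question's fetch_assoc() loop.
$row = ['id' => 21, 'Location' => 'Pool A', 'Initials' => 'JD', 'TimeStamp' => '2020-02-08 10:00', 'Notes' => 'demo row'];

$id   = (int) $row['id'];
$href = "Details/{$id}/index.php";              // the URL scheme from the question

echo "<tr>
    <td><a href=\"" . htmlspecialchars($href) . "\">{$id}</a></td>
    <td>" . htmlspecialchars($row['Location']) . "</td>
    <td>" . htmlspecialchars($row['Initials']) . "</td>
    <td>" . htmlspecialchars($row['TimeStamp']) . "</td>
    <td>" . htmlspecialchars($row['Notes']) . "</td>
</tr>";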
Hi all, I am pretty new to PHP and I am having an issue trying to load an XML document. Whenever I try to use XPath it negates all the code below that line, including the HTML, and returns a white page. Here is my code:

<html>
<head>
<?php
$xpath = new DOMXPath("structure.xml");
?>
<body>
hello world
</body>
</html>

I checked phpinfo() and I have both the DOM and XPath extensions enabled and installed. I have also tried using just DOM and that worked, so it is only XPath that is not working. Ideas? Thank you, James S

I have some code:

$doc = new DOMDocument();
$doc->loadHTML(
    '<html>
    <head><title>Test</title></head>
    <body></body></html>'
);
$doc->encoding = 'iso-8859-1';
file_put_contents('test.html', $doc->saveHTML());

When I view the output file I get

<html><head><title>Test</title></head><body></body></html>

all on one line. Is there no way of having it format the output like the original source code, so that it's not all bunched together?

Is there any reason people can think of as to why DOMDocument::saveHTML would remove the following?

<![if !vml]>
<img src="someimage.jpg" />
<![endif]>

A little clarification: this HTML comment tag is used in my company's email newsletter code and is necessary to make Outlook 2007 behave properly. For whatever reason, saveHTML strips it out. I know that this doesn't conform to HTML standards and I'm guessing that that is why it is being stripped. BUT, from reading on the internet, saveHTML can produce junk HTML code anyway. Any help is appreciated.

Hi guys, just starting to play with PHP DOMDocument, only to fail at the very first step:

<?php
$html = 'test/php/somefile.html';
if (!empty($html)) {
    $dom_1 = new domDocument;
    $dom_1->loadHTML($html);
    $links = $dom_1->getElementsByTagName('li');
    foreach ($links as $link) {
        // echo $link;
        echo $link->nodeValue, PHP_EOL;
    }
}
?>

When I visit it in a browser I get a WSOD. What am I missing?
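A guess at what is going wrong in that last snippet: loadHTML() expects an HTML string, not a file path, so it parses the literal text "test/php/somefile.html". Something along these lines (the file path is the one from the question) reads the file first, or lets DOMDocument do it:

<?php
// Sketch: load the file's contents rather than its path.
$file = 'test/php/somefile.html';

$dom = new DOMDocument();
// Either read the file yourself...
@$dom->loadHTML(file_get_contents($file));
// ...or let DOMDocument open it directly: @$dom->loadHTMLFile($file);

foreach ($dom->getElementsByTagName('li') as $link) {
    echo $link->nodeValue, PHP_EOL;
}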
Hi, I have some PHP code that can extract the categories and display them. However, I still can't extract the numbers that go along with them (without the brackets). This is my code:

<?php
$grep = new DoMDocument();
@$grep->loadHTMLFile("http://www.lelong.com.my/Auc/List/BrowseAll.asp");
$finder = new DomXPath($grep);
$class = "CatLevel1";
$nodes = $finder->query("//*[contains(@class, '$class')]");
foreach ($nodes as $node) {
    $span = $node->childNodes;
    echo $span->item(0)->nodeValue . "<br>";
}
?>

This is my desired output:

Arts, Antiques & Collectibles : 9768
B2B & Industrial Products : 2342
Baby : 3453
etc...

Any help is appreciated. Thanks!

Imagine an HTML with the following structure:

<div class="item">
    <div class="title">
        <a class="title" href="http://www.domain.com/title.html">Title is here</a>
    </div>
    <div class="image">
        <a href="http://www.domain.com/title.html"><img src=image.jpg /></a>
    </div>
</div>

How do I make an array containing $title - $url - $image_url? It is easy to get the image or the link with DOMDocument, but I did not find a way to get the image together with its target link.

Imagine an HTML such as:

<div class=image>
    <a href='http://site.com'><img src='imagelink.jpg'></a>
</div>

How do I get both the image link and the href?

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//div[@class='image']");
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);

Now to get the image and its href, we first need getElementsByTagName('a') and getElementsByTagName('img'), but they do not work inside the loop. What's your idea?

I just finished (or so I thought) a project. But my client's server runs PHP 4, so I need to adapt my code. Here's what stopped working:

$localClasses = new DOMDocument;
$localClasses->load("file.xml");
$localClasses->get_elements_by_tagname('Title')->item(0)->firstChild->nodeValue

Here's my petty attempt at trying to adapt this code to run in PHP 4:

$file = file_get_contents("localClasses.xml");
$localClasses = new DOMDocument($file);
$test = $localClasses->get_elements_by_tagname('Title');
$testText = $test->item[0]->firstChild->nodeValue;
print $testText;

This doesn't give me any errors, but nothing shows up. Any help would be appreciated. Thanks for reading!

$domdoc = new DOMDocument();
$domdoc->formatOutput = TRUE;
$empty_cart_xml =
'<Order>
    <Cart>
        <Items>
            <Item>1</Item>
            <Item>2</Item>
            <Item>3</Item>
        </Items>
    </Cart>
</Order>';
$domdoc->loadXML($empty_cart_xml);
print $domdoc->saveXML() . "<hr/>";
// works up to this point

$xpath = new DOMXPath($domdoc);
$items = $xpath->query('Order/Cart/Items');
foreach ($itemses AS $items) {
    $items->appendChild($domdoc->createElement('Item', '4'));
}
print $domdoc->saveXML();

All I want to do is add a new Item to Items. What am I doing wrong?

So there's some HTML I'm having to fetch and parse for personal use, but some of the data I want is stored in a table written this way:

rawTableData = {"rows": [{"colname1": value, "colname2": "value", "colname3: value}

How can I use DOMDocument to parse this data if, say, I want the value for colname2? There are no tags for me to use.

Hi guys, reading this from php.net has got me a wee bit confused, and trying to implement it has got me doubly confused! My code:

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTMLFile($parent_node);
if ($dom->childNodes <> 0) {
    $kids = array(
        'url' => $parent_node,
        'No_of_kids' => count($dom->childNodes)
    );
}

Results in 'Notice: Object of class DOMNodeList could not be converted to int'. How the heck am I supposed to count the childNodes?
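For that last notice, a sketch of the usual workaround: a DOMNodeList is not an integer (and only became countable via count() in PHP 7.2), but it always exposes a length property, so both the test and the stored count can use that. The URL assigned to $parent_node is a placeholder.

<?php
// Sketch: count child nodes via DOMNodeList::$length instead of casting the list.
$parent_node = 'http://www.example.com/';        // placeholder URL

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTMLFile($parent_node);

if ($dom->childNodes->length > 0) {
    $kids = array(
        'url'        => $parent_node,
        'No_of_kids' => $dom->childNodes->length,   // or count($dom->childNodes) on PHP >= 7.2
    );
    print_r($kids);
}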
Hello dear friends - I want to test whether the DOMDocument class exists. Can I do this in the shell (on openSUSE 11.3)?

bool class_exists ( string $class_name [, bool $autoload = true ] )
bool class_exists ( string $DOMdocument [, bool $autoload = true ] )

Or do I have to create a file that I then call from the shell? I look forward to an idea / hint / tip. Regards, db1

I am trying to take a specific link from my site and place it into my database. I only want links that start with CORPSEARCH.ENTITY_INFORMATION?p_nameid= Can someone point me in the right direction here? Code for this is below:

// make the cURL request to $target_url
$html = curl_exec($ch);
if (!$html) {
    echo "<br />cURL error number:" . curl_errno($ch);
    echo "<br />cURL error:" . curl_error($ch);
    exit;
}

// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);

// grab all the links on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");

for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url  = $href->getAttribute('href');
    $sql = "INSERT INTO links(cid, nlink) VALUES('$i','$url')";
    $result = mysql_query($sql);
    echo $result;
    echo $url;
}

Good day dear PHPFreaks - hello to everybody. I want to create a link parser. I have chosen to do it with cURL. I have some lines together now - love to hear your review. Since I am new to programming I would love to get some hints from experienced devs. Here some details: we have several hundred result pages derived from this one:

http://www.educa.ch/dyn/79362.asp?action=search

Note: I want to iterate over the result pages with a loop:

http://www.educa.ch/dyn/79376.asp?id=1568
http://www.educa.ch/dyn/79376.asp?id=2149

I take this loop:

for ($i = 1; $i <= $match[1]; $i++) {
    $url = "http://www.example.com/page?page={$i}";
    // access new sub-page, extract necessary data
}

What do you think? What about the loop over the target URLs? BTW, as you can see, some pages will be empty - the empty pages should be thrown away, since I do not want to store "empty" stuff. Well, this is what I want to do, and now I need a good parser script. Note: this is a three-part job:

1. fetching the sub-pages
2. parsing them
3. storing the data in a MySQL db

The problem: some of the above-mentioned pages are empty, so I need to find a solution to leave them aside, since I do not want to populate my MySQL db with too much info. Btw, parsing should be a part that can be done with DOMDocument - what do you think? I need to combine the first part with the second - can you give me some starting points and hints to get this? The fetching job should be done with cURL, and the data should then be processed in a DOMDocument parsing job. No problem here. But how to do the DOMDocument job? I have installed Firebug into Firefox, and now I have the XPaths for these sites:

http://www.educa.ch/dyn/79376.asp?id=1187
http://www.educa.ch/dyn/79376.asp?id=2939
http://www.educa.ch/dyn/79376.asp?id=1515
http://www.educa.ch/dyn/79376.asp?id=1469

Altes Schulhaus Ossingen :: /html/body/div[2]
Guntibachstrasse 10 :: /html/body/div[4]
8475 Ossingen :: /html/body/div[6]
sekretariat.psossingen@bluewin.ch :: /html/body/div[9]/a
Tel:052 317 15 45 :: /html/body/div[11]
Fax:052 317 04 42 :: /html/body/div[12]

But how do I apply this with Simple HTML DOM? I want to use this: http://simplehtmldom.sourceforge.net/ I look forward to a hint that gives me a starting point.
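Since those Firebug XPaths are already known, one possible starting point is plain DOMDocument/DOMXPath rather than Simple HTML DOM. This is only a sketch run against one of the detail URLs from the question; the field names are made up, and it assumes the div positions are the same on every detail page, which may well not hold.

<?php
// Sketch: pull one school record out of a detail page using the Firebug XPaths.
$url  = 'http://www.educa.ch/dyn/79376.asp?id=1187';
$html = file_get_contents($url);                 // cURL would work just as well

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// field name => XPath taken from the question
$fields = [
    'name'   => '/html/body/div[2]',
    'street' => '/html/body/div[4]',
    'city'   => '/html/body/div[6]',
    'email'  => '/html/body/div[9]/a',
    'phone'  => '/html/body/div[11]',
    'fax'    => '/html/body/div[12]',
];

$record = [];
foreach ($fields as $name => $path) {
    $nodes = $xpath->query($path);
    $record[$name] = $nodes->length ? trim($nodes->item(0)->nodeValue) : null;
}
print_r($record);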