PHP - Getting Image With Its Link By Domdocument
It is easy to get image or link by DomDocument, but I did not find a way to get image with its target link. Imagine a html as
Code: [Select] <div class=image> <a href='http://site.com'><img src='imagelink.jpg'></a> </div>How to get both the image link and href? $dom = new DOMDocument(); @$dom->loadHTML($html); $xpath = new DOMXPath($dom); $hrefs = $xpath->evaluate("/html/body//div[@class='image']"); for ($i = 0; $i < $hrefs->length; $i++) { $href = $hrefs->item($i); Now to get the image and its href, we need first getElementsByTagName('a') and getElementsByTagName('img') but they do not work inside foreach. What's your idea? Similar TutorialsI am trying to take a specific link from my site and place it into my database. I only want links starts with CORPSEARCH.ENTITY_INFORMATION?p_nameid= Can someone point me in the right direction here? Code for this is below: // make the cURL request to $target_url $html= curl_exec($ch); if (!$html) { echo "<br />cURL error number:" .curl_errno($ch); echo "<br />cURL error:" . curl_error($ch); exit; } // parse the html into a DOMDocument $dom = new DOMDocument(); @$dom->loadHTML($html); // grab all the on the page $xpath = new DOMXPath($dom); $hrefs = $xpath->evaluate("/html/body//a"); for ($i = 0; $i < $hrefs->length; $i++) { $href = $hrefs->item($i); $url = $href->getAttribute('href'); $sql="INSERT INTO links(cid, nlink)VALUES('$i','$url')"; $result=mysql_query($sql); echo $result; echo $url; Delete ... need to change host .. Hi I've got this database I created with fields ProductId ProductName Image I've managed to get it to list the ID,productname, and Image urls in a list. My next step is to have the image field actually display an image and make it clickable: heres what I've done so far: Code: [Select] <?php $con = mysql_connect("localhost","root",""); if (!$con) { die('Could not connect: ' . mysql_error()); } mysql_select_db("productfeed", $con); $result = mysql_query("SELECT * FROM productfeeds"); echo "<table border='0'> <tr> <th>Firstname</th> <th>Lastname</th> <th>Image</th> </tr>"; while($row = mysql_fetch_array($result)) { echo "<tr>"; echo "<td>" . $row['ProductID'] . "</td>"; echo "<td>" . $row['ProductName'] . "</td>"; echo "<td>" . $row['ImageURL'] . "</td>"; echo "</tr>"; } echo "</table>"; mysql_close($con); ?> Heres what I want to do: Code: [Select] while($row = mysql_fetch_array($result)) { echo "<tr>"; echo "<td>" . $row['ProductID'] . "</td>"; echo "<td>" . $row['ProductName'] . "</td>"; // my changes beneath echo "<td>" . <a href="<?php echo $row['ImageURL'];?>"> <img src="<?php echo $row['LinkURL']; ?>"> </a>. "</td>"; echo "</tr>"; } echo "</table>"; mysql_close($con); ?> Can you guys point me in the right direction? Many thanks I have a line like this it prints text link but I prefer image link how should i edit it I would appreciate some feedback Code: [Select] $templates['etiket'] = array('name' => t('ETİKET'), 'module' => 'uc_invoice_pdf', 'path' => $templates_uc_invoice_pdf_path, 'pdf_settings' => $pdf_settings); Hi, I've read a lot of places that it's not recommended to store binary files in my db. So instead I'm supposed to upload the image to a directory, and store the link to that directory in database. First, how would I make a form that uploads the picture to the directory (And what kinda directories are we talking?). Secondly, how would I retrieve that link? And I guess I should rename the picture.. I'd appreciate any help, or a good tutorial (Haven't found any myself). Is there any reason that people can think as to why DOMDocument::saveHTML would remove the following: Code: [Select] <![if !vml]> <img src="someimage.jpg" /> <![endif]> A little clarification... This HTML comment tag is used in my company's email newsletter code and is necessary to make Outlook 2007 behave properly. For whatever reason, saveHTML strips it out. I know that this doesn't conform to HTML standards and I'm guessing that that is why it is being stripped. BUT, from reading on the internet, saveHTML can produce junk html code anyways. Any help is appreciated. I have some code $doc = new DOMDocument(); $doc->loadHTML( '<html> <head><title>Test</title></head> <body></body></html>' ); $doc->encoding = 'iso-8859-1'; file_put_contents('test.html', $doc->saveHTML()); when i view the output file i get <html><head><title>Test</title></head><body></body></html> all on one line is there no way of having it format it like the original source code so that its not all bunched together? Hi all, I am pretty new to php and I am having an issue trying to load an XML document. When ever I try to use Xpath it negates all the code below the line, including the HTML, and returns a white page. here is my code: Code: [Select] <html> <head> <?php $xpath = new DOMXPath("structure.xml"); ?> <body> hello world </body> </html> I checked phpinfo() and I have both the DOM and XPath enables and installed. I have also tried using just DOM and that worked so it is only Xpath that is not working. Ideas? Thank you James S Hi guys, Just starting to play with PHP Domdocument, only to fail at the very first step: <?php $html = 'test/php/somefile.html' ; if(!empty($html)){ $dom_1 = new domDocument ; $dom_1->loadHTML($html) ; $links = $dom_1->getElementsByTagName('li') ; foreach ( $links as $link) { // echo $link ; echo $link->nodeValue, PHP_EOL; } } ?> When I visit it in a browser I get a WSOD, what am I missing? Code: [Select] $domdoc=new DOMDocument(); $domdoc->formatOutput=TRUE; $empty_cart_xml= '<Order> <Cart> <Items> <Item>1</Item> <Item>2</Item> <Item>3</Item> </Items> </Cart> </Order>'; $domdoc->loadXML($empty_cart_xml); print $domdoc->saveXML()."<hr/>"; //works up to this point $xpath=new DOMXPath($domdoc); $items=$xpath->query('Order/Cart/Items'); foreach($itemses AS $items) { $items->appendChild($domdoc->createElement('Item','4')); } print $domdoc->saveXML(); All I want to do is to add a new Item to Items. What am I doing wrong? I just finished (or so I thought) a project. But my client's server runs PHP4, so I need to adapt my code. Here's what stopped working: Code: [Select] $localClasses = new DOMDocument; $localClasses -> load("file.xml"); $localClasses -> get_elements_by_tagname('Title') -> item(0) -> firstChild -> nodeValue Here's my petty attempt at trying to adapt this code to run in PHP4: Code: [Select] $file = file_get_contents("localClasses.xml"); $localClasses = new DOMDocument($file); $test = $localClasses -> get_elements_by_tagname('Title'); $testText = $test -> item[0] -> firstChild -> nodeValue; print $testText; This doesn't give me any errors, but nothing shows up. Any help would be appreciated. Thanks for reading! Imagine an html with the following structure Code: [Select] <div class="item"> <div class="title"> <a class="title" href="http://www.domain.com/title.html">Title is here</a> </div> <div class="image"> <a href="http://www.domain.com/title.html"><img src=image.jpg /></a> </div> </div> How to make an array containing $title - $url - $image_url ? good evening dear PHPFreaks - hello to everybody. i want to create a link parser. i have choosen to do it with Curl. I have some lines together now. Love to hear your review... Since i am new to programming i love to get some hints from experienced devs. Here some details: well since we have several hundred of resultpages derived from this one: http://www.educa.ch/dyn/79362.asp?action=search Note: i want to itterate over the resultpages - with a loop. http://www.educa.ch/dyn/79376.asp?id=1568 http://www.educa.ch/dyn/79376.asp?id=2149 i take this loop: for($i=1;$i<=$match[1];$i++) { $url = "http://www.example.com/page?page={$i}"; // access new sub-page, extract necessary data } Dear PHP-Freaks, what do you think? What about the Loop over the target-Urls? BTW: you see - there will be some pages empty. Note - the empty pages should be thrown away. I do not want to store "empty" stuff. well this is what i want to. And now i need to have a good parser-script. Note: this is a tree-part-job: 1. fetching the sub-pages 2. parsing them 3. storing the data in a mysql-db Well - the problem - some of the above mentioned pages are empty. so i need to find a solution to leave them aside - unless i do not want to populate my mysql-db with too much infos.. Btw- parsing should be a part that can be done with DomDocument - What do you think? I need to combine the first part with tthe second - can you give me some starting points and hints to get this. The fetching-job should be done with CuRL - and to process the data into a DomDocument-Parser-Job. note ive taken the script from this place: http://www.merchantos.com/makebeta/php/scraping-links-with-php/ function storeLink($url,$gathered_from) { $query = "INSERT INTO links (url, gathered_from) VALUES ('$url', '$gathered_from')"; mysql_query($query) or die('Error, insert query failed'); } for($i=1;$i<= 10000; $i++) { $target_url = "http://www.educa.ch/dyn/79376.asp?id={$i}"; } // access new sub-page, extract necessary data $userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)'; // make the cURL request to $target_url $ch = curl_init(); curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); curl_setopt($ch, CURLOPT_URL,$target_url); curl_setopt($ch, CURLOPT_FAILONERROR, true); curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); curl_setopt($ch, CURLOPT_AUTOREFERER, true); curl_setopt($ch, CURLOPT_RETURNTRANSFER,true); curl_setopt($ch, CURLOPT_TIMEOUT, 10); $html= curl_exec($ch); if (!$html) { echo "<br />cURL error number:" .curl_errno($ch); echo "<br />cURL error:" . curl_error($ch); exit; } // parse the html into a DOMDocument $dom = new DOMDocument(); @$dom->loadHTML($html); // grab all the on the page $xpath = new DOMXPath($dom); $hrefs = $xpath->evaluate("/html/body//a"); for ($i = 0; $i < $hrefs->length; $i++) { $href = $hrefs->item($i); $url = $href->getAttribute('href'); storeLink($url,$target_url); echo "<br />Link stored: $url"; } Dear PHP-Freaks, what do you think? What about the Loop over the target-Urls? love to hear from you ! Hi guys, Reading this from php.net, has got me a wee bit confused. Trying to implement is has got me doubly confused! My code: $dom = new DOMDocument; libxml_use_internal_errors(true); $dom->loadHTMLFile($parent_node); if($dom->childNodes <>0) { $kids = array ( 'url' => $parent_node, 'No_of_kids' => count($dom->childNodes) ); } Results in '' Notice: Object of class DOMNodeList could not be converted to int' How the heck am i supposed to count the childNodes? so there's some html i'm having to fetch and parse for personal use...but some of the data i want is started in a table written this way: rawTableData = {"rows": [{"colname1": value, "colname2": "value", "colname3: value} how can i use domdocument to parse this data if say i want the value for colname2? there are no tags for me to use. Hi, I have a php code that could extract the categories and display them. However, I still can't extract the numbers that goes along with it too(without the bracket).
This is my code:
<?php $grep = new DoMDocument(); @$grep->loadHTMLFile("http://www.lelong.com.my/Auc/List/BrowseAll.asp"); $finder = new DomXPath($grep); $class = "CatLevel1"; $nodes = $finder->query("//*[contains(@class, '$class')]"); foreach ($nodes as $node) { $span = $node->childNodes; echo $span->item(0)->nodeValue."<br>"; } ?>This is my desired output: Arts, Antiques & Collectibles : 9768 B2B & Industrial Products : 2342 Baby : 3453 etc...Any help is appreciated. Thanks! good day dear PHPFreaks - hello to everybody. i want to create a link parser. i have choosen to do it with Curl. I have some lines together now. Love to hear your review... Since i am new to programming i love to get some hints from experienced devs. Here some details: well since we have several hundred of resultpages derived from this one: http://www.educa.ch/dyn/79362.asp?action=search Note: i want to itterate over the resultpages - with a loop. http://www.educa.ch/dyn/79376.asp?id=1568 http://www.educa.ch/dyn/79376.asp?id=2149 i take this loop: for($i=1;$i<=$match[1];$i++) { $url = "http://www.example.com/page?page={$i}"; // access new sub-page, extract necessary data } what do you think? What about the Loop over the target-Urls? BTW: you see - there will be some pages empty. Note - the empty pages should be thrown away. I do not want to store "empty" stuff. well this is what i want to. And now i need to have a good parser-script. Note: this is a tree-part-job: 1. fetching the sub-pages 2. parsing them 3. storing the data in a mysql-db Well - the problem - some of the above mentioned pages are empty. so i need to find a solution to leave them aside - unless i do not want to populate my mysql-db with too much infos.. Btw- parsing should be a part that can be done with DomDocument - What do you think? I need to combine the first part with tthe second - can you give me some starting points and hints to get this. The fetching-job should be done with CuRL - and to process the data into a DomDocument-Parser-Job. No Problem he But how to do the DOM-Document-Job ... i have installed FireBug into the FireFox... now i have the Xpaths for the sites: http://www.educa.ch/dyn/79376.asp?id=1187 http://www.educa.ch/dyn/79376.asp?id=2939 http://www.educa.ch/dyn/79376.asp?id=1515 http://www.educa.ch/dyn/79376.asp?id=1469 Altes Schulhaus Ossingen :: /html/body/div[2] Guntibachstrasse 10 :: /html/body/div[4] 8475 Ossingen :: /html/body/div[6] sekretariat.psossingen@bluewin.ch :: /html/body/div[9]/a Tel:052 317 15 45 :: /html/body/div[11] Fax:052 317 04 42 :: /html/body/div[12] but how to appyl in the Simple DomDocument - i want to use this he http://simplehtmldom.sourceforge.net/ look forward to a hint that gives me a starting point Hi the what i'm trying to do is i want to get the contents of heading and text_details and merge them together before adding them as one entry into database: Code: [Select] <div class="heading"><p> Second quote: </div><div class=text_details> <p> Satan's substitute for repentance is the man's rationalization of evil. <p></b></div> my code looks s/thing like this: Code: [Select] foreach( $dom->getElementsByTagName('div') as $div ) { foreach( $div->attributes as $attributes ) { if( strtolower($attributes->name) == 'class' ) { if( strtolower($attributes->value) == 'heading' || strtolower($attributes->value) == 'text_details') { $quote = $div->textContent; $clean_quote = mysql_real_escape_string($quote); echo "the quote is: " . $clean_quote . "<br />"; mysql_query("INSERT INTO quotes (quote) VALUES ('$clean_quote')")or die(mysql_error()); } } } } when i do this obviously i get: Second quote of the day entered into a field by itself and Satan's substitute for repentance is the man's rationalization of evil. into another field...how to make them one!?? thanks in advance hello dear friends - i want to test if the DOMdocument [class] exists? Can i do this in the shell (on OpenSuse 11.3)? bool class_exists ( string $class_name [, bool $autoload = true ] ) bool class_exists ( string $DOMdocument [, bool $autoload = true ] ) or do i have to create a file that i call itself in the shell!? look forward for an idea / hint / tipp regards db1 Hi, I have a RSS feed cached as an XML file. I need to pull some info out of it so I can then print it to the page. Currently I am using this code to extract the data: $doc = new DOMDocument(); $doc->load('inthenews.xml'); $inthenews = array(); foreach ($doc->getElementsByTagName('item') as $node) { $itemRSS = array ( 'title' => $node->getElementsByTagName('title')->item(0)->nodeValue, 'link' => $node->getElementsByTagName('link')->item(0)->nodeValue, 'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue, ); array_push($inthenews, $itemRSS); } However, the Description node contains more than I want. I need to remove everything except for the image ( <img../> ) it contains. Is there some way of running preg or similar on the nodeValue as it is extracted? Or an alternative to "getElementsByTagName" that allows searching for strings? If not, does anyone have a suggestion for doing this? I tried running preg_replace on the array, but it doesn't seem to do anything?? An example of the Array created by my code above is shown below: Code: [Select] [0] => Array ( [title] => BBC radio Cambridge 7.20am [link] => images/news/Matthew_Freeman_Radio_Cambridgeshire_21-10-10.mp3 [desc] => <img alt="BBC-logo" src="http://www2.mrc-lmb.cam.ac.uk/images/news/BBC-logo.jpg" height="54" width="127" /><br/>BBC radio Cambridge 7.20am 21.10.10: Dr Matthew Freeman"<br/> 21 October 2010 ) Thanks in advance for any advice Phil |