PHP - Create Html Parser Loop Through
how should i approach the following:
a page with a products list+link to product page i want to build a crawler that loops through all the products in the list and goes to the product page and and parses the product page. need help with the loop Similar TutorialsI have an XML file (differences.xml) <element> <item code="lM" name="castle" type="building"> <cost>5000</cost> </item> ...more items.... </element> I have parser.php whih parses the XML and displays all item names in a checklist for the user to select items. These selected items are method=post back to parser.php which re-parses the XML where selected item = item name and SHOULD then display all selected items' names (and more): <?php if (isset($_GET['formSubmit'])) { $option = $_POST['option']; $option = array_values($option); if (!empty($option)){ $xml = simplexml_load_file('differences.xml'); $i = 0; $count = count($option); while($i < $count) { $selected = $xml->xpath("//item[@name='".$option[$i]."']"); echo $selected[$i]['name']; $i++; } }else{ echo "No items selected, Please select:"; } }else{ $xml = simplexml_load_file('differences.xml'); $object = $xml->xpath('//item'); $count = count($object); $i = 0; echo "<form method='POST' action='parser.php'>"; while($i < $count){ $xmlname = $object[$i]['name']; echo "<input type='checkbox' name='option[".$i."]' value='$xmlname'>".$xmlname; echo "<br>"; $i++; } echo "<br><input type='submit' name='formSubmit' value='Proceed'><br>"; echo "</form>"; } ?> The problem is, the parser is only displaying the first selected item and isn't outputting the rest. To clarify the code: - The first half executes only if items have been previously selected in the form. If no items were selected, it goes to the secod half, which parses the XML file and presents the checklist form for the user to select items. THIS WORKS GREAT. - After pressing the submit button, the items are sent using POST and the page is reloaded. THIS WORKS GREAT. - After reloading, the first half executes and finds the $_POST and resets the array so that it works with the coming loop. THIS WORKS GREAT. - The loop parses the XML and returns only those item names that have been selected in the $_POST. THIS DOES NOT WORK. So my question is: what's wrong with my code? The array $options has all the proper information, but the parser is only outputting the first item. All consecutive loops result in empty space with no output. I've been at this for 2 weeks and can't find the solution. Any help is appreciated! require_once 'phpSimpleHtmlDomClass.php'; $html = '<div> <div class="man">Name: madac</div> <div class="man">Age: 18 <div class="man">Class: 12</div> </div>' $name=$html->find('div[class="man"]', 0)->innertext; $age=$html->find('div[class="man"]', 1)->innertext; $cls=$html->find('div[class="man"]', 2)->innertext; wanna get a text from each div class="man" but it didn't work because there is a missing closing div tag on 2nd line of html code. please help me to fix this. thanks in advance. Im using some software called php html dom parser i wont to be able to keep the souce tidy i.e before dom parser <?php //////////////////////SEO TOOL/////////////////////////// $title = 'Green Deal Nationwide - PB Energy Solutions Ltd'; $description = 'Delivering all your environmental needs to \'green\' up your business, improve reputation, increase profitability and give a competitive advantage.'; /////////////////////////////////////////////////////// ?> <?php include('includes/settings.php'); ?> <?php include('includes/header.php'); ?> <div class="container"> <div id="large-page-img"> <img src="<?php echo URL(); ?>images/home-page-slide.jpg" width="911" height="230" /> <img src="<?php echo URL(); ?>images/home-page-slide-1.jpg" width="911" height="230" /> <img src="<?php echo URL(); ?>images/home-page-slide-2.jpg" width="911" height="230" /> </div> <div id="content-home"> <div class="iedit"> after dom parser saved to file <?php //////////////////////SEO TOOL/////////////////////////// $title = 'Green Deal Nationwide - PB Energy Solutions Ltd'; $description = 'Delivering all your environmental needs to \'green\' up your business, improve reputation, increase profitability and give a competitive advantage.'; /////////////////////////////////////////////////////// ?> <?php include('includes/settings.php'); ?> <?php include('includes/header.php'); ?> <div class="container"> <div id="large-page-img"> <img src="<?php echo URL(); ?>images/home-page-slide.jpg" width="911" height="230" /> <img src="<?php echo URL(); ?>images/home-page-slide-1.jpg" width="911" height="230" /> <img src="<?php echo URL(); ?>images/home-page-slide-2.jpg" width="911" height="230" /> </div> <div id="content-home"> <div class="iedit"><div class="iedit"> is there anyway i can keep it like the original fil after dom? im using simple_html_dom.php i want to extract the following html: and number the array key so i will know the location of each <td> and extract the value the this cell: <TD ALIGN=RIGHT NOWRAP class="ftableline1"> 3.7200 </TD> with this : Code: [Select] foreach($html->find('td[class=ftableline1]') as $e) echo $e->innertext . '<br>'; Code: [Select] <TR class="ftableline1"> <TD ALIGN=RIGHT NOWRAP class="ftableline1"> 3.7200 </TD> <TD ALIGN=RIGHT NOWRAP class="ftableline1"> 3.5400 </TD> <TD ALIGN=RIGHT NOWRAP class="ftableline1"> 3.6651 </TD> <TD ALIGN=RIGHT NOWRAP class="ftableline1"> 3.5982 </TD> <TD align="right" NOWRAP class="ftableline1"> <A HREF=_matbea=1><IMG SRC="images/tezuga_graphit.gif" WIDTH=15 HEIGHT=15 ALT="Show Graph" BORDER="0"></a><BR> </TD> <TD ALIGN=RIGHT NOWRAP>0.01%</TD> <TD ALIGN=right dir="rtl"> <IMG SRC="images/arrow_up.gif" WIDTH=10 HEIGHT=8 BORDER=0><BR> </TD> <TD align="right" NOWRAP dir="rtl" class="ftableline1"> 3.6316 </TD> <TD align="right" NOWRAP dir="rtl" class="ftableline1"> 1 </TD> <TD ALIGN=RIGHT NOWRAP dir="rtl" class="ftableline1"> <A HREF=_matbea=1> דולר ארה"ב</A><BR> </TD> <TD align="right" NOWRAP dir="rtl" class="ftableline1"> <A href="_matbea=1"><IMG SRC="../../meida/images/f1.gif" HEIGHT=15 WIDTH=21 border=0></A><BR> </TD> <TD ALIGN=center NOWRAP dir="rtl"><INPUT TYPE="Checkbox" VALUE="1" NAME="check" id="check" ></TD> </TR> Hello dear Community, i have a document i need to parse it and spit out only this part of the table: see http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=67003&lschb= how to i parse the stuff!? With perl or php? Note i have the xpaths (see below) Sad that i cannot apply them on Simple DOM Parser since this Dom Parser does not work with Xpaths but with CSS-Selectors: Well i want to get all the data with that are within the table that name is called class="fliess" How to dump all the results? BTW - thinking about the most elegant way, i think it is the most pretty way would be to do it with perl - So i can try it with HTML::TableExtract or.... Well what do you suggest - Which way to choose to do this [very] simple thing? Look forward to hear from you! see the xpaths: Schule: /html/body/center/table/tbody/tr[2]/td[1] Stasse: /html/body/center/table/tbody/tr[3]/td[1] Ort: /html/body/center/table/tbody/tr[4]/td[1] Tel: /html/body/center/table/tbody/tr[5]/td[1] Schulgliederungen: /html/body/center/table/tbody/tr[6]/td[1] Besonderheite: /html/body/center/table/tbody/tr[7]/td[1] E-Mail: /html/body/center/table/tbody/tr[8]/td[1] Schulnummer: /html/body/center/table/tbody/tr[9]/td[1] hello dear community, i am currently wroking on a approach to parse some sites that contain datas on Foundations in Switzerland with some details like goals, contact-E-Mail and the like,,, See http://www.foundationfinder.ch/ which has a dataset of 790 foundations. All the data are free to use - with no limitations copyrights on it. I have tried it with PHP Simple HTML DOM Parser - but , i have seen that it is difficult to get all necessary data -that is needed to get it up and running. Who is wanting to jump in and help in creating this scraper/parser. I love to hear from you. Please help me - to get up to speed with this approach? regards Dilbertone Hi All, I am using the PHP Simple HTML DOM parser to connect to a financials website, parse out a companies financial information (Income statement in this case) and then insert the scrapped data into a mysql database that I can then later use to run automated calculations. Here is the code I have so far: Code: [Select] <?php include_once 'simple_html_dom.php'; //Connect to financial Website and Create DOM from URL $income_statement = file_get_html('http://www.WEBSITE.com/finance?etc..etc...etc...etc...'); //PULL FINANCIAL DATA foreach($income_statement->find('td[class]' ) as $lines=>$data) { echo $data->plaintext . "<br/>"; } // clean up memory $html->clear(); unset($html); ?> So far I am able to get output that looks like this: Code: [Select] Revenue 336.57 331.52 324.32 319.29 320.40 Other Revenue, Total - - - - - Total Revenue 336.57 331.52 324.32 319.29 320.40 etc............................. But being a newb I do not understand how I can break each $ value and each - into their own variables and then insert them to their corresponding mysql table fields. During the database insert I would like to ignore field headings from insertion (i.e Revenue, Total Revenue, etc.... Any help would be absolutely amazing, as I have been reading, scripting and searching for information like crazy, but just can't seem to figure it out. hello dear Freaks
i am currently musing bout the portover of a python bs4 parser to php - working with the simplehtmldom-parser / pr the DOM-selectors... (see below). The project: for a list of meta-data of wordpress-plugins: - approx 50 plugins are of interest! but the challenge is: i want to fetch meta-data of all the existing plugins. What i subsequently want to filter out after the fetch is - those plugins that have the newest timestamp - that are updated (most) recently. It is all aobut acutality... https://wordpress.org/plugins/participants-database ....and so on and so forth.
https://wordpress.org/plugins/wp-job-manager we have the following set of meta-data for each wordpress-plugin: Version: 1.9.5.12 installations: 10,000+ WordPress Version: 5.0 or higher Tested up to: 5.4 PHP Version: 5.6 or higher Tags 3 Tags:databasemembersign-up formvolunteer Last updated: 19 hours ago
the project consits of two parts: the looping-part: (which seems to be pretty straightforward). the parser-part: where i have some issues - see below. I'm trying to loop through an array of URLs and scrape the data below from a list of wordpress-plugins. See my loop below- as a base i think it is good starting point to work from the following target-url:
plugins wordpress.org/plugins/browse/popular with 99 pages of content: cf ...
the Output of text_nodes: ['Version: 1.9.5.12', 'Active installations: 10,000+', 'Tested up to: 5.6 '] but if we want to fetch the data of all the wordpress-plugins and subesquently sort them to show the -let us say - latest 50 updated plugins. This would be a interesting task:
first of all we need to fetch the urls then we fetch the information and have to sort out the newest- the newest timestamp. Ie the plugin that updated most recently List the 50 newest items - that are the 50 plugins that are updated recently ..
we have the following set see here the Soup_ soup = BeautifulSoup(r.content, 'html.parser') target = [item.get_text(strip=True, separator=" ") for item in soup.find( "h3", class_="screen-reader-text").find_next("ul").findAll("li")[:8]] head = [soup.find("h1", class_="plugin-title").text] new = [x for x in target if x.startswith( ("V", "Las", "Ac", "W", "T", "P"))] return head + new with ThreadPoolExecutor(max_workers=50) as executor1: futures1 = [executor1.submit(parser, url) for url in allin] for future in futures1: print(future.result())
see the formal output Quote
background: https://stackoverflow.com/questions/61106309/fetching-multiple-urls-with-beautifulsoup-gathering-meta-data-in-wp-plugins Well - i guess that we c an do this with the simple DOM Parser - here the seclector reference. https://stackoverflow.com/questions/1390568/how-can-i-match-on-an-attribute-that-contains-a-certain-string
look forward to any hint and help.
have a great day Edited May 3, 2020 by dil_bertHello dear friends, first of all : merry merry Xmas!!! i want to parse with the simple Simple HTML DOM Parser, well i am pretty new to php and to the Simple HTML DOM Parser. My example: http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1[uid]=60119 I want to collect the data in the block: I have investigated the sourcecode - and found out that the attribute of interest should be this one: class="content"div class="content"><!-- TYPO3SEARCH_begin --> here the code is: - my trails. // inculde the Simple HTML DOM Parser include_once('simple_html_dom.php'); // get the file we want to parse right now,create a DOM $html = file_get_html(''); // simple_html_dom::find() creates a new // simple_html_dom-Objekt, that consists out of // corresponding childelements foreach($html->find('class: content ') as $h3) { // simple_html_dom::get the text in a tag // den Text innerhalb eines Tags if($h3->innertext == 'Text of a H3 Tag') { break; } } // simple_html_dom::next_sibling() gives the // next Element $table = $h3->next_sibling(); but believe me - it gives me not back what is aimed. what have id done wrong...? dbone I'm using PHP 5.2 Server and Simple HTML DOM 1.5. This script scrape or extract data from a football site, its fully working on PHP 5.9 Server but I need to know how I can fix it for PHP 5.2 server. Can someone give me a hint on how can I fix the error? Thanks in advance. My PHP 5.2 Server script output shows: ++++++++++++++++ Object id #599 Object id #604 Object id #609 Object id #614 Object id #619 Object id #627 Object id #632 Object id #637 Object id #642 Object id #647 Object id #655 Object id #660 Object id #665 Object id #670 Object id #675 Object id #683 Object id #688 Object id #693 Object id #698 Object id #703 Object id #711 Object id #716 Object id #721 Object id #726 Object id #731 ++++++++++++++++ while PHP 5.9 Server says ++++++++++++++++ Rk Player Team POS OPPONENT 1 Aaron Rodgers GB QB at CAR 2 Tom Brady NE QB vs. SD 3 Matt Schaub HOU QB at MIA 4 Michael Vick PHI QB at ATL ++++++++++++++++ I did applied the bug solution listed on https://sourceforge.net/tracker/index.php?func=detail&aid=3107230&group_id=218559&atid=1044037 but it is still not working. It says: ++++++++++++++++ Details: I get compiler errors in PHP 5.2 when using this as an object. The offending lines are 609 and 940, which both contain this construct: if ($this->size>0) $this->char = $this->doc[0]; This tries to get the first character of $this->doc, but PHP 5.2 sees it as trying to access it as an array. It's easily fixed by this: if ($this->size>0) $this->char = substr($this->doc, 0, 1); Or you could probably use chr(ord($this->doc)) as well. Either way solves the compile error without changing functionality. ++++++++++++++++ Here are my codes: Code: [Select] <?php # don't forget the library include('simple_html_dom.php'); # this is the global array we fill with article information $articles = array(); $source = 'http://www.athlonsports.com/columns/winning-game-plan/fantasy-football-qb-rankings'; # passing in the first page to parse, it will crawl to the end # on its own getArticles($source); function getArticles($page) { global $articles, $descriptions; $html = new simple_html_dom(); $html->load_file($page); //$items = $html->find('div[class=preview]'); $items = $html->find('tbody tr'); foreach($items as $post) { # remember comments count as nodes /*$articles[] = array($post->children(3)->outertext, $post->children(6)->first_child()->outertext);*/ $articles[] = array($post->children(0), $post->children(1), $post->children(2), $post->children(3), $post->children(4)); } # lets see if there's a next page if($next = $html->find('a[class=nextpostslink]', 0)) { $URL = $next->href; echo "going on to $URL <<<\n"; # memory leak clean up $html->clear(); unset($html); getArticles($URL); } } ?> <html> <head> </head> <body> <? echo "Source: " . $source; ?> <table cellpadding="5" cellspacing="0" border="0"> <?php foreach($articles as $item) { echo "<tr>"; echo "<td>" . $item[0] . "</td><td>" . $item[1] . "</td><td>" . $item[2] . "</td>"; echo "<td>" . $item[3] . "</td><td>" . $item[4] . "</td>"; echo "<tr>"; } ?> </table> </body> </html> good day dear community, this is a big issue. I have to decide: between native PHP DOM Extension or of simple DOM html parser well i want to parse the site he http://buergerstiftungen.de/cps/rde/xchg/SID-A7DCD0D1-702CE0FA/buergerstiftungen/hs.xsl/db.htm http://buergerstiftungen.de/cps/rde/xchg/SID-A7DCD0D1-702CE0FA/buergerstiftungen/hs.xsl/db.htm I will suggest to use the native PHP "DOM" Extension instead of "simple html parser", since it will be much faster and easier What do you think about this one here...: Code: [Select] $doc = new DOMDocument @$doc->loadHTMLFile('...URL....'); // Using the @ operator to hide parse errors $contents = $doc->getElementById('content')->nodeValue; // Text contents of #content look forward to hear from you best regards db1 HI all, I need help to get data out of an array using a while loop or something simalar. What I have, is this: <?php $data = array( 1 => array( "name" => "Jordan Smith", "age" => "16 Years 0 Months" ), 2 => array( "name" => "Olivia Smith", "age" => "13 Years 12 Months" ) ); Now, what I did, is I constructed this: while($result=$data[1]) { echo $result['name']; } But it echoed "Jordan Smith" for infinity down the page. If anyone can help be do this, it will be great. Thanks First of all i create the page to copy the buttons as pixels to paste into a design, it's not for a working webpage. It goes wrong here, <option value="0">Day</option> <?= for(var i=0; i<31; i++){?> <option value="<?=i?>"><?=i?></option> <?= } ?> </select> Code: [Select] <?php ?> <html> <head> <title>Community Development Project</title> </head> <body> <div id="container"> <form id="opties1" name="opties1" method="post" action="something.php"> <div class="forminput"> <input type="button" id="signin" name="signin" value="Sign In" /> <br/> <br/> <input type="button" id="register" name="register" value="Register" /> <br/> <br/> <input type="button" id="uploadimage" name="uploadimage" value="Upload Image" /> <br/> <br/> <input type="button" id="next" name="next" value="Next" /> <br/> <br/> <input type="button" id="addanother" name="addanother" value="Add Another" /> <br/> <br/> <input type="button" id="register" name="register" value="Register" /> <br/> <br/> <input type="button" id="create" name="create" value="Create" /> <br/> <br/> <input type="button" id="addfolder" name="addfolder" value="Add Folder" /> <br/> <br/> <input type="button" id="addimage" name="addimage" value="Add Image" /> <br/> <br/> <input type="button" id="addtask" name="addtask" value="Add Task" /> <br/> <br/> <input type="button" id="cancel" name="cancel" value="Cancel" /> <br/> <br/> <input type="button" id="watch" name="watch" value="Watch" /> <br/> <br/> </div> </form> <form id="opties2" name="opties2" method="post" action="something.php"> <div class="forminput"> <select id="day" name="day"> <option value="0">Day</option> <?= for(var i=0; i<31; i++){?> <option value="<?=i?>"><?=i?></option> <?= } ?> </select> <select id="country" name="Country"> <option value="0">Netherlands</option> <option value="1">Germany</option> <option value="2">Belgium</option> </select> </div> </form> </div> </body> </html> I am trying to write a shopping cart script, that once they have confirmed their order it will insert into the database their order information. Basically as a product can have a number of different quanitities that they wish to purchase, I want to create an associative array that has a product, linked to the quantity. I want to do this so that the order information stays in one row in the database, so in the 'product' row, it has values like (cow -> 2) cow being the product and 2 being the quantity. Here is the code I am developing, and I have commented in the array which I am trying to create just the show my idea of thinking. any help, will be greatly appreciated! function updateCart($product_id,$id){ if($_SESSION['cart']) { foreach($_SESSION['cart'] as $product_id => $quantity) { $total = 0; $sql = sprintf("SELECT title, price, shipping FROM product WHERE pkID = %d;",$product_id); $result = query($sql); if(numRows($result) > 0 ){ $row = getResult($result); $price = $row['price']; $shipping = $row['ship']; $line_cost = ($price + $shipping) * $quantity; $total = $total + $line_cost; $product[] = $row['title'] -> $quantity; //unsure of how to code $sql2 = "INSERT INTO cart (userID, products, total, date) VALUES ('$id', '$product', '$total' '$date')"; $query = query($sql2); } } }else{ $alert = 'No items in shopping cart.'; } return $total; } Hi, This problem has been driving me crazy all day. I am relatively new to PHP. I basically am trying to populate a database with data from site users. I am using session variables to store their data temporarily as they navigate through the sign up process. A user will input how many 'categories' they wish to populate on page 1. Page 2 will then ask them to specify the details of each category. Eg Category 1: Title, Description, Amount. Category 2: Title, etc. So far I have been able to do this, I now want to store what they have input in session variables. My thoughts were to take the number of categories they have sepcified and create that number of arrays using a loop. Each array will store the details on each category. My code is as follows: Code: [Select] $count=$_POST['count']; //Get how many categories were added //Create an array for each category for ( $counter = 0; $counter <= $count; $counter++){ ${'Categoryarr'.$counter} = array(); }; for ( $counter = 0; $counter <= $count; $counter++){ $Categoryarr[$counter][1]=$_POST['amount_'.$count]; $Categoryarr[$counter][2]=$_POST['desc_'.$count]; $Categoryarr[$counter][3]=$_POST['title_'.$count]; }; When I output the code, I seems to have created the specified number of arrays, but has populated all of them with the same data from the last category. Does anyone know where I am going wrong? Thanks, Bernard Hi there, I hope somebody can help me creating a loop for the following code: What it currently does is, showing only 1 image, while in fact there are 3 of them. I've combined a loginsystem with slideviewer. When i use the original php code (or a part of it) it only shows one listing. I need it to list all of them. <div id="mygalone" class="svw"> <ul> <li> <?php $result = mysql_query("SELECT reference FROM user_photos WHERE`profile_id`='".$row['id']."'"); while ($row2 = mysql_fetch_array($result, MYSQL_ASSOC)) { echo "<a href=\"".$_GET['username']."/pics/".$row2['reference']."\"> <img src=\"".$_GET['username']."/pics/thumbs/".$row2['reference']."\"></a><br/><br/>"; } } ?> </li> </div> What can be stripped down to this: <?php { $result = mysql_query("SELECT reference FROM user_photos WHERE`profile_id`='".$row['id']."'"); while ($row2 = mysql_fetch_array($result, MYSQL_ASSOC)) { echo "<img src=\"".$_GET['username']."/pics/".$row2['reference']."\">"; } } ?> You would asume that this is allready a loop, but it does not act like it. Best regards, Martijn (the netherlands) Hi everyone, I'm trying to select either a class or an id using PHP Simple HTML DOM Parser with absolutely no luck. My example is very simple and seems to comply to the examples given in the manual(http://simplehtmldom.sourceforge.net/manual.htm) but it just wont work, it's driving me up the wall. Here is my example: http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=94468&lschb= I think the HTML is invalid: i cannot parse it. Well i need more examples - probly i have overseen something! If anybody has a working example of Simple-html-dom-parser...i would be happy. The examples on the developersite are not very helpful. your dilbertone |