PHP - File_get_contents Or Curl - Which One To Take For A Little Parser
Similar TutorialsA German DB that collects all the data from all German Foundations... see: http://www.suche.stiftungen.org/index.php?strg=87_124&baseID=129 Here we find all Foundations in Germany: : 8074 different foundations You get the full results if you choose % as wildcard in the Search-field. But if we do that - then you get some kind of overflow... 350 results are the limit. More is not possible to show. So the question is: How can we create a spider that runs across the site and asks step by step - that we get all : 8074 results. The way to get through this database is to search combinations of letters eg "ac" and select search only titles. Then go through every pair of letters. If you still get too many results for a particular pair, use 3 letters. aca, acb,... Can i do this with File_get_contents_ or with Curl!? (eg -MultiCurl ) Well - i want to make a little script thatdoes this - i need to create a little automation - that does this task automatically. Regarding the destination database, it's all going into sqlite, which we believe can handle large enough sets of data without any problems. We can download the database as a file too. For capabilities, see he http://www.sqlite.org/limits.html The question is _ how to create the first approach of the parser...!`? Can any body assist! All, I'm trying to use some code to get some file contents from another website and am looking at performance. What is the quicker way and better for my resources to get out to websites, is it cURL or by using file_get_contents? Thanks in advance. Hi everyone
I've look all other the web for a work around that.
basically my problem is:
if i use file_get_contents or curl to get www.google.fr, there is NO problem. i get the data i want
but when i try to use it on another target it doesn't work. it loads and loads for couple seconds and finally says that it could not retrieve informations.
the target is not blocked as i can personnally use it on my webhost, but some people on other webhosts can't.
A friend of mine told me that it was because some webhosts blocks unsecure targets.
Is there any way around it?
$context = stream_context_create(array( 'http' => array( 'header' => "Authorization: Basic " . base64_encode("admin:$password") ) )); $data = file_get_contents("http://TARGETHOST:29015/status.json", false, $context); $status = json_decode($data); var_dump($status);(And again this works on my webhost, i can query like that any servers i want, but others can't use this they get a timeout trying it (when doing a file_get_contents on google works)) Hello dear folks i am currently workin on a little parser with cURL - and i have some questions he How to do the String concatenation with PHP ; note i wanna do it with curl! the target URL: http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=572.8745475728288 click all checkbuttons Results: approx 6400 results Here i can provide some "more help for getting the target!" - btw see three detail page: http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=261.2855969084779&SchulAdresseMapDO=116191 http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=261.2855969084779&SchulAdresseMapDO=116270 http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=261.2855969084779&SchulAdresseMapDO=188268 btw: we can loop over the results - with a iteration - especially i wanna know: how to do the String concatenation with PHP ; note i wanna do it with curl! function get_page_data($url) { $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch, CURLOPT_HEADER, 0); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); $output = curl_exec($ch); if($output!=false && $_POST['dt']=='No') return $output; curl_close($ch); } for($i=1;$i<=$match[1];$i++) { $url = "http://www.example.com/page?page={$i}"; $data = get_page_data($url); if($data) { $cleaned = string_between('onload="check();">', '</body>', $data); return = stip_tags($cleaned, '<table><tr><td><div>'); } } any and all help will be greatly appreciated love to hear from you best regards I would like to spoof the useragent as googlebot through the implementation of file_get_contents, or curl (which ever is easiest). How to do this? Googlebot useragent = Googlebot/2.1 (+http://www.google.com/bot.html) hello dear Freaks
i am currently musing bout the portover of a python bs4 parser to php - working with the simplehtmldom-parser / pr the DOM-selectors... (see below). The project: for a list of meta-data of wordpress-plugins: - approx 50 plugins are of interest! but the challenge is: i want to fetch meta-data of all the existing plugins. What i subsequently want to filter out after the fetch is - those plugins that have the newest timestamp - that are updated (most) recently. It is all aobut acutality... https://wordpress.org/plugins/participants-database ....and so on and so forth.
https://wordpress.org/plugins/wp-job-manager we have the following set of meta-data for each wordpress-plugin: Version: 1.9.5.12 installations: 10,000+ WordPress Version: 5.0 or higher Tested up to: 5.4 PHP Version: 5.6 or higher Tags 3 Tags:databasemembersign-up formvolunteer Last updated: 19 hours ago
the project consits of two parts: the looping-part: (which seems to be pretty straightforward). the parser-part: where i have some issues - see below. I'm trying to loop through an array of URLs and scrape the data below from a list of wordpress-plugins. See my loop below- as a base i think it is good starting point to work from the following target-url:
plugins wordpress.org/plugins/browse/popular with 99 pages of content: cf ...
the Output of text_nodes: ['Version: 1.9.5.12', 'Active installations: 10,000+', 'Tested up to: 5.6 '] but if we want to fetch the data of all the wordpress-plugins and subesquently sort them to show the -let us say - latest 50 updated plugins. This would be a interesting task:
first of all we need to fetch the urls then we fetch the information and have to sort out the newest- the newest timestamp. Ie the plugin that updated most recently List the 50 newest items - that are the 50 plugins that are updated recently ..
we have the following set see here the Soup_ soup = BeautifulSoup(r.content, 'html.parser') target = [item.get_text(strip=True, separator=" ") for item in soup.find( "h3", class_="screen-reader-text").find_next("ul").findAll("li")[:8]] head = [soup.find("h1", class_="plugin-title").text] new = [x for x in target if x.startswith( ("V", "Las", "Ac", "W", "T", "P"))] return head + new with ThreadPoolExecutor(max_workers=50) as executor1: futures1 = [executor1.submit(parser, url) for url in allin] for future in futures1: print(future.result())
see the formal output Quote
background: https://stackoverflow.com/questions/61106309/fetching-multiple-urls-with-beautifulsoup-gathering-meta-data-in-wp-plugins Well - i guess that we c an do this with the simple DOM Parser - here the seclector reference. https://stackoverflow.com/questions/1390568/how-can-i-match-on-an-attribute-that-contains-a-certain-string
look forward to any hint and help.
have a great day Edited May 3, 2020 by dil_bertgood day dear community, i am workin on a Curl loop to fetch multiple pages: i have some examples - and a question: Example: If we want to get information from 3 sites with CURL we can do it like so: $list[1] = "http://www.example1.com"; $list[2] = "ftp://example.com"; $list[3] = "http://www.example2.com"; After creating the list of links we should initialize the cURL multi handle and adding the cURL handles. $curlHandle = curl_multi_init(); for ($i = 1;$i <= 3; $i++) $curl[$i] = addHandle($curlHandle,$list[$i]); Now we should execute the cURL multi handle retrive the content from the sub handles that we added to the cURL multi handle. ExecHandle($curlHandle); for ($i = 1;$i <= 3; $i++) { $text[$i] = curl_multi_getcontent ($curl[$i]); echo $text[$i]; } In the end we should release the handles from the cURL multi handle by calling curl_multi_remove_handle and close the cURL multi handle! If we want to another Fetch of sites with cURL-Multi - since this is the most pretty way to do it! Well I am not sure bout the string concatenation. How to do it - Note I want to fetch several hundred pages: see the some details for this target-server sites - /(I have to create a loop over several hundred sites). * siteone.example/?show_subsite=9009 * siteone.example/?show_subsite=9742 * siteone.example/?show_subsite=9871 .... and so on and so forth Question: How to appy this loop into the array of the curl-multi? <?php /************************************\ * Multi interface in PHP with curl * * Requires PHP 5.0, Apache 2.0 and * * Curl * ************************************* * Writen By Cyborg 19671897 * * Bugfixed by Jeremy Ellman * \***********************************/ $urls = array( "siteone", "sitetwo", "sitethree" ); $mh = curl_multi_init(); foreach ($urls as $i => $url) { $conn[$i]=curl_init($url); curl_setopt($conn[$i],CURLOPT_RETURNTRANSFER,1);//return data as string curl_setopt($conn[$i],CURLOPT_FOLLOWLOCATION,1);//follow redirects curl_setopt($conn[$i],CURLOPT_MAXREDIRS,2);//maximum redirects curl_setopt($conn[$i],CURLOPT_CONNECTTIMEOUT,10);//timeout curl_multi_add_handle ($mh,$conn[$i]); } do { $n=curl_multi_exec($mh,$active); } while ($active); foreach ($urls as $i => $url) { $res[$i]=curl_multi_getcontent($conn[$i]); curl_multi_remove_handle($mh,$conn[$i]); curl_close($conn[$i]); } curl_multi_close($mh); print_r($res); ?> I look forward to your ideas. hi, i have this xml <m time="2012-03-09T11:14:20+00:00" timestamp="1331291660"> <ma id="1219457" xsid="0"> <time>2012-03-09T19:30:00+00:00</time> <gru id="8388">Nacional</gru> <ht id="2325">Teste</ht> <at id="8919">Teste2</at> <results /> <mar did="6" name="Under"> <ofr id="95690814" n="2" ot="0" last_updated="2012-03-09T11:13:35+00:00" flags="1" bmoid="1000095485"> <ors i="0" time="2012-03-08T18:59:22+00:00" starting_time="2012-03-09T19:30:00+00:00"> <a1>4</a1> <a2>3.5</a2> <a3>4</a3> <a4>2</a4> <a5>8</a5> </ors> </ofr> </mar> </ma> </m> and this code: $DOMDocument = new DOMDocument( '1.0' , 'utf-8' ); $DOMDocument->preserveWhiteSpace = false; $DOMDocument->loadXML( $xml ); foreach ( $DOMDocument->getElementsByTagName( '*' ) as $Nodes ) { foreach ( $Nodes->getElementsByTagName( '*' ) as $Node ) { $Data[ $Node->parentNode->nodeName ][ $Node->nodeName ] = $Node->nodeValue; } with this code i can load the value of the a1 and a2 etc but i need load the name of the mar ( Under ) and the did. how can i do this? thanks I am trying to parse a XML file using SimpleXML, but I can't figure out how to display <Category> name and each <feature> Here's a example of the XML file. <Vehicle> <Description>To set an appointment callor email </Description> <Features> <Category Name="Comfort"> <Feature>Front Air Conditioning</Feature> <Feature>Front Air Conditioning Zones: Single</Feature> <Feature>Front Air Conditioning: Climate Control</Feature> </Category> </Features> </Vehicle> I need a little bit of help. I have a problem in which I need to read a file using fopen then open another file and look for a match and then take the information from both and enter it in a database then loop like 1000-2000 times. EG: Code: [Select] 8;"zip";"File1.zip";"post-36-10839578260.ibf";"318";"7129";"8cc3bac5f531206b9ca0414d08430b1e";"3485" 27;"zip";"File2.zip";"post-40-10840088850.ibf";"222";"7162";"6af0e0d5798485656a2fd78d75a86e6d";"60984" and find the matching text in another csv file. Also, there is like 1000+ lines. The easy think I could think of is to parse the information and put it into a table and search the table.. Code: [Select] <?php $dom = new DOMDocument(); $dom->loadHTMLFile('http://en.wikipedia.org/wiki/Liverpool_F.C.'); $domxpath = new DOMXPath($dom); foreach ($domxpath->query('//span[@id="Players"]/../following-sibling::table[1]//span[@class="fn"]') as $a) {echo " <p>$a->textContent</p> "; }; ?> Hello, how can I parse an XML that includes all of the $a->textContent with a tag like <player></player>? Hi All. I was working on this class for the last .. 2 hours or so. I was just wondering, if any of you could see anything immediatly wrong with it. I haven't really had chance to test it. If it does work. Help yourselves to it. <?php /* Made By Richard Clifford for the use and redistribution under the GPL 2 license. If you wish to use this class, please leave my name and this comment in here. */ /* Usage Example: $this->objXML->loadXMLDoc('Document.xml'); #Must start with This $this->objXML->getChildren(); $this->objXML->getChildrenByNode('NodeName'); $this->objXML->asXML(); $this->objXML->addAttribute('attrName','attrValue'); $this->objXML->addChild('ParentName','childName'); $this->objXML->getAttributesByNode('Attr'); $this->objXML->xmlToXPath(); $this->objXML->countChildren('parentNode'); */ class XML{ function __construct(){ //Calls the loadXMLDoc func $this->loadXMLDoc($docName); } /** @Brief Loads the XML doc to read from @Param docName The Name of the XML document @Param element The SimpleXMLElement, Default = null @Since Version 1. Dev @Return blOut Boolean */ public function loadXMLDoc($docName, $element = null){ //Set the Return value $blOut = false; //Load the XML Document $XML = simplexml_load_file($docName); //Call the SimpleXMLElement $SXE = new SimpleXMLElement($element); //Checks to make sure the file is loaded and the SimpleXMLElement has been called if( ($XML == TRUE) && ($SXE == TRUE)){ $blOut = true; }else{ die('Sorry, Something Went Wrong'); } //Return the Variable return $blOut; } /** @Brief Gets an XML Child Attribute @Since Version 1. Dev @Returns $strOut String from the Child */ public function getChildren(){ if(!$this->loadXMLDoc() ){ die('Couldn\'t Load XML Document');} $strOut = NULL; //Loads the XML Document $XML = $this->loadXMLDoc(); //Gets all of the Children in the Document $xmlChildren = $XML->children(); //Loops through the children to give each Child and its value. foreach($xmlChildren as $child){ $strOut = $child->getName() . ':' . $child . '<br />'; } return $strOut; } /** @Brief Return a well-formed XML string based on SimpleXML element @Since Version 1. Dev @Returns $strOut The Well Formed XML string */ public function asXML(){ if(!$this->loadXMLDoc() ){ die('Couldn\'t Load XML Document');} $strOut = NULL; //Loads XML DOC $strToXML = $this->loadXMLDoc(); //Calls the Element $SXE = new SimpleXMLElement($strToXML); if(!$SXE){ die('Oops, something went wrong'); }else{ //Set the Return Value as the XML $strOut = $xml->asXML(); } return $strOut; } /** @Brief Adds an attribute to the SimpleXML element @Since Version 1. Dev @Returns $blOut Boolean */ public function addAttribute($element, $attrName, $attrValue){ if(!$this->loadXMLDoc() ){ die('Couldn\'t Load XML Document');} $blOut = false; //loads the XML Doc and sets the SXE (SimpleXMLElement) $addAttr = $this->loadXMLDoc($this->docName, $element); $addNewAttr = $addAttr->addAttribute($attrName, $attrValue, $namespace = null); if(!$addNewAttr){ die('Sorry We could not add the attribute at this time.'); }else{ $blOut = true; } return $blOut; } /** @Brief Adds a child element to the XML node @Since Version 1. Dev @Returns $blOut Boolean */ public function addChild($childName, $childValue, $namespace = null){ if(!$this->loadXMLDoc() ){ die('Couldn\'t Load XML Document');} $blOut = false; //Loads the XML Doc $addChild = $this->loadXMLDoc(); //Adds a new child to the document $addNewChild = $addChild->addChild($childName, $childValue, $namespace); if(!$addNewChild){ die('Sorry, I couldn\'t addChild at this Time'); }else{ $blOut = true; } return $blOut; } /** @Brief Finds children of given node @Since Version 1. Dev @Param $nodeName The Name of the Node to retrieve Children @Param $prefix If is_prefix is TRUE, ns will be regarded as a prefix. If FALSE, ns will be regarded as a namespace URL. @Returns $arrOut Multi-Dimensional Array */ public function getChildrenByNode($nodeName, $prefix = true){ if(!$this->loadXMLDoc() ){ die('Couldn\'t Load XML Document');} $arrOut = array(); $xml = $this->loadXMLDoc(); $getChildren = $xml->children($nodeName); foreach($getChildren as $child){ $arrOut['child_name'] = array($child); } return $arrOut; } /** @Brief Gets an array of attributes from a node name @Since Version 1. Dev @Param $nodeName The name of a Node to get a */ public function getAttributesByNode($nodeName){ if(!$this->loadXMLDoc() ){ die('Couldn\'t Load XML Document');} $arrOut = array(); //Loads the XML Document $loadXml = $this->loadXMLDoc(); //Gets the XML code $xml = $loadXML->asXML(); //Loads the XML into a string $xmlString = simplexml_load_string($xml); //For each attribute in the XML Code string set as $attr=>$value foreach($xmlString->$nodeName->attributes() as $attr=>$value){ $arrOut = array($attr=>$value); } return $arrOut; } /** @COMPLETE AT A LATER DATE @NEED TO LOOKUP */ public function xmlToXPath(){ } /** @Brief Counts how many children are in a node @Param $nodeName The name of the node to query @Returns $strOut Counts the amount of children in a node, and adds it to a string. */ public function countChildren($nodeName, $return = 'int'){ if(!$this->loadXMLDoc() ){ die('Couldn\'t Load XML Document');} //Switch the $return as the user may want an integer or a string returned. switch($return){ case 'str': //if $strOut = NULL fails then user $strOut = ''; $strOut = NULL; //Load the XML Document $xml = $this->loadXmlDoc(); //Get the Children $xmlNode = $this->getChildrenByNode($nodeName); //Set a counter $counted = 0; //Forevery child in every node foreach($xmlNode as $child){ //add one to the counter $counted++; //return a string $strOut = sprintf('The %s has %s children',$xmlNode, $child); } return $strOut; break; case 'int': $intOut = 0; $xml = $this->loadXmlDoc(); $xmlNode = $this->getChildrenByNode($nodeName); $counted = 0; foreach($xmlNode as $child){ $counted++; } //Return the counter(int) (int) $intOut = $counted; return $intOut; break; } return $return; } } ?> Best Regards, Mantyy I was wonder how I could make some BBcode for my messaging system I made for a website. I made some simple ones like just by replace [ red ] with font color etc. I want to know how to do something like this: [ url = http://google . ca] Google [ / url ] without the spaces. ~AJ Im using some software called php html dom parser i wont to be able to keep the souce tidy i.e before dom parser <?php //////////////////////SEO TOOL/////////////////////////// $title = 'Green Deal Nationwide - PB Energy Solutions Ltd'; $description = 'Delivering all your environmental needs to \'green\' up your business, improve reputation, increase profitability and give a competitive advantage.'; /////////////////////////////////////////////////////// ?> <?php include('includes/settings.php'); ?> <?php include('includes/header.php'); ?> <div class="container"> <div id="large-page-img"> <img src="<?php echo URL(); ?>images/home-page-slide.jpg" width="911" height="230" /> <img src="<?php echo URL(); ?>images/home-page-slide-1.jpg" width="911" height="230" /> <img src="<?php echo URL(); ?>images/home-page-slide-2.jpg" width="911" height="230" /> </div> <div id="content-home"> <div class="iedit"> after dom parser saved to file <?php //////////////////////SEO TOOL/////////////////////////// $title = 'Green Deal Nationwide - PB Energy Solutions Ltd'; $description = 'Delivering all your environmental needs to \'green\' up your business, improve reputation, increase profitability and give a competitive advantage.'; /////////////////////////////////////////////////////// ?> <?php include('includes/settings.php'); ?> <?php include('includes/header.php'); ?> <div class="container"> <div id="large-page-img"> <img src="<?php echo URL(); ?>images/home-page-slide.jpg" width="911" height="230" /> <img src="<?php echo URL(); ?>images/home-page-slide-1.jpg" width="911" height="230" /> <img src="<?php echo URL(); ?>images/home-page-slide-2.jpg" width="911" height="230" /> </div> <div id="content-home"> <div class="iedit"><div class="iedit"> is there anyway i can keep it like the original fil after dom? I have an XML file (differences.xml) <element> <item code="lM" name="castle" type="building"> <cost>5000</cost> </item> ...more items.... </element> I have parser.php whih parses the XML and displays all item names in a checklist for the user to select items. These selected items are method=post back to parser.php which re-parses the XML where selected item = item name and SHOULD then display all selected items' names (and more): <?php if (isset($_GET['formSubmit'])) { $option = $_POST['option']; $option = array_values($option); if (!empty($option)){ $xml = simplexml_load_file('differences.xml'); $i = 0; $count = count($option); while($i < $count) { $selected = $xml->xpath("//item[@name='".$option[$i]."']"); echo $selected[$i]['name']; $i++; } }else{ echo "No items selected, Please select:"; } }else{ $xml = simplexml_load_file('differences.xml'); $object = $xml->xpath('//item'); $count = count($object); $i = 0; echo "<form method='POST' action='parser.php'>"; while($i < $count){ $xmlname = $object[$i]['name']; echo "<input type='checkbox' name='option[".$i."]' value='$xmlname'>".$xmlname; echo "<br>"; $i++; } echo "<br><input type='submit' name='formSubmit' value='Proceed'><br>"; echo "</form>"; } ?> The problem is, the parser is only displaying the first selected item and isn't outputting the rest. To clarify the code: - The first half executes only if items have been previously selected in the form. If no items were selected, it goes to the secod half, which parses the XML file and presents the checklist form for the user to select items. THIS WORKS GREAT. - After pressing the submit button, the items are sent using POST and the page is reloaded. THIS WORKS GREAT. - After reloading, the first half executes and finds the $_POST and resets the array so that it works with the coming loop. THIS WORKS GREAT. - The loop parses the XML and returns only those item names that have been selected in the $_POST. THIS DOES NOT WORK. So my question is: what's wrong with my code? The array $options has all the proper information, but the parser is only outputting the first item. All consecutive loops result in empty space with no output. I've been at this for 2 weeks and can't find the solution. Any help is appreciated! In the following code, I am getting an error: Parse error: syntax error, unexpected T_CONSTANT_ENCAPSED_STRING, expecting T_STRING or T_VARIABLE or '{' or '$' in XXX on line 235 Line 235 is the second line in the code snippet Code: [Select] $result = mysql_query( $querystr, $this->linkid ); $this->"queryid".$id = $result; if ( defined( "SWIFTDEBUG" ) ) { Thanks Dear all, Hello, i'm a newbie programmer. I want to implement the concept about recursive descent parser for natural language (english). The concept was found in http://nltk.sourceforge.net/doc/en/ch07.html#rdparser. But my problem is, how to implementation the concept/theory to PHP code ? Anyone would to help me to give the advice or tutorial or maybe anyone has to make script before? Thank you very much in advance. Everything works fine, unless I add this stupid thing to get rid of people using HTML Code: [Select] $text = pun_htmlspecialchars($text); Once I add that to my function, no bbcodes work at all? But I cant use html.. (which is good) but I need to beable to use BBCODE, and parse hackers from using html also, any help? MY CODE absolutely destroyed the forum page here it is: http://pastebin.com/jv7m47kn Hi guys! Am trying to get hold of a PHP Math Expression Parser that follows correct precedence and executes custom functions. Can somebody share his scripts? or lead me to the right directions? Hope you can help! Thanks a lot! My boss is gonna kill me soon. This is 1 week overdue already. Jon |