PHP - Curl Crawling Data
I'm using cURL to crawl and scrape data from a website. This website contains tables with rows of data. When I send a cURL POST for the underlying data at a specific row(A), it will return the expected data. But when I move to the second row(B), the data returns blank or specifically, a tons of spaces (or &#nbsp's.) When I access the cURL's POST location by browser, I can see (B)'s data. The only difference in the 2 POST's are location ID's for the data. I don't think it's a problem with JavaScript as I can successfully return data from row (A) as I mentioned.
Similar TutorialsHello everyone, I developed a php application that crawls a site and generates an xml sitemap with the gathered information. It works, but as of now I am using brute force tactics. I have a class that crawls and stores the links in a tree by returning the file_get_contents and using preg match to find the a tags. Is there a quicker method? I've seen people talking about cURL but i don't know if that will make my program any better. My application seems to get results a bit quicker than some others I have seen. My main concern comes with the sorting. Is there a way to tell if a link on a page is an rss feed or like a downloadable image or zip file or something? For files, I explode at '/' and check the last array key for a '.' , then I check it against an array of file names I think I want to include. For feeds I just check the explode array for feed, feeds , rss or ?feed=rss2 is in the array before storage. This works fine for sites I administer and wordpress sites, but it could filter out a cooking site link or something with a feed directory. It also seems like it is one of the most time consuming parts. I think what I am trying to ask... is there a good way to filter these results? Will cURL or anything else let me check for actual pages and filter out .mp3 files and all the other junk you don't want in a sitemap? Thanks in advance for your time. Hello, I had made a website (PHP) for a music company few years back. They basically sell songs from their site. Now they've faced a problem. Their site has been crawled by abmp3.com and they not only let user download songs from our site but also give a full path of a song to download. I wonder how this happened and how to stop them from crawling our site. And its likely some other site doing same too. So please anybody help me how to overcome this problem, may be some PHP code can do this??. Thanks watsmyname I am trying to go to http://lirr42.mta.info/ and get the data from the submitted form. I have rebuilt the form exactly as needed, and done a few adjustments. The posts that I am sending from MY form, are the same as the one on the site. This site allows public data mining by the way. So does someone seem something wrong with my code? I know the basics are working. I have another page that is working fine. It's just this one ends up returning something..it returns part of the data but it's something wrong with it. Any advice is appreciated. Code: [Select] <?php define( 'DEBUG', true ); define( 'DEFAULT_STOP', 'Broadway' ); // report all errors during development/debug mode if ( DEBUG ) error_reporting( E_ALL ); else error_reporting( 0 ); include( 'simplehtmldom/simple_html_dom.php' ); $mta_post_url = 'http://lirr42.mta.info/schedules.php'; // GTFS Specification File Definitions $gtfs_files = array ( 'agency' => 'agency.txt', // required 'stops' => 'stops.txt', // required 'routes' => 'routes.txt', // required 'trips' => 'trips.txt', // required 'stop_times' => 'stop_times.txt', // required 'calendar' => 'calendar.txt', // required 'calendar_dates' => 'calendar_dates.txt', 'fare_rules' => 'fare_rules.txt', 'fare_attributes' => 'fare_attributes.txt', 'shapes' => 'shapes.txt', 'frequencies' => 'frequencies.txt', 'transfers' => 'transfers.txt' ); $gtfs_pickup_codes = array ( 0 => 'Regularly scheduled pickup', 1 => 'No pickup available', 2 => 'Must phone agency to arrange pickup', 3 => 'Must coordinate with driver to arrange pickup' ); $gtfs_dropoff_codes = array ( 0 => 'Regularly scheduled drop off', 1 => 'No drop off available', 2 => 'Must phone agency to arrange drop off', 3 => 'Must coordinate with driver to arrange drop off' ); // load stops file $stopsFile = fopen( 'data/lir/' . $gtfs_files[ 'stops' ], "r" ); if ( $stopsFile ) { // get (and toss) header row $stopsHeader = fgetcsv( $stopsFile, 1000 ); // print_r( $stopsHeader ); // will hold HTML output for Select dropdown for stops $stopSelectOptions = ''; // build array from file data while ( $data = fgetcsv( $stopsFile, 1000 ) ) { $stopsData[ $data[1] ] = $data[0]; } // get a sorted array of the stop names ( yes this could be done with array_multisort or usort but I didnj't feel like it right now ) $stopNames = array_keys( $stopsData ); sort( $stopNames ); // loop through sorted array and create SELECT options with default SELECTED at Grand Central Terminal foreach( $stopNames as $stop ) $stopSelectOptions .= sprintf( '<option %s value="%d">%s</option>', ( DEFAULT_STOP == $stop ) ? 'selected="selected"' : '' , $stopsData[ $stop ], $stop ) . "\n\r"; } fclose( $stopsFile ); // load ads file $adsFile = fopen( 'ads.txt', "r" ); if ( $adsFile ) { // get (and toss) header row $adsHeader = fgetcsv( $adsFile, 1000 ); // will hold ads available for stops $ads = array(); // build array from file data while ( $data = fgetcsv( $adsFile, 1000 ) ) { $ads[ $data[0] ] = array ( 'filename' => $data[1], 'url' => $data[2], 'text' => $data[3] ); } fclose( $adsFile ); } // printout the html header require_once( 'header.php' ); if ( 'POST' == $_SERVER[ 'REQUEST_METHOD' ] ) { $from_stop = $_POST[ 'FromStation' ]; // @todo Filter! http://www.php.net/manual/en/filter.filters.validate.php $orig_station = $from_stop; $to_stop = $_POST[ 'ToStation' ]; // @todo Filter! http://www.php.net/manual/en/filter.filters.validate.php $dest_station = $to_stop; if ( $to_stop == $from_stop ) { $error = 'Originating and Destination stops are the same.'; } else { print_ad( $orig_station, $dest_station, $location='top' ); $travel_date = $_POST[ 'RequestDate' ]; $requestTime = $_POST[ 'RequestTime' ]; $am_pm = $_POST[ 'RequestAMPM' ]; $filter = $_POST[ 'sortBy' ]; /* * Lets post this to MTA.info */ $mta_curl = curl_init(); $mta_curl_options = array ( CURLOPT_FAILONERROR => true, CURLOPT_FOLLOWLOCATION => false, // CURLOPT_MUTE => true, CURLOPT_POST => true, CURLOPT_RETURNTRANSFER => true, CURLOPT_CONNECTTIMEOUT => 30, CURLOPT_TIMEOUT => 30, CURLOPT_URL => $mta_post_url, CURLOPT_REFERER => 'http://lirr42.mta.info/', CURLOPT_USERAGENT => 'MTA Info Scrape/1.0', CURLOPT_POSTFIELDS => http_build_query( $_POST ) ); curl_setopt_array( $mta_curl, $mta_curl_options ); $mta_response = curl_exec( $mta_curl ); if ( false === $mta_response ) { curl_close( $mta_curl ); die( sprintf( 'Response Code:%s, Curl Error No:%s, Curl Error Message:%s', curl_getinfo( $mta_curl, CURLINFO_HTTP_CODE ),curl_errno( $mta_curl ), curl_error( $mta_curl ) ) ); } else { curl_close( $mta_curl ); echo '<pre>' . htmlentities( $mta_response ) . '</pre>'; $html = new simple_html_dom(); $html->load( $mta_response ); $schedule_table = $html->find( 'table', 0 ); // find second table in the response $row_count = 0; // will hold HTML output for table $stopSchedule = '<table><th>Departs</th><th>Arrives</th><th>Minutes</th><th>Transfer</th><th>Fare</th></tr>'; foreach ( $schedule_table->find('tr') as $schedule_row ) { $row_count++; // check for and skip header row if ( 1 == $row_count ) continue; // check for and skip last row which first TD has rowspan attribute if ( $schedule_row->children(0)->colspan ) { continue; } // extract table data for this row $depart_time = $schedule_row->children( 0 )->innertext; $arrive_time = $schedule_row->children( 2 )->innertext; $minutes_traveled = $schedule_row->children( 4 )->innertext; $transfer = $schedule_row->children( 5 )->innertext; $fare = $schedule_row->children( 6 )->plaintext; // add table row to html output $stopSchedule .= sprintf( '<tr><td>%s</td><td>%s</td><td>%s</td><td>%s</td><td>%s</td></tr>', $depart_time, $arrive_time, $minutes_traveled, $transfer, $fare ) . "\n\r"; } // end foreach schedule_row // add table end tag to html output $stopSchedule .= '</table>'; // output table to browser echo $stopSchedule; print_ad( $orig_station, $dest_station, $location='bottom' ); } // end if mta response } // end if not same orig and dest } // end if POST if ( 'GET' == $_SERVER[ 'REQUEST_METHOD' ] || ! empty( $error ) ) { print_ad( null, null, $location = 'default' ); ?> <form method="post"> <?php if ( ! empty ( $error ) ) { echo sprintf( '<div class="error">%s</div>', $error ); } ?> From:<br /> <select id="orig_station" name="FromStation" tabindex="2"> <?php echo $stopSelectOptions; ?> </select> <br /> <br /> To:<br /> <select id="dest_station" name="ToStation" tabindex="3"> <?php echo $stopSelectOptions; ?> </select> <br /> <br /> <input id="date1" name="RequestDate" size="12" maxlength="10" tabindex="4" value="<?php echo date( 'm/d/Y' ); ?>" /> <?php $currentHour = trim( date( 'h' ) ); $currentMinutes = trim( date( 'i' ) ); $currentMinutesFloor = sprintf( '%02d' , $currentMinutes - ( $currentMinutes % 30 ) ); $currentAMPM = trim ( date( 'A' ) ); $start_time = strtotime( '1:00 AM' ); $end_time = strtotime( '1:00 PM' ); ?> <select name="RequestTime" tabindex="5"> <?php while( $start_time < $end_time ) { $option_time = date( 'h:i', $start_time ); $option_hour = trim( date( 'h', $start_time ) ); $option_minutes = trim( date( 'i', $start_time ) ); $option_selected = ( $option_hour == $currentHour && $option_minutes == $currentMinutesFloor ); echo sprintf( '<option %s>%s</option>', ( $option_selected ) ? 'selected="selected"' : '', $option_time ) . "\r\n"; $start_time = $start_time + 30*60; } ?> </select> <select name="RequestAMPM" tabindex="6"> <option value="AM" <?php echo ( 'AM' == $currentAMPM ) ? 'selected="SELECTED"' : ''; ?>>AM</option> <option value="PM" <?php echo ( 'PM' == $currentAMPM ) ? 'selected="SELECTED"' : ''; ?>>PM</option> </select> <input name="sortBy" type="hidden" value="1" /> <input type="submit" name="submit" value="Check Schedule" /> </form> <!-- images/base/ ic_menu_back.png ic_menu_home.png --> <?php } require_once( 'footer.php' ); ?> OK, I have the initial cURL working but need to figure out how to extract data I want off that webpage to display or store in a database, I tried using dom and xpath, but because of the way the page displays using css, i think its not picking it up. Here is my cURL script: <?php $userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)'; $target_url = "www.test.com"; $ch = curl_init(); curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); curl_setopt($ch, CURLOPT_URL,$target_url); curl_setopt($ch, CURLOPT_FAILONERROR, true); curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); curl_setopt($ch, CURLOPT_AUTOREFERER, true); curl_setopt($ch, CURLOPT_RETURNTRANSFER,true); curl_setopt($ch, CURLOPT_TIMEOUT, 10); $html = curl_exec($ch); if (!$html) { echo "<br />cURL error number:" .curl_errno($ch); echo "<br />cURL error:" . curl_error($ch); exit; } // parse the html into a DOMDocument $dom = new DOMDocument(); $dom->loadHTML($html); // grab all the on the page $xpath = new DOMXPath($dom); $hrefs = $xpath->evaluate("/html/body//td"); for ($i = 0; $i < $hrefs->length; $i++) { $href = $hrefs->item($i); $url = $href->getAttribute('href'); storeLink($url,$target_url); echo "<br />Link stored: $url"; } ?> and here is a snippet of the source of the page I am getting: <span id="lblTest"><h1 id='surrZipTitle'>Agents in Surrounding Zip Codes</h1><table cellpadding='0' cellspacing='0' border='0' class='tblDent'><tr><td class='tdEliteTitle'><span class='caaSubHead3 addwidth'>H.K. Dent Elite</span></td></tr><tr><td class='tdEliteContent'><table cellpadding='0' cellspacing='0' border='0'><tr><td valign='top'><span class='caaAgencyName2 addwidth'>PROFESSIONAL INS ASSOC, INC.</span></td><td valign='top'> </td></tr></table><table cellpadding='0' cellspacing='0' border='0'><tr><td width='360px' valign='top'><div class='addressBlock'><span>4444 MANZANITA AVE STE 6</span><br /><span>CARMICHAEL , CA 95608-1488</span><br /><a class='faaBlueLink' id='lnkContact' href='http://www.safeco.com/portal/server.pt/gateway/PTARGS_0_20656_395_362_0_43/http%3B/por-portlets-prd.int.apps.safeco.com%3B13425/dotcom/FindAnAgent/find-an-agent/contactanagent.aspx?RequestType=agency&level=elite&Id=0415199904150295&lat=38.646142&lng=-121.327623' onclick='oOobj4.Preferences.Plugins.Events.poX=0;'>Contact & Directions</a> <a class='faaBlueLink' id='lnkWebSite' style='display: none;' href='http://' target='_blank' onclick="return trackEvent('/External-Link/AgentWebsite/ ','PROFESSIONAL INS ASSOC, INC. ');">Website</a></div></td><td valign='top'> </td></tr></table></td></tr></table><table cellpadding='0' cellspacing='0' border='0' class='tblDent'><tr><td class='tdEliteTitle'><span class='caaSubHead3 addwidth'>H.K. Dent Elite</span></td></tr><tr><td class='tdEliteContent'><table cellpadding='0' cellspacing='0' border='0'><tr><td valign='top'><span class='caaAgencyName2 addwidth'>AMERICAN AIM AUTO INS AGY, INC</span></td><td valign='top'> </td></tr></table><table cellpadding='0' cellspacing='0' border='0'><tr><td width='360px' valign='top'><div class='addressBlock'><span>5339 SAN JUAN AVE</span><br /><span>FAIR OAKS , CA 95628-3318</span><br /><a class='faaBlueLink' id='lnkContact' href='http://www.safeco.com/portal/server.pt/gateway/PTARGS_0_20656_395_362_0_43/http%3B/por-portlets-prd.int.apps.safeco.com%3B13425/dotcom/FindAnAgent/find-an-agent/contactanagent.aspx?RequestType=agency&level=elite&Id=0415911704151222&lat=38.66237&lng=-121.292429' onclick='oOobj4.Preferences.Plugins.Events.poX=0;'>Contact & Directions</a> So basically I want to extract the agency name like "<span class='caaAgencyName2 addwidth'>PROFESSIONAL INS ASSOC, INC.</span>" and the address which always use the same div class like "caaAgencyName2" and "addressBlock". How can this be accomplished? First of all... this is a GREAT forum. Been lurking around here for a long time and searched all over but still can't find a solution to a problem I'm facing. Here's my cURL code - I am basically retrieving data from a web site: <?php $curl_handle=curl_init(); curl_setopt($curl_handle,CURLOPT_URL,'http://myprimemobile.com/update.html'); curl_setopt($curl_handle,CURLOPT_CONNECTTIMEOUT,2); curl_setopt($curl_handle,CURLOPT_RETURNTRANSFER,1); $buffer = curl_exec($curl_handle); curl_close($curl_handle); if (empty($buffer)) { print "Connection Timeout<p>"; } else { print $buffer; } ?> If you take a look at the site http://myprimemobile.com/update.html -- it will begin to update new information. I want to basically retrieve the data from that page but in my own way. I want to loop through the page and search for all the names, addresses, and phone numbers only (not store hours or anything else), and want to put it in a variable so I can create a table with all that information on my page. Based on the code, all the information is stored in the $buffer variable. I want to break this variable down to extract certain data in a 'div' or a 'table'. Please help me out. Thanks! Hi there, OK, there is what I'm trying to accomplish - I want to send POST instantly when the page is loaded without "Submit" button or anything like that. Here how it looks using JavaScript: Code: [Select] <form name="postdata" action="http://www.mywebsite.com/post.php" method="post"> <input name="data" type="text" value="1"/> </form> <script type="text/javascript" language="JavaScript"> document.postdata.submit(); </script> Works fine with JavaScript, however i can't use it. Here is reference for CURL: http://php.net/manual/en/book.curl.php I found this curl_post function at php.net to simplify things up. <?php /** * Send a POST requst using cURL * @param string $url to request * @param array $post values to send * @param array $options for cURL * @return string */ function curl_post($url, array $post = NULL, array $options = array()) { $defaults = array( CURLOPT_POST => 1, CURLOPT_HEADER => 0, CURLOPT_URL => $url, CURLOPT_FRESH_CONNECT => 1, CURLOPT_RETURNTRANSFER => 1, CURLOPT_FORBID_REUSE => 1, CURLOPT_TIMEOUT => 4, CURLOPT_POSTFIELDS => http_build_query($post) ); $ch = curl_init(); curl_setopt_array($ch, ($options + $defaults)); if( ! $result = curl_exec($ch)) { trigger_error(curl_error($ch)); } curl_close($ch); return $result; } $test = curl_post("http://www.mywebsite.com/post.php", array("data" => "1")); print($test); ?> Looks like it uses GET instead POST (used sniffer to verify this). I'm trying to send POST data to other host, if that matters. Any help would be appreciated. Thanks in advance. I have json code and i want to prine only {"currency":"DOGE","available":"0","reserved":"420.000000000"} data. my json data [{"currency":"1ST","available":"0","reserved":"0"},{"currency":"DOGE","available":"0","reserved":"420.000000000"},{"currency":"ZSC","available":"0","reserved":"0"}] I use this code
<?php ?> ok i have been trying for over a week now, and i just relearning php again after a few years of being lazy. lol here is a code snippet. Quote //prep the members $myFile2 = "members.txt"; $fh2 = fopen($myFile2, 'r'); $theData2 = fread($fh2, filesize($myFile2)); fclose($fh2); //echo $theData2; //grab the stats! $url = 'http://link'; $postdata = 'players='.$theData2.'&fields=all'; $ch = curl_init($url); curl_setopt($ch, CURLOPT_POST, true); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata); $data = curl_exec($ch); curl_close($ch); $data = json_decode($data,true); the orininal code Quote //prep the members $members = "member,member1,member two,member blue"; //grab the stats! $url = 'http://link'; $postdata = 'players='.$members.'&fields=all'; $ch = curl_init($url); curl_setopt($ch, CURLOPT_POST, true); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata); $data = curl_exec($ch); curl_close($ch); $data = json_decode($data,true); now the problium i am having is that with the original way, listing the members indevidualy is not the way i would like it, it would be easyer if it was in .txt form so i can add top it easyer. Also i have 300 people to add and the string doesnt seem to like that many people in it.....so i made a txt file, and it all works except that when i load the page it does not display the members stats from the external link. each players stats gets posted on the page and sorted who is ranked highest. echo works for echoing the contents of the file. however thats usless to me as i want the contents of the file to work like the orininal code. sorry if its confusing.... basicaly there is data from the link for each player asociated with there name. which is seperated by a , in the code. so each time a , is reached the code repeats its self posting each player till there are no more to post. the string is to short for me or just seems to not work with 300 people. i need it to read the external .txt file or something simular and work in the same mannor. I'm trying to send a HTTPS POST request with XML data to a server using PHP. Anything sends to the server requires authentication therefore I'll use cURL. Some background info.: the XML data is to request the server to upload a file from a specific URL to its local storage. One rule of using this API is I MUST set the content type for each request to application/xml. This is what I've done but isn't working... <?php $fields = array( 'data'=>'<n1:asset xmlns:n1="http://.....com/"><title>AAAA</title><fileName>http://192.168.11.30:8080/xxx.html</fileName><description>AAAA_desc</description><fileType>HTML</fileType></n1:asset>' ); $fields_string = ""; foreach($fields as $key=>$value) { $fields_string .= $key.'='.$value.'&'; } rtrim($fields_string,'&'); $url = "https://192.168.11.41:8443/"; $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch, CURLOPT_POST, count($fields)); curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string); curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false); curl_setopt($ch, CURLOPT_HEADER, true); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); curl_setopt($ch, CURLOPT_USERPWD, "admin:12345678"); curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: application/xml', 'Content-length: '. strlen($fields_string)) ); $result = curl_exec( $ch ); curl_close($ch); echo $result; ?> I am expected to get an XML reply of either upload successful or upload failed. But instead I am getting this error message. HTTP/1.1 415 Unsupported Media Type Server: Apache-Coyote/1.1 Content-Type: text/xml;charset=UTF-8 Content-Length: 0 Date: Thu, 02 Dec 2010 03:02:33 GMT I'm sure the file type is correct, the XML format is correct. I've tried urlencode the fields but it didn't work. What else I might have done wrong? I've looked at various cURL tutorials and the PHP manual, but I don't see what I'm doing wrong here. My goal is to be able to send data to test.php and then have that page run a MySQL query to insert the TITLE and MESSAGE into the database. Any comments? :/ post.php <?php if(isset($_GET['title']) && isset($_GET['message']) && isset($_GET['times'])) { $array = array('title' => urlencode($_GET['title']), 'message' => urlencode($_GET['message'])); foreach($array as $key => $value) { $fields = $key.'='. $value .'&'; } rtrims($felds); //set our init handle $curl_init = curl_init(); //set our URL curl_setopt($curl_init, CURLOPT_URL, 'some url'); //set the number of fields we're sending curl_setopt($curl_init ,CURLOPT_POST, count($array)); for($i = 0; $i < $_GET['times']; $i++) { //send the POST data curl_setopt($curl_init, CURLOPT_POSTFIELDS, $fields); curl_exec($curl_init); echo 'post #'.$i. ' sent<br/>'; } //complete echo '<b>COMPLETE</b>'; //close session curl_close($curl_init); } else { ?> <form action="post.php" method="GET"> <table> <tr><td>Title</td><td><input type="text" name="title" maxlength="30"></td></tr> <tr><td>Content</td><td><textarea name="message" rows="20" cols="45" maxlength="2000"></textarea></td></tr> <tr><td>Process Amount</td><td><input type="text" name="times" size="5" maxlength="3"></td></tr> <tr><td>START</td><td><input type="submit" value="Initiate Posting"></td></tr> </table> </form> <?php } ?> test.php <?php mysql_connect('host', 'user', 'pass'); mysql_select_db('somedb'); if($_POST['title'] && $_POST['message']) { mysql_insert("INSERT INTO tests VALUES (null, '{$_POST['title']}', '{$_POST['message']}')"); } echo '<hr><br/>'; //query $query = mysql_query("SELECT * FROM tests ORDER BY id DESC"); while($row = mysql_fetch_assoc($query)) { echo $row['id'].'TITLE: '. $row['title'] .'<br/>MESSAGE: '. $row['message'] .'<br/><br/>'; } ?> I'm getting data from a remote XML file using the following as a result of a form: Code: [Select] <?php header('Content-type: text/xml'); //extract data from the post extract($_POST); //set POST variables $url = 'https://www.*******'; $fields = array( 'ESERIES_FORM_ID'=>urlencode($ESERIES_FORM_ID), 'MXIN_USERNAME'=>urlencode($MXIN_USERNAME), 'MXIN_PASSWORD'=>urlencode($MXIN_PASSWORD), 'MXIN_VRM'=>urlencode($MXIN_VRM), 'MXIN_TRANSACTIONTYPE'=>urlencode($MXIN_TRANSACTIONTYPE), 'MXIN_PAYMENTCOLLECTIONTYPE'=>urlencode($MXIN_PAYMENTCOLLECTIONTYPE), 'MXIN_CAPCODE'=>urlencode($MXIN_CAPCODE) ); //url-ify the data for the POST foreach($fields as $key=>$value) { $fields_string .= $key.'='.$value.'&'; } rtrim($fields_string,'&'); //open connection $ch = curl_init(); //set the url, number of POST vars, POST data curl_setopt($ch,CURLOPT_URL,$url); curl_setopt($ch,CURLOPT_POST,count($fields)); curl_setopt($ch,CURLOPT_POSTFIELDS,$fields_string); //execute post $result = curl_exec($ch); //close connection curl_close($ch); ?> Which is displaying what was the remote XML page, but has brought it to my domain. How would I go about taking the results of the displaying XML page and displaying them on the next page rather than just displaying the XML data? Many thanks in advanced! I have looked up tutorials on Google. I am still confused. I built a php script that receives the data from my html form then filters out bad words. I am trying to get the script to resend the data to pm.cgi. The form was originally targeting pm.cgi but it needed filtered. This is why i am using php in the first place. Please help me pass these variables from the php script to the cgi script the same way that a html form would. Thank you. Here is my code, it prints correctly. I cannot get cURL to work. <?php $action = $_GET['action'] ; $login = $_REQUEST['login'] ; $ID = $_REQUEST['ID'] ; $text = $_REQUEST['Unused10'] ; $submit = $_REQUEST['submit'] ; function filterBadWords($str){ // words to filter $badwords=array( "badword1", "badword2"); // replace filtered words with random spaces $replacements=array( "****", "***", "****" ); for($i=0;$i < sizeof($badwords);$i++){ srand((double)microtime()*1000000); $rand_key = (rand()%sizeof($replacements)); $str=eregi_replace($badwords[$i], $replacements[$rand_key], $str); } return $str; } $text = filterBadWords($text); print_r($action); print_r($login); print_r($ID); print_r($text); print_r($submit); ?> I need to send the variables that i printed above to pm.cgi. Can you point me in the right direction? Thanks again! Friends, I want to check a site if there is any data avaliable for a given Domain Name.... For example Quote http://www.keywordspy.com/research/search.aspx?tab=domain-organic&market=uk&q=abc.co.uk In this URl the Domain is the "&q=" POST Parameter which is: abc.co.uk In the Output, Observe the Columns Keyword, Position, Volume etc.... So for any Domain we want to check if there is one or more keyword returned. Here is the Example of a Domain for which there is no Data: Quote http://www.keywordspy.com/research/search.aspx?tab=domain-organic&market=uk&q=notfoundforme.co.uk As you Observe there is no data in Keyword, Position or Volume Columns for this Domain (i.e. notfoundforme.co.uk) All i want is to Echo All the Keywords, their Position and Volumes for any domain for which data is avaliable. Code: Here is the Code i have which scrapes the Whole data with Curl, but i am not able to make getkwspydata function work to get the Keyword, Position or Volume data for the domain, i know some regex and preg_match needs to be done but its too technical for me, may someone help me with this? <?php function getkwspydata($host) { $request = "http://www.google.com/search?q=" . urlencode("site:" . $host) . "&hl=en"; $request = "http://www.keywordspy.com/research/search.aspx?tab=domain-organic&market=uk&q=". urldecode($host); $data = getPageData($request); // preg_match('/<div id=resultStats>(About )?([\d,]+) result/si', $data, $p); // $value = ($p[2]) ? $p[2] : "n/a"; // $string = "<a href=\"" . $request . "\">" . $value . "</a>"; //return $string; print_r($data); } function getDomainName($host) { $hostparts = explode('.', $host); // split host name to parts $num = count($hostparts); // get parts number if(preg_match('/^(ac|arpa|biz|co|com|edu|gov|info|int|me|mil|mobi|museum|name|net|org|pp|tv)$/i', $hostparts[$num-2])) { // for ccTLDs like .co.uk etc. $domain = $hostparts[$num-3] . '.' . $hostparts[$num-2] . '.' . $hostparts[$num-1]; } else { $domain = $hostparts[$num-2] . '.' . $hostparts[$num-1]; } return $domain; } function getPageData($url) { if(function_exists('curl_init')) { $ch = curl_init($url); // initialize curl with given url curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); // add useragent curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // write the response to a variable if((ini_get('open_basedir') == '') && (ini_get('safe_mode') == 'Off')) { curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects if any } curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // max. seconds to execute curl_setopt($ch, CURLOPT_FAILONERROR, 1); // stop when it encounters an error return @curl_exec($ch); } else { return @file_get_contents($url); } } $sitehost = ($_POST['sitehost']) ? $_POST['sitehost'] : $_SERVER['HTTP_HOST']; $sitedomain = getDomainName($sitehost); ?> <html> <head> <title>SEO Report for <?=$sitedomain;?></title> </head> <body> <form method="post" action="<?$_SERVER['PHP_SELF'];?>"> <p><b>Domain/Host Name:</b> <input type="text" name="sitehost" size='30' maxlength='50' value="<?=$sitehost;?>"> <input type="submit" value="Grab Details"></p> </form> <ul> <li>Google indexed pages: <?=getkwspydata($sitehost);?></li> </ul> </body> </html> Cheers Natasha T. Hi guys Can someone help me about this: The php code can be revise username and password with CURL then check database and if username & password is correct return true else false. Thanks This topic has been moved to PHP Regex. http://www.phpfreaks.com/forums/index.php?topic=334273.0 good day dear community, i am workin on a Curl loop to fetch multiple pages: i have some examples - and a question: Example: If we want to get information from 3 sites with CURL we can do it like so: $list[1] = "http://www.example1.com"; $list[2] = "ftp://example.com"; $list[3] = "http://www.example2.com"; After creating the list of links we should initialize the cURL multi handle and adding the cURL handles. $curlHandle = curl_multi_init(); for ($i = 1;$i <= 3; $i++) $curl[$i] = addHandle($curlHandle,$list[$i]); Now we should execute the cURL multi handle retrive the content from the sub handles that we added to the cURL multi handle. ExecHandle($curlHandle); for ($i = 1;$i <= 3; $i++) { $text[$i] = curl_multi_getcontent ($curl[$i]); echo $text[$i]; } In the end we should release the handles from the cURL multi handle by calling curl_multi_remove_handle and close the cURL multi handle! If we want to another Fetch of sites with cURL-Multi - since this is the most pretty way to do it! Well I am not sure bout the string concatenation. How to do it - Note I want to fetch several hundred pages: see the some details for this target-server sites - /(I have to create a loop over several hundred sites). * siteone.example/?show_subsite=9009 * siteone.example/?show_subsite=9742 * siteone.example/?show_subsite=9871 .... and so on and so forth Question: How to appy this loop into the array of the curl-multi? <?php /************************************\ * Multi interface in PHP with curl * * Requires PHP 5.0, Apache 2.0 and * * Curl * ************************************* * Writen By Cyborg 19671897 * * Bugfixed by Jeremy Ellman * \***********************************/ $urls = array( "siteone", "sitetwo", "sitethree" ); $mh = curl_multi_init(); foreach ($urls as $i => $url) { $conn[$i]=curl_init($url); curl_setopt($conn[$i],CURLOPT_RETURNTRANSFER,1);//return data as string curl_setopt($conn[$i],CURLOPT_FOLLOWLOCATION,1);//follow redirects curl_setopt($conn[$i],CURLOPT_MAXREDIRS,2);//maximum redirects curl_setopt($conn[$i],CURLOPT_CONNECTTIMEOUT,10);//timeout curl_multi_add_handle ($mh,$conn[$i]); } do { $n=curl_multi_exec($mh,$active); } while ($active); foreach ($urls as $i => $url) { $res[$i]=curl_multi_getcontent($conn[$i]); curl_multi_remove_handle($mh,$conn[$i]); curl_close($conn[$i]); } curl_multi_close($mh); print_r($res); ?> I look forward to your ideas. I am having problems being able to find the data that needs to be posted in various translation websites. I think this is because the translation tools they have are using some kind of flash script to translate? So the new page isn't being loaded using the post data string?? I am not completely sure, anyways... I am using live http headers on firefox to try and get the content for the post data string. The sites I have tried to get the post data from are these: http://www.freetranslation.com/ http://www.free-translator.com/ I can get cURL to visit the page and do everything, I just cant find the post data string. or maybe I am way off here.. Thanks for any help $post='{"cart_items":[{"configuration":{"price":100,"recharge_number":"9999999999"},"product_id":"999","qty":1}]}';i try this n reslut was :There are no valid items in cart: help me plz Edited by ShivaGupta, 30 November 2014 - 01:11 AM. hy all long time no see recently i did find this problem i'm trying to login to a webpage with curl ,but without success ,ofcourse the basic posting is not the problem xD but the page have a hidden field (for security) so everytime you load the page it changes so post wont work if you first use get to take the page and extract the hash from it. the page contains something like input type= hidden value=A RANDOM VALUE THAT CHANGES EVERYTIME without that value passed to post it wont login. anyone knows how can i pass it so the post will be valid? thank you I am trying to send the following curl command using PHP but I have so far been unsuccessful, any help would be appreciated :-) curl -u 1fd7b89be6f3533d7:X -H 'Content-Type:application/xml' -d '<message><body>Test using the API</body></message>' http://someurl.com The purpose of this script is to automatically send a message to a campfire chatroom for a PHPUnit script using Selenium |