PHP - Scraping Data From A Website Html Source (with Vb Example)
Hi, I'm trying to retrieve/scrape some information from a website using the class name and the tag name.
Below is the example in VB:
Dim htmL_cat As HTMLDocument
Dim objTableL_cat As Object, objDatL_cat As Object, objItemL_cat As Object, objKeyL_cat As Object
Dim intRowL_cat As Long

Set htmL_cat = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", "http://www.lelong.com.my/Auc/List/BrowseAll.asp", False
    .send
    htmL_cat.body.innerHTML = .responseText
End With

With htmL_cat
    Set objTableL_cat = .getElementsByClassName("CatLevel1") 'Find elements with class name first
    For Each objDatL_cat In objTableL_cat
        Set objKeyL_cat = objDatL_cat.getElementsByTagName("a") 'Next, find elements with tag name
        For Each objItemL_cat In objKeyL_cat
            Sheets("Analytics").Range("E6").Offset(intRowL_cat, 0) = objItemL_cat.innerText
            intRowL_cat = intRowL_cat + 1
        Next
    Next
End With

Set htmL_cat = Nothing
Set objTableL_cat = Nothing
Set objKeyL_cat = Nothing

How do I do the same using PHP? Thanks.

Similar Tutorials

More information on the job posting. I am looking to fetch information from daily deal websites such as tuango.ca, socialliving.com, groupon.com, etc. I want to retrieve data from different daily deal sites, and I want to retrieve all the deals of the day from each city on the website. For example, www.tuango.ca has a deal a day in Montreal, Toronto, etc. I want to be able to retrieve data from all the different locations within the site. To be more clear, I want the script to fetch:
What site the deal was on
What location it was for
What the title of the deal is
What the price of the deal is
What the value of the deal is
What the saving of the deal is, as a percentage
How many were sold
What the minimum number of sales is before the deal becomes activated
What company did the deal
Company address
Company postal code
Company phone number
(there might be more categories; will talk more if you pass this stage of the interview process)
Once all this data is fetched, I need it to be stored in a database automatically.
Every morning at 4 am (Eastern time) I need it to run the script, because the day's deals finish at midnight and it's the only way of getting the total number of coupons sold. You'll usually see the final stats of a deal on the recent deals page of the website. I want to know how a site like http://onespout.com/deals/montreal did it. I'm not asking somebody to do it for me; I'm just asking someone to guide me in taking the right steps.

I am writing a SQL dump file and some of my fields have ' in them, like the name "Joe's Cake Shop". How should I add a ' in front of each ' to make it look like Joe''s Cake Shop? I got the idea of adding ' in front of ' from seeing other database dumps. Can someone please explain why I should do it? My code:
Code: [Select]
<?php
// $final - the array I am storing my scraped data in
// $final[1] - name
$inc = 1;
$data = file_get_contents('http://xxx.com');
$regex = '~<td\s+colspan="2"\s+width="350"><font\s+size="2">\s+<b>\s+(.*?) <\/b><br>(.*?) <br>(.*?),\s+(.*?)\s+<br>(.*?), (.*?)\s+<BR><BR><font\s+size="2"><img\s+src="\.\.\/images\/phone1.gif"\s+align="left"\s+hspace="4"\s+alt\s+=(.*)>\s+-\s+Phone\s+#\s+(.*?)\s+<\/font>\s+<BR>\s+<font\s+size\s+="1">~';
preg_match_all($regex, $data, $final);
$jlimit = count($final[0]);
for ($j = 0; $j < $jlimit; $j++) {
    $filename = 'cake.sql';
    // One VALUES tuple per scraped record
    $somecontent = "(".$inc.", '".$final[1][$j]."', '".$final[2][$j]."', '".$final[3][$j]."', '".$final[4][$j]."', '".$final[6][$j]."', '".$final[8][$j]."'),\n";
    if (is_writable($filename)) {
        if (!$handle = fopen($filename, 'a')) {
            echo "Cannot open file ($filename)";
            exit;
        }
        if (fwrite($handle, $somecontent) === FALSE) {
            echo "Cannot write to file ($filename)";
            exit;
        }
        echo "Success, wrote ($somecontent) to file ($filename)";
        $inc = $inc + 1;
        fclose($handle);
    } else {
        echo "The file $filename is not writable";
    }
}
?>

Hi, I have written the following code, which scrapes price info from a website:
$url = 'http://www.mydomain.com';
$html = file_get_contents($url);
$pattern = '/<span class="price">(.*?)<\/span>/';
preg_match_all($pattern, $html, $matches);
print_r($matches);
It works well; however, I need to add the delivery cost to each array element, using a different pattern:
'/<span class="delivery">(.*?)<\/span>/'
Any idea how I can do this so that each array element has both the price and the delivery cost in a two-dimensional array? Thanks for your advice.

Not wanting to 'hijack' another thread, but curious about the error of my ways. Why specifically (and very, very clearly, as I am an OLD fart and totally self-taught in PHP/MySQL) will this not suffice to determine from where POSTED data came?
page 1 - form
start session
create and store a hashed session variable 'who_is_it' (ie hash 1Q9zFrEd)
display the form
submit to page 2

page 2
start session
create a hashed variable $is_it_me (ie hash 1Q9zFrEd)
compare $is_it_me with the session variable 'who_is_it'
if the comparison == each other
obtain, validate, cleanse and store data
ELSE
not from a valid source

I need to pass captured data, basically a list of email addresses that are read from a CSV file and (1st) saved to a local database, then sent over to an external source via URL string. I have everything working except the last step: the script opens the file, reads and validates all emails in the specified column, and saves them to the local db. ISSUE: after validating and then INSERTING the data into the local MySQL db, I need to pass each piece of data to an external source via URL string. Example: www.domain.com?email={$new_email} I was initially thinking of just adding the URL in a header() call, but I'm not sure header() is the right method to pass all of the data, during the loop, via the URL string. For example, if the CSV file contains 500 emails: Q: how can I continue that loop until the last email is read, passing each one to the external source via the URL string? I'm not sure whether calling header() at the end of the script, but within the loop, will iterate and send each captured email via that URL string. Can anyone advise a possible solution to this? Thanks.

I'm trying to make a League of Legends (a video game) community website, both as a personal project and for practice. The game has a lot of champions, each of whom has 5 unique abilities. I thought about manually inputting all the details about each champion into a MySQL database, but that would be long and tedious, and I don't really have the time for it now. Also, the game patches very often (like, once every 2 weeks), which changes many of the stats, etc.
of the champion, and it is not possible for me to keep manually updating these every time there is a patch. Fortunately, there is a League of Legends Wiki which has all the data I need in its champion pages, which are kept updated per patch. So I was wondering if there was any way to get the data from the divs in the wiki and have it display on my site. What I want on my website is that whenever someone types a champion's name (in a post or whatever), it displays a hover-over dialog with some of the champion's details, plus a lot of other features like that. In plain English, I need a way to:
> Tell PHP to go to the wiki's source code on a specific page
> Find a specific div container
> Get X data from there
> Pass X data into a function to display the hover-over
I think this way I would not have to maintain a database, as I can leech off the wiki's data. I have not coded anything like this before, so I would like a few pointers on how to achieve this. Any help will be appreciated!

I am currently creating a school portal. There is a page which enables users to create their own website for a Computer Education lesson. Every user is able to upload and view their files, but the problem is that I need a function that lets the user edit them on the page itself.

Hello. My databases are on the same server but under separate domain names, and I want to insert MySQL data from website1 into the MySQL database of website2. Can anyone help me?

Hey guys, first post here, so feel free to flame me if I'm violating the rules somehow. So, the issue is this: I built an eBay listing creator for a customer. It consists of a form with several fields being posted to a page that assembles everything into a listing (text, images, radio buttons, etc.). Now, what I want to do is allow the customer to easily copy the compiled source code to the clipboard (or a txt file, it doesn't really matter) in order to paste it into eBay.
I tried it with cURL, but all I get is the source without the posted information. I must be missing something there. Any help would be appreciated; if you need links or the code I've used, I'll provide them. Thanks in advance!

I need some help getting started writing PHP code that imports a text file of name/value pairs and then creates an HTML table with those values. The data file looks something like this:
[number]-[attribute]=[value];
1-Host=server1.abc.dev.jkl;2-Date=Wed Aug 12 2010;3-Set=abc.123.cde;4-Time=01:00:03;5-Length=00:36:09;6-Size=41.54 GB;7-Status=Succeeded;
1-Host=server2.abc.dev.qrs;2-Date=Wed Aug 12 2010;3-Set=gls202.kul_lvm;5-Length=06:20:33;7-Status=Succeeded;
1-Host=server9.mra.dev.xyz;2-Date=Wed Aug 11 2010;3-Set=gls101.aie_lvm;4-Time=01:00:02;
Let's say I have an HTML table:
Code: [Select]
<table id="stats">
<tr> <th>Host</th> <th>Date</th> <th>Set</th> <th>Time</th> <th>Length</th> <th>Size</th> <th>Status</th> </tr>
<tr> <td>server1.abc.dev.jkl</td> <td>Wed Aug 12 2010</td> <td>abc.123.cde</td> <td>01:00:03</td> <td>00:36:09</td> <td>41.54 GB</td> <td>Succeeded</td> </tr>
</table>
I've looked around all day at various samples. I've seen the fgetcsv function, but I'm not sure what would be the best approach to load this data into a table. Use a regex, load the matches into an array or hash, and then print that out? I'm assuming I would have to write print statements to produce the HTML tags. Also, as the sample records show, there will be instances where not all the attributes (1-7) have values, so I'm envisioning empty cells for those records, which is fine. I can also change the way the data file is generated and remove the [number], so it's just [attribute]=[value], if that makes it easier. Is there a good mapping technique for this? Thank you for your help.

Hello to all, I have a problem figuring out how to properly display data fetched from a MySQL database in an HTML table.
In the example below I am using two while loops, with the second one nested inside the first, that fetch data from two different tables in a MySQL database. The second query matches the IDs of the two tables, and after the match it should display the email of the account holder in the email column of each row of the HTML table. The main problem is that the 'email' column is displayed properly when its while loop is on its own and not nested (meaning the other data is omitted or commented out), but when it is nested inside or placed next to the first while loop, the emails are displayed horizontally and the other data ('validity', 'valid_from', 'valid_to') is not displayed.
Can someone help me with this? I guess the problem lies in the while loop.
<thead> <tr> <th data-column-id="id" data-type="numeric">ID</th> <th data-column-id="email">Subscriber's Email</th> <th data-column-id="validity">Validity</th> <th data-column-id="valid_from">Valid From</th> <th data-column-id="valid_to">Valid To</th> </tr> </thead>
Here is part of the PHP code:
<?php
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    echo ' <tr> <td>'.$row["id"].'</td> ';
    while ($row1 = $stmt1->fetch(PDO::FETCH_ASSOC)) {
        echo ' <td>'.$row1["email"].'</td> ';
    }
    if ($row["validity"] == 1) {
        echo '<td>'.$row["validity"].' month</td>';
    } else {
        echo '<td>'.$row["validity"].' months</td>';
    }
    echo ' <td>'.$row["valid_from"].'</td> <td>'.$row["valid_to"].'</td> </tr>';
}
?>
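A likely cause, assuming both statements are executed once before the outer loop starts: the inner while drains $stmt1 completely on the first outer row, so every email ends up in the first table row and the later rows get none. A minimal sketch of one way around it is to build an ID-to-email lookup first and then loop only once. The sample arrays stand in for fetchAll(PDO::FETCH_ASSOC) results, and the column name account_id and the function name renderRows are hypothetical.

```php
<?php
// Hypothetical rows standing in for $stmt->fetchAll(PDO::FETCH_ASSOC)
$subscriptions = [
    ['id' => 1, 'validity' => 1, 'valid_from' => '2020-01-01', 'valid_to' => '2020-02-01'],
    ['id' => 2, 'validity' => 3, 'valid_from' => '2020-01-15', 'valid_to' => '2020-04-15'],
];
// Hypothetical rows standing in for $stmt1->fetchAll(PDO::FETCH_ASSOC)
$accounts = [
    ['account_id' => 1, 'email' => 'a@example.com'],
    ['account_id' => 2, 'email' => 'b@example.com'],
];

function renderRows(array $subscriptions, array $accounts): string
{
    // Build the id => email lookup once, instead of re-reading a statement per row
    $emailById = array_column($accounts, 'email', 'account_id');

    $html = '';
    foreach ($subscriptions as $row) {
        $email = $emailById[$row['id']] ?? '';          // empty cell when no match
        $unit  = ((int) $row['validity'] === 1) ? 'month' : 'months';
        $html .= '<tr>'
            . '<td>' . htmlspecialchars((string) $row['id']) . '</td>'
            . '<td>' . htmlspecialchars($email) . '</td>'
            . '<td>' . htmlspecialchars($row['validity'] . ' ' . $unit) . '</td>'
            . '<td>' . htmlspecialchars($row['valid_from']) . '</td>'
            . '<td>' . htmlspecialchars($row['valid_to']) . '</td>'
            . "</tr>\n";
    }
    return $html;
}

echo renderRows($subscriptions, $accounts);
```

If both tables live in the same database, a single query with a JOIN on the matching ID columns would avoid the second statement altogether; the lookup array is simply the smallest change to the loop shown in the post.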
Thank you. OK, I have the initial cURL working but need to figure out how to extract data I want off that webpage to display or store in a database, I tried using dom and xpath, but because of the way the page displays using css, i think its not picking it up. Here is my cURL script: <?php $userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)'; $target_url = "www.test.com"; $ch = curl_init(); curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); curl_setopt($ch, CURLOPT_URL,$target_url); curl_setopt($ch, CURLOPT_FAILONERROR, true); curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); curl_setopt($ch, CURLOPT_AUTOREFERER, true); curl_setopt($ch, CURLOPT_RETURNTRANSFER,true); curl_setopt($ch, CURLOPT_TIMEOUT, 10); $html = curl_exec($ch); if (!$html) { echo "<br />cURL error number:" .curl_errno($ch); echo "<br />cURL error:" . curl_error($ch); exit; } // parse the html into a DOMDocument $dom = new DOMDocument(); $dom->loadHTML($html); // grab all the on the page $xpath = new DOMXPath($dom); $hrefs = $xpath->evaluate("/html/body//td"); for ($i = 0; $i < $hrefs->length; $i++) { $href = $hrefs->item($i); $url = $href->getAttribute('href'); storeLink($url,$target_url); echo "<br />Link stored: $url"; } ?> and here is a snippet of the source of the page I am getting: <span id="lblTest"><h1 id='surrZipTitle'>Agents in Surrounding Zip Codes</h1><table cellpadding='0' cellspacing='0' border='0' class='tblDent'><tr><td class='tdEliteTitle'><span class='caaSubHead3 addwidth'>H.K. 
Dent Elite</span></td></tr><tr><td class='tdEliteContent'><table cellpadding='0' cellspacing='0' border='0'><tr><td valign='top'><span class='caaAgencyName2 addwidth'>PROFESSIONAL INS ASSOC, INC.</span></td><td valign='top'> </td></tr></table><table cellpadding='0' cellspacing='0' border='0'><tr><td width='360px' valign='top'><div class='addressBlock'><span>4444 MANZANITA AVE STE 6</span><br /><span>CARMICHAEL , CA 95608-1488</span><br /><a class='faaBlueLink' id='lnkContact' href='http://www.safeco.com/portal/server.pt/gateway/PTARGS_0_20656_395_362_0_43/http%3B/por-portlets-prd.int.apps.safeco.com%3B13425/dotcom/FindAnAgent/find-an-agent/contactanagent.aspx?RequestType=agency&level=elite&Id=0415199904150295&lat=38.646142&lng=-121.327623' onclick='oOobj4.Preferences.Plugins.Events.poX=0;'>Contact & Directions</a> <a class='faaBlueLink' id='lnkWebSite' style='display: none;' href='http://' target='_blank' onclick="return trackEvent('/External-Link/AgentWebsite/ ','PROFESSIONAL INS ASSOC, INC. ');">Website</a></div></td><td valign='top'> </td></tr></table></td></tr></table><table cellpadding='0' cellspacing='0' border='0' class='tblDent'><tr><td class='tdEliteTitle'><span class='caaSubHead3 addwidth'>H.K. 
Dent Elite</span></td></tr><tr><td class='tdEliteContent'><table cellpadding='0' cellspacing='0' border='0'><tr><td valign='top'><span class='caaAgencyName2 addwidth'>AMERICAN AIM AUTO INS AGY, INC</span></td><td valign='top'> </td></tr></table><table cellpadding='0' cellspacing='0' border='0'><tr><td width='360px' valign='top'><div class='addressBlock'><span>5339 SAN JUAN AVE</span><br /><span>FAIR OAKS , CA 95628-3318</span><br /><a class='faaBlueLink' id='lnkContact' href='http://www.safeco.com/portal/server.pt/gateway/PTARGS_0_20656_395_362_0_43/http%3B/por-portlets-prd.int.apps.safeco.com%3B13425/dotcom/FindAnAgent/find-an-agent/contactanagent.aspx?RequestType=agency&level=elite&Id=0415911704151222&lat=38.66237&lng=-121.292429' onclick='oOobj4.Preferences.Plugins.Events.poX=0;'>Contact & Directions</a> So basically I want to extract the agency name like "<span class='caaAgencyName2 addwidth'>PROFESSIONAL INS ASSOC, INC.</span>" and the address which always use the same div class like "caaAgencyName2" and "addressBlock". How can this be accomplished? I have a page (input.php) that will allow a user to upload a CSV file. This file has 5 columns (SKU, Product, Quantity, Retail Price, and Total Retail Price). The CSV upload will only have the SKU number and Quantity filled in. When the user hit upload the (import.php) page is suppose to go to the site and pull the product up by searching the SKU number and pulling the price and product (brand and title). I paid a freelancer to create this code. I watched it work on his machine. I cant seem to get it to work on mine (wont pull price or product) and he is non-responsive now. Any help would be greatly appreciated!! I added some note in the code as I was troubleshooting.
<?php ini_set('max_execution_time', 0); error_reporting(0); move_uploaded_file($_FILES["file"]["tmp_name"], "upload/". $_FILES["file"]["name"]); $handle = fopen("upload/". $_FILES["file"]["name"], "r"); $file = ''; $line .= "SKU,Product,Quantity,Retail Price,Total Retail Price"; $file .= $line . PHP_EOL; for ($i = 0; $row = fgetcsv($handle ); ++$i) { // Do something will $row array if($row[0]!="" AND $i>0) { $line=""; #echo "<pre>"; #print_r($row); $SKU=$row[0]; $quantity=$row[2]; $loop=1; do{ $url = "https://www.homedepot.com/s/".$SKU; $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); curl_setopt($ch, CURLOPT_HEADER, 1); $response = curl_exec($ch); $header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE); $headers = substr($response, 0, $header_size); $body = substr($response, $header_size); curl_close($ch); header("Content-Type:text/plain; charset=UTF-8"); $headers_arr = explode("\r\n", $headers); $str=$headers_arr[5]; $arr=explode(":",$str); $check=trim($arr[0]); #echo $check; ### remove troubleshooting if($check=="location") # made lowercase so it would get inside the If statement { #echo "Dustin"; ## remove troubleshooting $productPageLink=$headers_arr[5]; $productPageLink=str_replace("Location:","",$productPageLink); #echo $productPageLink; ## troubleshooting -- seems to be getting the links $productPageLink=trim($productPageLink); $productPageLink=str_replace("http:","https:",$productPageLink); #echo $productPageLink; ## troubleshooting -- still seems to have links $ch = curl_init(); #echo $ch; ##troubleshooting -- prints out "resouce id" curl_setopt($ch, CURLOPT_URL, $productPageLink); #echo $ch; ##troubleshooting -- prints out "resouce id" curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET'); curl_setopt($ch, CURLOPT_ENCODING, 'gzip, deflate'); $headers = array(); $headers[] = 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 
Firefox/70.0'; $headers[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'; $headers[] = 'Accept-Language: en-US,en;q=0.5'; $headers[] = 'Upgrade-Insecure-Requests: 1'; $headers[] = 'Connection: keep-alive'; $headers[] = 'Te: Trailers'; #echo $headers; ##troubleshooting -- ## Troubleshooting -- prints out "Array" curl_setopt($ch, CURLOPT_HTTPHEADER, $headers); $result = curl_exec($ch); #echo $result; ### troubleshooting - doesnt have any data if (curl_errno($ch)) { echo 'Error:' . curl_error($ch); # it end inside this if statement, however no error is printed } curl_close($ch); preg_match_all('/<h2 class="product-title__brand" itemprop="brand" data-component="clickable brand link">(.*?)<\/h2>/s', $result, $output_array_brand); #echo "<pre>"; ### #print_r($output_array_brand);#### $brand=trim(strip_tags($output_array_brand[1][0])); preg_match_all('/<h1 class="product-title__title">(.*)<\/h1>/', $result, $output_array); $productTitle=$output_array[1][0]; $productTitle=$brand." ".$productTitle; preg_match_all('/<span class="price__dollars">(.*?)<\/span>/s', $result, $output_array_price); preg_match_all('/<span class="price__cents">(.*)<\/span>/', $result, $output_array_cent); #echo "<pre>"; #print_r($output_array_price); $price=trim(strip_tags($output_array_price[1][0])); $cent=trim(strip_tags($output_array_cent[1][0])); if($cent!="" OR $cent!=0) { $price=$price.".".$cent; } $line.=$row[0].","; $line.='"'.$productTitle.'",'; $line.=$row[2].","; $line.=$price.","; $totalPrice=$row[2]*$price; $line.=$totalPrice; $file .= $line . PHP_EOL; } # echo "<br>"; $loop=$loop+1; #echo "<br>"; if($loop>4) { if($check!="Location") { $line.=$row[0].","; $line.=','; $line.=$row[2].","; $line.=","; $line.=""; $file .= $line . 
PHP_EOL; break; } } } while($check!="Location"); } } fclose($handle); header('Content-Type: application/csv'); $output=$_REQUEST['output']; header('Content-disposition: attachment; filename='.$output.'.csv'); echo $file; #header('Content-disposition: attachment; filename='.$output.'1.csv'); #echo $file1; exit; ?>
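One thing that stands out in the script above: it reads the redirect target from $headers_arr[5] and tests the header name against "location" in one place and "Location" in another, so it depends on both the position and the letter case of the header, and servers are free to change either. A minimal sketch of a lookup that depends on neither follows; findHeader is a hypothetical helper name and the sample header block is made up for illustration.

```php
<?php
/**
 * Return the value of the first header with the given name, or null.
 * Header names are compared case-insensitively.
 */
function findHeader(string $rawHeaders, string $name): ?string
{
    foreach (preg_split('/\r\n|\n/', $rawHeaders) as $line) {
        // Split on the first colon only; URLs in the value contain colons too
        $parts = explode(':', $line, 2);
        if (count($parts) === 2 && strcasecmp(trim($parts[0]), $name) === 0) {
            return trim($parts[1]);
        }
    }
    return null;
}

// Made-up response headers for illustration
$headers = "HTTP/1.1 301 Moved Permanently\r\n"
    . "Content-Type: text/html\r\n"
    . "location: https://www.example.com/p/12345\r\n"
    . "\r\n";

var_dump(findHeader($headers, 'Location'));
```

Alternatively, setting CURLOPT_FOLLOWLOCATION to true lets cURL follow the redirect itself, and curl_getinfo($ch, CURLINFO_REDIRECT_URL) reports the redirect target without any header parsing. Note also that error_reporting(0) at the top of the script hides the notices that would point at an empty $result.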
Hello! I would like to grab data from a website. What I want is to take the name and time and output it like following:
Ayarith 59 seconds
Indirarc 54 seconds
The data is taken from an online game with the following link: https://medivia.online/community/online/legacy
Example data from this link:
<li><div class="med-width-25">59 seconds ago</div><div class="med-width-35"><a href="/community/character/Ayarith">Ayarith</a></div><div class="med-width-15">Druid</div><div class="med-width-25 med-text-right med-pr-40">32</div></li>
<li><div class="med-width-25">54 seconds ago</div><div class="med-width-35"><a href="/community/character/Indirarc">Indirarc</a></div><div class="med-width-15">Druid</div><div class="med-width-25 med-text-right med-pr-40">20</div></li>
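A minimal sketch of how this could be done with PHP's DOM extension, using the same "find by class name, then by tag name" approach as the VB example at the top of the page. The markup is the sample from the post; a live script would fetch the page first (for example with file_get_contents or cURL). The function name extractOnlinePlayers is hypothetical, and the code assumes the page keeps this structure.

```php
<?php
// Sample markup copied from the post; wrapped in <ul> only to make it a complete fragment
$html = <<<'HTML'
<ul>
<li><div class="med-width-25">59 seconds ago</div><div class="med-width-35"><a href="/community/character/Ayarith">Ayarith</a></div><div class="med-width-15">Druid</div><div class="med-width-25 med-text-right med-pr-40">32</div></li>
<li><div class="med-width-25">54 seconds ago</div><div class="med-width-35"><a href="/community/character/Indirarc">Indirarc</a></div><div class="med-width-15">Druid</div><div class="med-width-25 med-text-right med-pr-40">20</div></li>
</ul>
HTML;

function extractOnlinePlayers(string $html): array
{
    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Match "med-width-35" as a whole class token, then take the <a> inside it
    $nameQuery = './/div[contains(concat(" ", normalize-space(@class), " "), " med-width-35 ")]//a';

    $players = [];
    foreach ($xpath->query('//li') as $li) {
        $links = $xpath->query($nameQuery, $li);
        $times = $xpath->query('./div[1]', $li);   // first cell holds "59 seconds ago"
        if ($links->length === 0 || $times->length === 0) {
            continue;                              // skip rows with a different layout
        }
        $name = trim($links->item(0)->textContent);
        $time = preg_replace('/\s+ago$/', '', trim($times->item(0)->textContent));
        $players[] = "$name $time";
    }
    return $players;
}

echo implode("\n", extractOnlinePlayers($html)), "\n";
```

For the sample rows this prints "Ayarith 59 seconds" and "Indirarc 54 seconds", one per line. The class test is written with concat and normalize-space because a plain contains(@class, "med-width-35") would also match longer class names that merely start with the same text.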
I hope you can help me out!

Hi, I am quite new to PHP, and to learn more about it I decided to make a little project to make my life a little easier. I want to make a site that fetches lunch menus from the restaurants near my work. I would just like some help getting pointed in the right direction on how to make this happen. Thanks! Br, Niklas

There is a school locator script at https://www.ocps.net/parents/pages/FindaSchool.aspx. When we input any address, it returns the schools located in that locality. For example, use these details and submit the form:
Street Number : 3902
Street Name : Bobolink
Street Type : Lane
City : Orlando
It returns a row with three schools: elementary, middle and high school. Clicking the more button takes you to the respective school's details. So for each school I need the respective school name:
Audobon Elementary
Glenridge Middle
Winter Park High
Please suggest some ways to achieve this functionality. Thanks

Hello guys, I have intermediate-level PHP and I am currently working on a website. I need to write a function that will get me some specific information from a website. I need this function to type the user's name into the search bar at http://competitive.eune.leagueoflegends.com/ladders/eune/current/rankedsolo5x5; after it finds which page the user is on, I want that user's information stored in my database. But I am not sure where to start. An example would be http://competitive.eune.leagueoflegends.com/ladders/eune/current/rankedsolo5x5?summoner_name=&page=4
page 5: rank 121 Sokoren 42 22 1877
I already know the name; I just need the other numerical information and to get it into my database. I only want this function to run when a certain nickname is entered. Any help would be appreciated.

I am wondering if it's possible to get some data from another website via PHP?
I would like to get data from the website http://www.gamersfirst.com/warrock/?q=Player&nickname=soldier, and the exact data I need is "Level", which in this case is 2. Can this value be grabbed, and if it can, where can I get some info about how to do it?

I would like to display weather conditions on my website and store them in a MySQL database. I wonder if it is possible to load a web page into PHP for parsing so that the required info can be found and used. Is it doable with PHP, and what functions would you recommend for this? How can I have a program running in the background on the server that is triggered every so often to perform this task? Another question: if I want to trigger some action once a week, like Tuesday at noon, is there a function in PHP that can be used to check what day of the week and time it is? I am new to website design, so any help will be greatly appreciated.

It's very possible, and I've seen a website that's done it (http://www.zybez.net/radio/): they are able to include the recently played songs on their website. The information can be obtained right here: http://68.168.100.60:7942/played.html But I'm not sure if that's where they get it from. If I log in to my Admin section, there is an option that says get XML stats.
But I'm not sure if that shows the recently played songs the way http://68.168.100.60:7942/played.html does. (Every time I refresh the page, the stats/data/info in the XML change.)
Code: [Select]
1250001522Rockhttp://srbuckey.listen2myradio.comMusicPEAK - RockKings of Leon - Radioactive N/AN/AN/A226153140audio/mpeg1.9.86201901701119000000004013207.144.125.60NSPlayer/11.0.6001.7006 WMFSDK/11.0026101521287587284Kings of Leon - Radioactive 1287586919Three Days Grace - Riot 1287586542Alice In Chains - Here Comes The Rooster 1287586281Green Day - Boulevard of Broken Dreams 1287586277down 1287586075Green Day - American Idiot 1287586044Metallica - Enter Sandman 1287585836ACDC Highway To Hell.mp3 1287585449Metallica - 04 The Unforgiven I.MP3 1287585448[HTTP/1.1 200 OK] http://www.ventsi.com/Music/MetallicA/1991 - Black Album/Metallica - 04 The Unforgiven I.MP3