Learn VBA & Macros in 1 Week!

PHP - Php Scraping Multiple Tags Simplehtmldom

Full Excel VBA Course - Beginner to Expert

Php Scraping Multiple Tags Simplehtmldom	View Content

The html file i'm scraping from has 11 boxevent div classes and I want to loop through them and grab the results and team name for each. Am I doing it the right way? Also how can I loop through all float-right result and team classes grabbing the data.

<div class="boxevent">
<div class="row">
<span class="float-right result">4</span>
<span class="team" title="Pittsburgh Power ">PIT</span>
</div>
<div class="row">
<span class="float-right result">12</span>
<span class="team" title="Orlando Predators ">ORL</span>
</div>
</div>

<div class="boxevent">
<div class="row">
<span class="float-right result">24</span>
<span class="team" title="Atlanta ">ATL</span>
</div>
<div class="row">
<span class="float-right result">6</span>
<span class="team" title="Miami ">MIA</span>
</div>
</div>

**php**

    <?php
    include 'includes/simple_html_dom.php';
    $html = new simple_html_dom();
    $html = file_get_html('http://score...', false, $context);

    $score0 = $html->find('span[class=float-right result]', 0);
    $score1 = $html->find('span[class=float-right result]', 0);

    $team0 = $html->find('span[class=team]', 0);
    $team1 = $html->find('span[class=team]', 0);

    /* out */

    echo '<pre>';

    print_r($score0);
    print_r($score1);
    print_r($team0);
    print_r($team1);

    echo '</pre>';

    $html->clear();
    unset($html);
    ?>

Full Excel VBA Course - Beginner to Expert

Server Timeout Scraping Multiple Pages

Similar Tutorials

View Content

Hi,

I have some code which scrapes data from a page. However there are around 1200 product pages on the site I need to scrape, when I attempt to loop through all the pages I get a server timeout. I can only get to around 40 without timeout. Has anyone else had this problem?

Problem Loading A File Into Simplehtmldom

Similar Tutorials

View Content

I am a new simplehtmldom user. It is working and the php api loads, and I can parse
when I lod data from a string. But if I attemp yo load data from a file like this...
Code: [Select]
$html->file_get_html("C:\eaahmpg\eatc7402\www\stations\Sydney\code\table_data.txt");

I receive the following error.
Fatal error: Call to undefined method simple_html_dom::file_get_html() in...

Hmmm. The function IS defined in the api.

I'm not sure where I'M going wrong here.

eac7402

Portiing Over A Parser From Bs4 To Simplehtmldom-parser

Similar Tutorials

View Content

hello dear Freaks

i am currently musing bout the portover of a python bs4 parser to php - working with the simplehtmldom-parser / pr the DOM-selectors... (see below).

The project: for a list of meta-data of wordpress-plugins: - approx 50 plugins are of interest! but the challenge is: i want to fetch meta-data of all the existing plugins. What i subsequently want to filter out after the fetch is - those plugins that have the newest timestamp - that are updated (most) recently. It is all aobut acutality...

https://wordpress.org/plugins/participants-database ....and so on and so forth.

https://wordpress.org/plugins/wp-job-manager
https://wordpress.org/plugins/ninja-forms
https://wordpress.org/plugins/participants-database ....and so on and so forth.

we have the following set of meta-data for each wordpress-plugin:

Version: 1.9.5.12 
installations: 10,000+    
WordPress Version: 5.0 or higher 
Tested up to: 5.4 PHP  
Version: 5.6 or higher    
Tags 3 Tags:databasemembersign-up formvolunteer
Last updated: 19 hours ago

the project consits of two parts: the looping-part: (which seems to be pretty straightforward). the parser-part: where i have some issues - see below. I'm trying to loop through an array of URLs and scrape the data below from a list of wordpress-plugins. See my loop below-

as a base i think it is good starting point to work from the following target-url:

plugins wordpress.org/plugins/browse/popular with 99 pages of content: cf ...
wordpress.org/plugins/browse/popular/page/1
wordpress.org/plugins/browse/popular/page/2
wordpress.org/plugins/browse/popular/page/99

the Output of text_nodes:

['Version: 1.9.5.12', 'Active installations: 10,000+', 'Tested up to: 5.6 ']

but if we want to fetch the data of all the wordpress-plugins and subesquently sort them to show the -let us say - latest 50 updated plugins. This would be a interesting task:

first of all we need to fetch the urls

then we fetch the information and have to sort out the newest- the newest timestamp. Ie the plugin that updated most recently

List the 50 newest items - that are the 50 plugins that are updated recently ..

we have the following set

see here the Soup_

 soup = BeautifulSoup(r.content, 'html.parser')
        target = [item.get_text(strip=True, separator=" ") for item in soup.find(
            "h3", class_="screen-reader-text").find_next("ul").findAll("li")[:8]]
        head = [soup.find("h1", class_="plugin-title").text]
        new = [x for x in target if x.startswith(
            ("V", "Las", "Ac", "W", "T", "P"))]
        return head + new


with ThreadPoolExecutor(max_workers=50) as executor1:
    futures1 = [executor1.submit(parser, url) for url in allin]

for future in futures1:
    print(future.result())

see the formal output

Quote

[lorem ipsum dolor sit amet', 'Version: 2.34.1', 'Last updated: 5 months ago', 'Tags: magna aliquyam erat, sed diam voluptua. At vero eos et accusam']
[consetetur sadipscing elitr', 'Version: 6.54.1', 'Last updated: 5 months ago', 'Tags: lorem ipsum dolor sit amet']
[sed diam nonumy eirmod tempor invidunt ut labore', 'Version: 7.16.1', 'Last updated: 5 months ago', 'Tags: tarifa, sevilla lisabin invidunt ut labore et dolore magna aliquyam erat']
[tempor invidunt ut taria malaga jerusalem labore', 'Version: 9.58.1', 'Last updated: 5 months ago', 'Tags: ilabore et lissabon dolore magna aliquyam erat']

background: https://stackoverflow.com/questions/61106309/fetching-multiple-urls-with-beautifulsoup-gathering-meta-data-in-wp-plugins

Well - i guess that we c an do this with the simple DOM Parser - here the seclector reference.

https://stackoverflow.com/questions/1390568/how-can-i-match-on-an-attribute-that-contains-a-certain-string

look forward to any hint and help.

have a great day

Edited May 3, 2020 by dil_bert

Web Scraping

Similar Tutorials

View Content

Hello, Currently my webscraper signs into the site and pulls all the html -> perfect. What I need to do is to loop only specific information (horses that ran) here is my current php code

<?
$url = 'site';
$postdata = array('username' => "username",
		          'password' => "password");

$ch = curl_init();
if($ch){
   curl_setopt($ch, CURLOPT_URL, $url);
   curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
   curl_setopt($ch, CURLOPT_POST, 1);
   curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
   curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt'); // set cookie file to given file
   curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt'); // set same file as cookie jar

   $content = curl_exec($ch);
   $headers = curl_getinfo($ch);				

   curl_close($ch);

   // Debug option
   // print_r($headers);

   if($headers['http_code'] == 200){ 
      echo $content;
   }
}
?>

here is the html im pulling

<table width=100% border=1><tr><td class=instruction6 colspan=4><b>My Race Notes</b></td></tr>
<tr><td width=90%><form action='races.php?id=7456132' method=post>
<textarea name='comments' rows=2 cols=38>Type notes & press Add</textarea></td>
<td width=5%><input type=submit class='weestatbutton' value='Add'></form></td></tr></table></td></tr></table><table width=100%><tr class=databreakdown2253><th><a href='races.php?id=7456132&sortby=1'>Place</a></th><th>Dist Bt</th><th>Stall</th>
<th>Horse</th><th>Age</th><th><a href='races.php?id=7456132&sortby=3'>Weight</a></th><th>Headgear</th><th>OR</th><th>Trainer</th>
<th><a href='races.php?id=7456132&sortby=2'>Odds</a></th><th>Jockey (Claim)</th></tr><tr><td class=databreakdown2253>1st</td><td class=databreakdown2253></td><td class=databreakdown2253>4</td>
<td class=databreakdown2253><a href='horses.php?id=298745'>Telegraph (IRE)</a></td>
<td class=databreakdown2253>3</td><td class=databreakdown2253>9-3</td><td class=databreakdown2253></td>
<td class=databreakdown2253>57</td>
<td class=databreakdown2253><a href='trainers.php?id=2448'>Evans, P D</a></td>
<td class=databreakdown2253>28/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=694'>Egan, John</a> </td></tr><tr class=databreakdown18><td colspan=12>soon led, brought field stands side from 3f out, headed 2f out, rallied inside final furlong, bumped and led again towards finish</td></tr><tr><td class=databreakdown2253>2nd</td><td class=databreakdown2253>0.5</td><td class=databreakdown2253>3</td>
<td class=databreakdown2253><a href='horses.php?id=305855'>Ecliptic Sunrise</a></td>
<td class=databreakdown2253>3</td><td class=databreakdown2253>8-12td><td class=databreakdown2253></td>
<td class=databreakdown2253>52</td>
<td class=databreakdown2253><a href='trainers.php?id=4516'>Donovan, D</a></td>
<td class=databreakdown2253>10/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=3414'>Cosgrave, Pat</a> </td></tr><tr class=databreakdown18><td colspan=12>chased leaders, challenged 2f out, led 2f out, edged right inside final furlong, rider lost whip and headed towards finish</td></tr><tr><td class=databreakdown2253>3rd</td><td class=databreakdown2253>1.5</td><td class=databreakdown2253>1</td>
<td class=databreakdown2253><a href='horses.php?id=300316'>Bookmaker</a></td>
<td class=databreakdown2253>4</td><td class=databreakdown2253>9-6</td><td class=databreakdown2253><a title='Blinkers worn'>Blnk</a></td>
<td class=databreakdown2253>59</td>
<td class=databreakdown2253><a href='trainers.php?id=933'>Bridger, J J</a></td>
<td class=databreakdown2253>6/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=3848'>Carson, William</a> </td></tr><tr class=databreakdown18><td colspan=12>prominent, took keen hold, led 2f out, headed over 1f out, not much room inside final furlong, stayed on same pace</td></tr><tr><td class=databreakdown2253>4th</td><td class=databreakdown2253>1</td><td class=databreakdown2253>2</td>
<td class=databreakdown2253><a href='horses.php?id=261986'>Night Trade (IRE)</a></td>
<td class=databreakdown2253>7</td><td class=databreakdown2253>8-8</td><td class=databreakdown2253><a title='Cheekpieces worn'>CkPc</a></td>
<td class=databreakdown2253>50</td>
<td class=databreakdown2253><a href='trainers.php?id=2653'>Harris, R A</a></td>
<td class=databreakdown2253>6/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=7348'>Hardie, Cameron</a> (3)</td></tr><tr class=databreakdown18><td colspan=12>prominent, ridden over 2f out, switched left inside final furlong, no extra close home</td></tr><tr><td class=databreakdown2253>5th</td><td class=databreakdown2253>1.5</td><td class=databreakdown2253>6</td>
<td class=databreakdown2253><a href='horses.php?id=299296'>Trigger Park (IRE)</a></td>
<td class=databreakdown2253>3</td><td class=databreakdown2253>8-10</td><td class=databreakdown2253></td>
<td class=databreakdown2253>50</td>
<td class=databreakdown2253><a href='trainers.php?id=2653'>Harris, R A</a></td>
<td class=databreakdown2253>20/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=3422'>Dobbs, Pat</a> </td></tr><tr class=databreakdown18><td colspan=12>chased leaders, ridden over 2f out, one pace over 1f out, no impression</td></tr><tr><td class=databreakdown2253>6th</td><td class=databreakdown2253>2.25</td><td class=databreakdown2253>7</td>
<td class=databreakdown2253><a href='horses.php?id=300337'>Port Lairge</a></td>
<td class=databreakdown2253>4</td><td class=databreakdown2253>8-11</td><td class=databreakdown2253><a title='Blinkers worn'>Blnk</a></td>
<td class=databreakdown2253>50</td>
<td class=databreakdown2253><a href='trainers.php?id=914'>Gallagher, J</a></td>
<td class=databreakdown2253>33/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=193'>Catlin, Chris</a> </td></tr><tr class=databreakdown18><td colspan=12>slowly into stride, in rear, stayed on inside final furlong, never dangerous</td></tr><tr><td class=databreakdown2253>7th</td><td class=databreakdown2253>NK</td><td class=databreakdown2253>11</td>
<td class=databreakdown2253><a href='horses.php?id=289934'>Lionheart</a></td>
<td class=databreakdown2253>4</td><td class=databreakdown2253>8-13</td><td class=databreakdown2253></td>
<td class=databreakdown2253>59</td>
<td class=databreakdown2253><a href='trainers.php?id=4910'>Crate, Peter</a></td>
<td class=databreakdown2253>10/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=7375'>Crouch, Hector</a> (7)</td></tr><tr class=databreakdown18><td colspan=12>reared start and slowly away, held up in rear, headway over 1f out, weakened inside final furlong</td></tr><tr><td class=databreakdown2253>8th</td><td class=databreakdown2253>2.75</td><td class=databreakdown2253>14</td>
<td class=databreakdown2253><a href='horses.php?id=289421'>Koharu</a></td>
<td class=databreakdown2253>4</td><td class=databreakdown2253>9-4</td><td class=databreakdown2253><a title='Cheekpieces worn'>CkPc</a></td>
<td class=databreakdown2253>60</td>
<td class=databreakdown2253><a href='trainers.php?id=2495'>Makin, P J</a></td>
<td class=databreakdown2253>9/4 (Fav) </td>
<td class=databreakdown2253><a href='jockeys.php?id=5952'>Bates, Mr D J</a> (3)</td></tr><tr class=databreakdown18><td colspan=12>in rear, ridden over 3f out, no impression</td></tr><tr><td class=databreakdown2253>9th</td><td class=databreakdown2253>3</td><td class=databreakdown2253>5</td>
<td class=databreakdown2253><a href='horses.php?id=269827'>Saskias Dream</a></td>
<td class=databreakdown2253>6</td><td class=databreakdown2253>9-6</td><td class=databreakdown2253><a title='Visor worn'>Vsor</a></td>
<td class=databreakdown2253>59</td>
<td class=databreakdown2253><a href='trainers.php?id=2002'>Chapple-Hyam, Jane</a></td>
<td class=databreakdown2253>4/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=3544'>Hughes, Richard</a> </td></tr><tr class=databreakdown18><td colspan=12>mid-division, headway and switched left over 1f out, edged left entering final furlong, soon eased</td></tr><tr><td class=databreakdown2253>10th</td><td class=databreakdown2253>1.75</td><td class=databreakdown2253>12</td>
<td class=databreakdown2253><a href='horses.php?id=304248'>Crafty Business (IRE)</a></td>
<td class=databreakdown2253>3</td><td class=databreakdown2253>9-2</td><td class=databreakdown2253><a title='Visor worn'>Vsor</a></td>
<td class=databreakdown2253>59</td>
<td class=databreakdown2253><a href='trainers.php?id=695'>Moore, G L</a></td>
<td class=databreakdown2253>14/1  </td>
<td class=databreakdown2253><a href='jockeys.php?id=6669'>Bishop, Mr C</a> (3)</td></tr><tr class=databreakdown18><td colspan=12>towards rear, pushed along over 3f out, well beaten 2f out</td></tr></table><br><hr></td></tr></table>

*note I'm using this for personal reasons

Web Scraping Question

Similar Tutorials

View Content

I am a newbie and I am trying to do a site scraping project to obtain all the following fields:

Test Year, Test Name, Grade Level, Question #, Question Type, Reporting Category, Standard #, Standard description, Example Question (with image)

for this web page that has a page for each question.

http://www.doe.mass.edu/mcas/search/question.aspx?mcasyear=2010&QuestionSetID=1&grade=8&subjectcode=MTH&questionnumber=36

I am a newbie at PHP and would love if you could point me in the right direction. The page uses tables and I need to extract the data from the body of the page as well as some of the info from the url and then have it inserted into a MySQL database.

Thank you so much for your help.

Screen Scraping With Php?

Similar Tutorials

View Content

Hi,

Im trying to work out a way to get the New York Lottery's Take 5 results. Theres a few sites that list the winning numbers, i assume automatically as there is alot of lottery games on these sites. what would be the best way to get this?

http://www.myfreepost.com/lottery/index.php/us/newyorklottery/takefive/result/
http://www.elite-lottery-results.com/?action=view_game&gid=NY2

Scraping Data

Similar Tutorials

View Content

More information on the job posting.

I am looking to fetch information from daily deal website,
Such as tuango.ca, socialliving.com, groupon.com...ect

I want to retrieve data from different daily deal sites, and I want to retrieve all the deals of the day from each different city in the website.

For example www.tuango.ca
Has a deal a day in Montreal, Toronto,...ect

I want to be apply to retrieve data from all the different location within the site.

I want the script to fetch the data of deals. To be more clear I want the script to fetch

What site the deal was on
What location was it for
What's the tittle of the deal
What price is the deal
What's the value of the deal
What's the saving in percentage of the deal
How much were sold
What's the minimum amount of the deal before it becomes activated
What's the company who did the deal
Company address
Company postal code
Company phone number
(there might be more categories..will talk more if you pass this stage of the interview process)

Ones all this data is fetched I need it to automatically be store in a database.

Every morning at 4:am (eastern time)
I need it to run the script, because the days deals finish at midnight and it's the only way of getting a number of the total number of coupons sold. you'll usually see the final stats of the deal on their recent deals page of the website.

I want to know how a site like http://onespout.com/deals/montreal did it..
I'm not asking somebody to do it for me I'm just asking someone to guide me in takeing the right steps

Link Scraping

Similar Tutorials

View Content

i need some help to scrape a link from specified page.
for example if i have a page like this http://br.4ce.info/

i want to scrape all link on that page
and i want to show all link in that page on my wordpress widget in another blog ?

can you help me with this ?
dont use iframe
i think better using cURL
thanks

Scraping Websites

Similar Tutorials

View Content

Okay so I am scraping websites for their descriptions keywords and titles.
I noticed that a lot of websites use the same keywords and descriptions on every page..
so my idea is to scrape the index and find all the links in there and scrape them all then after they been scraped check all of the descriptions and if the descriptions match then pull some text unique to each page and use that.
I can't seem to wrap my head around it.. how would I accomplish this?
I scrape with curl then find keywords description and title then find all links on the site and scrape those.

soo I was thinking making an array of the descriptions and then checking and inserting to the db but doesn't seem like it would work.
Any ideas?

Oh also.. how would I grab just text from each page that is different from every other page?
lol very confusing

Screen Scraping

Similar Tutorials

View Content

Ok,

I know how to screen scrape, but I don't know how to screen scrape when there is a login. I've looked this up for awhile, but no luck.
I'd like to also make it so I can execute a url when I am logged in on the script for the script, for an example execute this url: http://site.com/data.php?id=9912&submit=1

Thanks in advanced.

Scraping: Need Some Pointers

Similar Tutorials

View Content

I need to scrape pages - I only need one page at a time
I'm only looking for 2/3 bits of data within each page

Can someone give me some pointers where to start?
I've searched and see names like DOMXpath and Xpath mentioned - do I need these?

It's important that I can run the script on a standard Linux hosting with nothing extra installed like packages - I'd like to have something I can just use immediately using standard php and functions

I've seen plenty of tutorials + youtube videos - just looking for recommendations and pointers for recommended practices

Thanks

OM

Screen Scraping

Similar Tutorials

View Content

Page Scraping Fails

Similar Tutorials

View Content

I'm trying to pull the stock quotes Beta from yahoo finance since the yahoo query language doesn't support it.

My code returns an empty array. Any ideas why?

Code: [Select]
<?php

$content = file_get_contents('http://finance.yahoo.com/q?s=NFLX');
preg_match('#<tr><th width="48%" scope="row">Beta:</th><td class="yfnc_tabledata1">(.*)</td></tr>#', $content, $match);

print_array($match);

?>

Begginer Needing Help With Scraping

Similar Tutorials

View Content

Hello,
I have checked out many of the scripts and tried implementing them to help me scrape 1 single image from a url. Example www.123.com/333.png

Getting a script to scrape that image isnt the problem. Im not sure on how to implement the simple curl to save the image every 30mins and name it in successive order so it appears as , 1.jpg, 2.jpg, 3.jpg

I am working with a debian 6 server and php would be the easiest way to do this that i can work with. I have searched the web endlessly and still cant produce such thing.
Any help is appreciated.

Scraping An Aspx Site With Php

Similar Tutorials

View Content

I'm looking to scrape the schedule details for any particular class at my university as part of a school project. I have been able to log a student into the university site, grab their name and course information.

In order to grab the schedule for a particular class I now have to visit a different area of the university site, the registrar. The course schedule section of the registrar is coded in ASP .net and I'm having trouble making HTTP requests to this area of the site.

I understand the need to make post requests to mimic the Viewstate but I'm running into an issue before I even get to that part.

I am able to load the page via an HTTP request almost every time. But it always takes almost exactly 2 minutes. I have tried simple get requests, post requests with the Viewstate, and other variations to one of a few different pages on the site. Each time it works. But each time it takes 2 minutes.

Any ideas why it takes so long? Any suggestions on what I can possibly do differently?

Here is the basic site I'm using to test my code on before implementing it fully into my program:
University Site
Here is my link that takes 2 minutes to load the same page:
My Site

Here is my latest code I've tried:
Code: [Select]
<?php

$postdata = "__VIEWSTATE=/wEPDwULLTIwNjY2MzUzMDEPZBYCAgUPDxYCHgRUZXh0BRNNYXIgMjMgMjAxMSAgNzoxNVBNZGQYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgEFDmN0bDEwJGltZ0xvZ2luaYy4H4gz+Bjb4GVdsO1ecd9c9EA=";
$postdata .= "&__EVENTVALIDATION=/wEWAgKs/IaWBAKpyP2zAXWcNEO0tMqDX53r6m+Hzo/nKHwZ";
$postdata = urlencode($postdata);
$host = 'courseschedules.njit.edu';
$path = '/index.aspx';

$fp1 = fsockopen($host,80,$errno,$errstr,30);
if(!$fp1)
die($_err.$errstr.$errno);
else
{
fputs($fp1, "POST $path HTTP/1.1\r\n");
fputs($fp1, "Host: $host\r\n");
fputs($fp1, "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.15) Gecko/20110303 Firefox/3.6.15 ( .NET CLR 3.5.30729)\r\n");
fputs($fp1, "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n");
fputs($fp1, "Accept-Language: en-us,en;q=0.5\r\n");
fputs($fp1, "Accept-Encoding: gzip,deflate\r\n");
fputs($fp1, "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n");
fputs($fp1, "Keep-Alive: 115\r\n");
fputs($fp1, "Connection: keep-alive\r\n");
fputs($fp1, "Content-length: ".strlen($postdata)."\r\n\r\n");
fputs($fp1, $postdata."\r\n\r\n");

$response = '';
while(!feof($fp1)) $response .= fgets($fp1,2000);
fclose($fp1);
echo $response;
}

?>
Like I said, I've also tried a standard get request which works as well, just takes 2 minutes.

Data Scraping With Preg_match_all()

Similar Tutorials

View Content

Hi,

I have the written the following code which scrapes price info from a website:

$url = 'http://www.mydomain.com';
$html = file_get_contents($url);
$pattern = '/<span class="price">(.*?)<\/span>/';
preg_match_all($pattern, $html, $matches);
print_r($matches);

It works well however I need to add in the delivery cost to each array element with a different pattern:
/<span class="delivery">(.*?)<\/span>/';

Any idea how i can do this so each array element has both the price and delivery costs in a two dimensional array?

Thanks for your advice

Screen Scraping Twitter

Similar Tutorials

View Content

I have a form which lets the user put in the URL to their twitter account. When the enter their URL I am trying to create a screen scraping script that scrapes that page to get basic information like their twitter name and number of tweets.

I'm not sure how I am going to do this, I don't think there is a twitter API for this so I may have to use something like cURL. I was just wondering if anyone has done this and could give me any advice about the best method?

Thanks for any help

App Store Scraping Basics

Similar Tutorials

View Content

Could anyone point me in the right direction for downloading app store statistics from:

-the Apple App Store -The Android Market -the Amazon AppStore

Specifically, I'd like to get the

-average selling price of apps -the top selling apps
-the distribution of tablets vs. phones, etc. (e.g. how many apps are there for Honeycomb? How many for iPad?)
-total number of apps in store -free vs. paid apps

I've seen some sites like http://148apps.biz/app-store-metrics/

and http://www.appbrain.com/stats/

How do these sites get their data? There must be a way to export the whole app store database as a CSV file, or import it to MySQL and run queries.

Thanks much for any direction.

How To Add ' While Fwrite In .sql - Data Scraping

Similar Tutorials

View Content

I am writing a sql dump file and some of my fields have ' in it. Like the name is "Joe's Cake Shop". How should i add ' infront of ' to make it look like Joe''s Cake Shop.Also, I got an idea about adding ' infront of ' by seeing other database dump.Can someone please enlighten me why should i do it.

My Code :-

Code: [Select]
<?php
//$final - is the array i am storing my scraped data
//$final[1] - name

$inc = 1;
$data = file_get_contents('http://xxx.com');
$regex = '~<td\s+colspan="2"\s+width="350"><font\s+size="2">\s+<b>\s+(.*?)  <\/b><br>(.*?) <br>(.*?),\s+(.*?)\s+<br>(.*?), (.*?)\s+<BR><BR><font\s+size="2"><img\s+src="\.\.\/images\/phone1.gif"\s+align="left"\s+hspace="4"\s+alt\s+=(.*)>\s+-\s+Phone\s+#\s+(.*?)\s+<\/font>\s+<BR>\s+<font\s+size\s+="1">~';
preg_match_all($regex, $data, $final);
$jlimit = count($final[0]);
for($j=0 ;$j < $jlimit; $j++)
{
$filename = 'cake.sql';
$somecontent = "(".$inc.", '".$final[1][$j]."', '".$final[2][$j]."', '".$final[3][$j]."', '".$final[4][$j]."', '".$final[6][$j]."', '".$final[8][$j]."'),\n";
if (is_writable($filename)) {

    if (!$handle = fopen($filename, 'a')) {
         echo "Cannot open file ($filename)";
         exit;
    }

    if (fwrite($handle, $somecontent) === FALSE) {
        echo "Cannot write to file ($filename)";
        exit;
    }

    echo "Success, wrote ($somecontent) to file ($filename)";
     $inc = $inc + 1;
    fclose($handle);

} else {
    echo "The file $filename is not writable";
}
}

?>

Strip Only Certain Tags, Remove Body Tags, But Keep Its Content.

Similar Tutorials

View Content

hi everyone,

did not know what to make the subject, but here is what I want to do:

I have a string, which gets returned to me from a linux app on my server, it looks something like this:
Code: [Select]
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=utf-8">
<TITLE> </TITLE>
<META NAME="GENERATOR" CONTENT="OpenOffice.org 3.2 (Linux)">
<META NAME="AUTHOR" CONTENT="Administrator">
<META NAME="CREATED" CONTENT="20110106;14170000">
<META NAME="CHANGEDBY" CONTENT="HOD">
<META NAME="CHANGED" CONTENT="20110522;16540000">
<STYLE TYPE="text/css">

</STYLE>
</HEAD>
<BODY LANG="en-US" TEXT="#000000" LINK="#0000ff" DIR="LTR" STYLE="border: 5.05pt double #000000; padding: 0.67in 0.92in">

<P>I want this and the tags around it, just not the html, head, body and their closing tags.</P>

</BODY>
</HTML>

within the body, the tags are each styled for example: <p style="color: red"></p> so I cannot just get rid of all html, I want to get only all the content within the body tags, but without the body tags obviously
strip_tags does not work as i need, I only want to strip certain tags.

If someone can help me with this i will much appreciate it.