PHP - Web Spider/crawler Help Please!
Hi all, it's my first post here and I'm still very new to PHP. I'm trying to write a web crawler script, except I want this script to crawl only the one target website I enter. Basically, I want my script to go to ultimateguitar.com, 911tabs.com, or any other guitar tabs website, crawl the site, and index any guitar tabs they have in their database. This will provide my website with a "phonebook" of guitar tabs. It's not illegal or in breach of any copyrights; I'm only making a database of links. Any help would be greatly appreciated!
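One possible starting point for a crawler that stays on a single site is sketched below. It is only a sketch under stated assumptions: the function names `extractSameHostLinks` and `crawlSite` are made up for illustration, only absolute and root-relative links are followed, and `allow_url_fopen` plus the DOM extension are assumed to be enabled. Deciding which of the collected links are actually tab pages depends on each site's URL layout, which is not covered here.

```php
<?php
// Sketch only: a breadth-first crawl restricted to one host.
// extractSameHostLinks() and crawlSite() are hypothetical names.

/** Return the absolute, same-host links found in an HTML page. */
function extractSameHostLinks(string $html, string $baseUrl): array
{
    $base = parse_url($baseUrl);
    if ($html === '' || empty($base['scheme']) || empty($base['host'])) {
        return [];
    }
    $root = $base['scheme'] . '://' . $base['host'];

    $doc = new DOMDocument();
    @$doc->loadHTML($html); // real-world HTML is rarely valid; silence warnings

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // empty or in-page link
        }
        if (preg_match('#^https?://#i', $href)) {
            $url = $href; // already absolute
        } elseif ($href[0] === '/' && substr($href, 0, 2) !== '//') {
            $url = $root . $href; // root-relative path
        } else {
            continue; // other link forms are skipped in this sketch
        }
        $url = explode('#', $url, 2)[0]; // drop any fragment
        $host = (string) parse_url($url, PHP_URL_HOST);
        if (strcasecmp($host, $base['host']) === 0) {
            $links[$url] = true; // array keys de-duplicate
        }
    }
    return array_keys($links);
}

/** Crawl outward from $startUrl and return every same-host link discovered. */
function crawlSite(string $startUrl, int $maxPages = 50): array
{
    $queue = [$startUrl];
    $seen = [$startUrl => true];
    $fetched = 0;
    while ($queue && $fetched < $maxPages) {
        $url = array_shift($queue);
        $html = @file_get_contents($url); // needs allow_url_fopen
        if ($html === false) {
            continue; // skip pages that fail to load
        }
        $fetched++;
        foreach (extractSameHostLinks($html, $url) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
        sleep(1); // pause between requests to go easy on the server
    }
    return array_keys($seen);
}
```

Calling `crawlSite('http://example.com/')` (a placeholder URL) would return the list of links found, which could then be filtered and stored in a database table.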
Similar Tutorials

I am making a search engine and I need help: when you submit your URL, it should automatically scan the page for the description and keywords, then put them in the database. My HTML code:

Code:
<form action="submit_url.php" method="get">
<input type="text" name="url" value="url" />
<input type="submit" name="submit" value="submit" />
</form>

PHP code so far:

<html>
<head>
<title>submitting url: <?php $url = $_GET['url']; echo $url; ?></title>
<link href="style.css" rel="stylesheet" type="text/css" />
</head>
<body>
<?php
echo "Submitting <b>$url</b>";
$url = fopen("http://knexideas.co.cc", "r") or exit("Unable to open");
while (!feof($url)) {
    fgetc($url);
}
fclose($url);
?>
</body>
</html>

All it does is read the website; if you put 'echo $url' at the bottom, it just reads and prints the web page.

All I want to know first: is this possible? I want to go <a href="http://mirage.smc.edu/pls/pub/f?p=207:1:4398391836116838:HIDE:NO::P1_CLASSTYPE,P1_STATUS,P1_SUBJECTS,P1_LOCATION,P1_INSTRUCTOR,P1_MEETDAYS,P1_BEGINWK,P1_STARTIME:%2C%2C%2C%2C%2C%2C%2C">Here</a>. I want to be able to use cURL or something similar to go to this page and submit the data. All I want to control is which Subject is selected and which Semester radio button is picked. Is this possible with cURL? I have tried to do this. The page linked above loads some other page via AJAX and updates itself dynamically. I tried posting to that other page instead, and it throws an application error. Is it possible to do this and actually get back the table listings? All I want is to be able to specify the subject and semester and get that table back. If it is possible, how can you get it back with just a standard cURL call? I tried a straight POST to the page it calls and to the linked page itself, and neither way works. Any advice is appreciated.

Hi, I've spent some time looking into how to spider a phpBB forum with a PHP script.
I'd like to, for example, run a search with the cURL functions and read out some of the links in the search results (topics, etc.), then save the links I want into a MySQL database. Has anybody got an idea?

Hey guys, I'm looking for some input from the community. Since this is a complicated coding issue, I will ask the pros! [PS: that's called sucking up.] OK, so here is my end goal: a user arrives at a certain web page, inputs his web URL, and hits submit. Here is where I get stuck. I would like to run a script once the URL is submitted that will scan the URL and collect and store information in a database. Basically, I would like to save three things: the site name, the URL, and the description. I really hope this isn't as complicated as it seems. Thanks

Basically, I'm a member of an "online dating site". The website tracks who views your profile and displays recent viewers prominently on the home page. One of the ways to get more people to pay attention to your profile (for dates *wink*) is to view people's profiles and hope they view yours in return and initiate a conversation. Now to my question. I've been able to log into the site successfully with cURL, grab the page contents, and implode on unique nodes with PHP. How can I crawl through users' profiles with this? I'm a tad stumped on how link crawling works. Any guides that have helped you out in the past would be appreciated. I've used Google tons on this subject and have come up empty. I don't think my keywords are appropriate for this kind of query, haha. Thanks ahead of time.

Hey guys, I am making a site where certain content will be limited to "members only", where membership is free. Now I want the Google bot, or whatever bot, to be able to see and index this content, but when a user visits it, I want to hide it from them unless they are a member (I already do that).
So basically I want a function that I can call that will return true or false depending on whether the current page is being requested by a search engine spider. I know it's possible because I regularly see forums doing that; posts are hidden unless you register, but if you look through the Google cached version, the posts are visible. How can I do that? So far all I have is the following, so that the rest of my code works.

function is_spider(){ return true; }

I read http://iarematt.com/how-to-detect-a-search-engine-spidercrawler-with-php/ which talks about this, but I don't really trust it... What do you guys think? How can this be done?

Hi all, I need to get all businesses, including details, from http://www.nswbusinesschamber.com.au/Business-directory.aspx?name=&location= . What is the best way to approach this? Are there any scripts out there? Thanks...

My dad loves these frozen cheeseburgers from Meijer, so I was going to write a little script I can run from cron that will check Meijer's website and text or email me if they go on sale. Whenever I run the below script, I get an Access Denied response from the server instead of the HTML for the cheeseburger page. I'm sure I just need a cURL option or something. Thank you in advance