PHP - Extracting All Links From Entire Website
Is it possible to extract all links from a website (not just a single webpage) with PHP? I am asking about the general idea, as I wish to customize it: e.g. crawl only a specified directory, or certain domains only.
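In outline, yes: keep a queue of pages to visit, fetch each one, collect its `<a href>` values with DOMDocument, and only queue URLs that stay on the allowed host (and, optionally, under a given directory). Below is a minimal sketch along those lines. The function names are my own, the fetcher is injectable (pass `file_get_contents` for real use), and a real crawler would also need fuller relative-URL resolution, politeness delays, and robots.txt handling.

```php
<?php
// Breadth-first crawl limited to one host and an optional directory prefix.
// $fetch is any callable mapping a URL to an HTML string (or null/false on
// failure), so file_get_contents works for real use and a stub for testing.
function crawlLinks($startUrl, callable $fetch, $pathPrefix = '/', $maxPages = 50) {
    $host  = parse_url($startUrl, PHP_URL_HOST);
    $queue = [$startUrl];
    $seen  = [$startUrl => true];
    while ($queue && count($seen) < $maxPages) {
        $url  = array_shift($queue);
        $html = $fetch($url);
        if (!$html) continue;                    // skip fetch failures / empty pages
        $dom = new DOMDocument();
        @$dom->loadHTML($html);                  // suppress warnings on messy HTML
        foreach ($dom->getElementsByTagName('a') as $a) {
            $abs = absolutize($startUrl, $a->getAttribute('href'));
            if ($abs === null || isset($seen[$abs])) continue;
            if (parse_url($abs, PHP_URL_HOST) !== $host) continue;  // same domain only
            $path = parse_url($abs, PHP_URL_PATH) ?? '/';
            if (strpos($path, $pathPrefix) !== 0) continue;         // same directory only
            $seen[$abs] = true;
            $queue[]    = $abs;
        }
    }
    return array_keys($seen);
}

// Tiny resolver: absolute http(s) URLs pass through, root-relative paths are
// glued onto the start URL's scheme and host; everything else is skipped here.
function absolutize($base, $href) {
    if (preg_match('#^https?://#i', $href)) return $href;
    if ($href !== '' && $href[0] === '/') {
        $p = parse_url($base);
        return $p['scheme'] . '://' . $p['host'] . $href;
    }
    return null;
}
```

For a real run: `$urls = crawlLinks('http://example.com/docs/', 'file_get_contents', '/docs/');`.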
Thanks

Similar Tutorials

I am looking for some help with extracting links from log files, as it is a pain to do this manually (which I do right now). I have some log files which I need to check for ERROR messages, copying the found URLs into another text file. My log file format looks like this:

Code:
INFO <11 Feb 2012 00:00:23,822> <index> <D2> <Processing URL : http://www.domain1.com/>
INFO <11 Feb 2012 00:00:23,842> <index> <D4> <Indexed: http://www.domain2.com/> <Time:146 msecs>
INFO <11 Feb 2012 00:00:23,842> <index> <D4> <Processing URL : http://www.domain3.com/>
ERROR <11 Feb 2012 00:00:23,924> <index> <D1> <http://www.domain4.org/operas/2003-2004/mourning/composer.aspx: >
org.apache.commons.httpclient.HttpRecoverableException: org.apache.commons.httpclient.HttpRecoverableException: Error in parsing
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1965)
    at org.apache.commons.httpclient.HttpMethodBase.processRequest(HttpMethodBase.java:2659)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1093)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:674)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:529)
    at com.searchblox.scanner.http.HTTPScanner.b(Unknown Source)
    at com.searchblox.scanner.http.HTTPScanner.scan(Unknown Source)
    at com.searchblox.scanner.http.HTTPScanner.work(Unknown Source)
    at com.searchblox.scanner.Scanner.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
INFO <11 Feb 2012 00:00:23,968> <index> <D5> <Indexed: http://domain6.com/~cdobie/kearnsindex.htm>
INFO <11 Feb 2012 00:00:23,968> <index> <D5> <Indexed: http://domain7.com/~cdobie/kearnsindex.htm>
INFO <11 Feb 2012 00:00:32,988> <index> <D1> <Processing URL : http://www.domain8.com/>
INFO <11 Feb 2012 00:00:33,072> <index> <D5> <Indexed: http://www.domain9.com/> <Time:128 msecs>
INFO <11 Feb 2012 00:00:33,072> <index> <D5> <Processing URL : http://www.domain10.com/>
ERROR <11 Feb 2012 00:00:33,116> <index> <D2> <http://www.domain11.com/: Connection timeout>
org.apache.commons.httpclient.HttpConnection$ConnectionTimeoutException
    at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:736)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:661)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:529)
    at com.searchblox.scanner.http.HTTPScanner.b(Unknown Source)
    at com.searchblox.scanner.http.HTTPScanner.scan(Unknown Source)
    at com.searchblox.scanner.http.HTTPScanner.work(Unknown Source)
    at com.searchblox.scanner.Scanner.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
INFO <11 Feb 2012 00:00:33,154> <index> <D1> <Indexing http://www.domain12.com/ ...>
INFO <11 Feb 2012 00:00:33,159> <index> <D1> <http://www.domain13.com/ - Last-Modified date: Sat Feb 11 00:00:33 CET 2012>
ERROR <11 Feb 2012 00:00:33,207> <index> <D6> <http://www.domain14.com/: Connection timeout>

What I am after is some piece of code which saves the http://domain.com/ part to a text file IF the line starts with ERROR. There are many different error reasons, so the strings all differ at the start and at the end. So maybe you know a way to open a log file, look for the word ERROR at the beginning of a line, and if that's the case, either save the whole line to another text file or, if possible, just the domain part (which would be even better). If possible, please post a fully functional code block, as I am extremely bad with anything that has to do with regex, opening and closing files, etc. Your help would be greatly appreciated. I attached a sample log file to this post in case it helps (same as the lines above).

This topic has been moved to Apache HTTP Server. http://www.phpfreaks.com/forums/index.php?topic=320065.0

I'm wanting to extract an image from an external website and save it to a location on my web server automatically. The scenario: 1.
A user has an order form on our website with a field to paste the URL of a website in;
2. The user visits an external website page, which includes an image;
3. The user copies the URL out of the address bar, goes back to our website and pastes the link into the order form;
4. The user clicks next, and this extracts the image from the external website and uploads it to our website so the image can be seen alongside the order.

My questions are:
a. Can this be done?
b. If it can't physically download the image and copy it to my web server, can I read the HTML source of the external website and grab the full URL to use to show the image on my website?
Thanks for your help.

I tried searching on Google but couldn't find any relevant information; please redirect me to a relevant source or help me with the code. I want to pass a domain name in a text field which will be scanned, and then the script will display the entire site map: not external links or links on a single page. Sorry, it is not easy for me to explain. E.g. if I pass abc.com, the script will display abc.com/12/adn.php, abc.com/asd/asd/, etc. Whatever their URL format is: all the links on that domain.

Hi, does anybody know a script that extracts all of a website's links? I mean, I enter a website URL and it begins to extract all of the links that exist in that website. Thanks

I'm trying to crawl for links in a specific website and show them at the end. The problem I'm facing is that it only shows the links from the specific page, not from all the pages in the website. I tried several loops with no success; please give some advice.
Here is the code:

Code:
<?php
if (isset($_POST['Submit'])) {

    function getLinks($link)
    {
        /*** return array ***/
        $ret = array();

        /*** a new dom object ***/
        $dom = new domDocument;

        /*** get the HTML (suppress errors) ***/
        @$dom->loadHTML(file_get_contents($link));

        /*** remove silly white space ***/
        $dom->preserveWhiteSpace = false;

        /*** get the links from the HTML ***/
        $links = $dom->getElementsByTagName('a');

        /*** loop over the links ***/
        foreach ($links as $tag) {
            $ret[$tag->getAttribute('href')] = $tag->childNodes->item(0)->nodeValue;
        }

        return $ret;
    }

    /*** a link to search ***/
    $link = $_POST['address'];

    /*** get the links ***/
    $urls = getLinks($link);

    /*** check for results ***/
    if (sizeof($urls) > 0) {
        foreach ($urls as $key => $value) {
            /* note: '([a-z0-9-]+\.)+' fixes the original '([a-z0-9-]\.+)*' */
            if (preg_match('/^(http|https):\/\/([a-z0-9-]+\.)+/i', $key)) {
                echo '<span style="color:RED;">' . $key . ' - external</span><br >';
            } else {
                echo '<span style="color:BLUE;">' . $link . $key . ' - internal</span><br >';
            }
        }
    } else {
        echo "No links found at $link";
    }
}
?>
<br /><br />
<form action="" method="post" enctype="multipart/form-data" name="link">
<input name="address" type="text" value="" />
<input name="Submit" type="Submit" />
</form>

Hello,
I am hoping to get some help with this. I want to apologize in advance, as I am not a developer; I'm more of a systems admin guy. Okay guys, here is the deal. My boss has a website which, from my understanding, uses WordPress for its design, but also uses PHP. Since I am not too familiar with the general uses of PHP, I cannot explain why they are doing it that way. The website also uses MySQL, which I would imagine works with PHP to gather data from some back-end server.
The website has five tabs on the top right corner
about services resources contact us login
When you hover over these five tabs they expand and show more menus. The tab with the issue is the services tab. When you hover over it, it works as anticipated: it expands and shows our services. There are a bunch of services which you can click on. The first click works as anticipated. However, if you try to click on another service within that category, the page simply does not load. If you then go into a different service category it works, but once you try to click on another service within the same category it just does not work.
I would really appreciate everyone's help on this. It would be nice to get this resolved.
I can provide you guys with the website and php scripts and codes if need be.
This is the website
http://beta.morrowco.com/
Thanks,
Jeff M
I need help. I started to regenerate an old website, but one thing or another doesn't work, like this:

Code:
<?php
require("droplist.inc.php");
// The page was written for register_globals; pull the region from the
// query string explicitly instead:
$region = isset($_GET['region']) ? (int)$_GET['region'] : 0;
$region_names = array("Laglamia", "Dekardi", "Dekadun", "Dekaran", "Shilon", "Searoost", "Paros", "GWH");
$region_monster_count = array(14, 12, 18, 19, 16, 26, 19, 8);
?>
<head>
<title>Dropliste</title>
<meta http-equiv=content-type content="text/html; charset=windows-1252">
<meta http-equiv=imagetoolbar content=no>
<link href="image/style.css" rel=stylesheet type="text/css">
</head>
<body bgColor="#2d2d2d" link="#FFFFFF" vlink="#FFFFFF" alink="#FF9900">
<hr>
<div align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">
<?php
for ($idx = 0; $idx < count($region_names); $idx++) {
    print("<a href=\"droplist.php?region=".$idx."\">");
    if ($idx == $region) print("<font color=\"#FF9900\">");
    print($region_names[$idx]);
    if ($idx == $region) print("</font>");
    print("</a>");
    if ($idx < count($region_names) - 1) print(" | ");
    print("\n");
}
?>
</font></div>
<hr>
<table cellSpacing=0 cellPadding=1 width="95%" align=center border=0>
<tr><td colSpan=3 height=1><br></td></tr>
<tr><td bgColor=black colSpan=3 height=1></td></tr>
<tr><td bgColor=silver colSpan=3 height=1></td></tr>
<tr><td colSpan=3 height=1><br></td></tr>
<?php
for ($idx = 0; $idx < $region_monster_count[$region]; $idx++) {
    print(" <tr>
<td width=135 valign=top align=center>
<font face=\"Arial, Helvetica, sans-serif\" size=2>
<img width=130 height=140 src=\"image/droplist/".$monsters[$region][$idx][0].".gif\" border=0>
</font>
</td>
<td width=* valign=top>
<font face=\"Arial, Helvetica, sans-serif\" size=2>
<b>".$monsters[$region][$idx][1]."</b> <strong>[ Level ".$monsters[$region][$idx][2]." ]</strong>
<br>
<font color=#ffff00>Drops:</font><br><font face=\"Arial, Helvetica, sans-serif\" size=1>".$monsters[$region][$idx][3]."</font>
<br>
<font color=#ffff00>Random-Drops:</font><br><font face=\"Arial, Helvetica, sans-serif\" size=1>".$monsters[$region][$idx][4]."</font>
</font>
</td>
</tr>
<tr><td colSpan=3 height=1><br></td></tr>
<tr><td bgColor=black colSpan=3 height=1></td></tr>
<tr><td bgColor=silver colSpan=3 height=1></td></tr>
<tr><td colSpan=3 height=1><br></td></tr>
");
}
?>
</table>
<hr>
<div align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif">
<?php
for ($idx = 0; $idx < count($region_names); $idx++) {
    print("<a href=\"droplist.php?region=".$idx."\">");
    if ($idx == $region) print("<font color=\"#FF9900\">");
    print($region_names[$idx]);
    if ($idx == $region) print("</font>");
    print("</a>");
    if ($idx < count($region_names) - 1) print(" | ");
    print("\n");
}
?>
</font></div>
<hr>
</body>
</html>

The content is in another file. The problem is the links (you can see the page at http://psychadelics.co.de/droplist.php): clicking them changed nothing. The original code relied on register_globals (the long-removed PHP setting that turned ?region=1 into a $region variable automatically), so its "if (!isset($region)) $region = 0;" always left $region at 0; reading $_GET['region'] explicitly, as above, makes clicking, for example, Dekardi show the list for that region. I hope someone can confirm this is the right fix. ^^

Hello everyone, I have a general question about something that I've been thinking about doing but have not tried yet. It involves including entire PHP pages rather than a short snippet of code. Here's my situation: I've created a website that caters to many different schools, each of which is assigned a sub-domain so that they have "their own" website instead of going to one site and then clicking a link to get to their page. Every site is identical with the exception of a few images. The way I'm currently doing things is to upload every page of code into each website's folder. Doing it this way, I'm using a lot of disk space, and each time I edit one page of code, I have to upload that page to every folder in my directory.
It's not too bad with only a few folders, but as more schools use my site, it could become a nightmare! What I'm thinking about doing is to upload all my pages to the parent directory, then on each page in my folders for the sub-domains just include the corresponding page in the parent directory. I will just keep the images for each sub-domain in the corresponding folder. For example, the actual code for the index page in each of my sub-domain folders would be this:

Code:
<?php include("../indexpage.inc.php"); ?>

This way, I use less disk space, and if I need to edit some code, I do it once and upload it once to the parent directory, instead of having to upload it to each and every folder. Does anyone see any problem with doing this? Thanks for your opinion.

What I'm trying to do is get my $_SESSION to work throughout my website.
I'm quite new to PHP, so the PHP Manual didn't make much sense to me, so I thought I'd post here! ^^
I've got my login script under /session/ and I want it to be able to display your username on the homepage (/), but it only works inside /session/.
If you are wondering, I am using PHP-Login Advanced.
Thank you,
- Connor!
I have an array containing webpages:

Code:
$results = array('www.google.com', 'www.phpfreaks.com');
echo file_get_contents($results[0]);
echo file_get_contents($results[1]);

Is there any way to just echo file_get_contents() for each entry without writing out the position in the array, as I have hundreds of entries in it?

Hello again, I'm trying to scrape a table from another website using preg_match, specifically using this code:

Code:
<?php
$data = file_get_contents('http://tvcountdown.com/index.php');
$regex = '/<table class="episode_list_table"> (.+?) <\/table>/';
preg_match($regex, $data, $match);
var_dump($match);
echo $match[0];
?>

Here's the thing: it doesn't work. I think it's because the first and second anchors are HTML tags, because if I parse some other stuff without any tag, there's no problem. Any hints, mates? Thanks

I'm trying to find a way to back up an entire server's files through PHP. I have a script that can take selected files into a zip file, but I'm not sure how to make sure that when I loop through directories and files I get every single file. I was thinking of just foreach (glob('*') as $file) up to 10 times, but I'm hoping there's something more definite than that method... Thanks

When I pass an entire URL in a GET variable, as in http://www.example.com/index.php?url=http://www.yahoo.com/, I get the following error:

Forbidden. You don't have permission to access /index.php on this server. Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

But there is no problem when I use the following:
http://www.example.com/index.php?url=www.yahoo.com/
http://www.example.com/index.php?url=yahoo.com/
http://www.example.com/index.php?url=yahoo&value=1

I am getting the "Forbidden" error only when I include "http://" (even if it is urlencoded) in GET variables. Can anyone help me... I hate arrays...
😞

So I had a block of code inside my photo-gallery.php script that took the path to my photos directory, went to that directory, and then read all of the photo filenames into an array. Then in my HTML, I iterate through this array to display all of the photos for my gallery. Now I would like to move that code to an included file so multiple scripts can access it and always be working with the same array. It seems to me that I need to encapsulate my code inside a function? Then I could call my getPhotoFilesArray function from my calling script, and use that array for whatever. I haven't coded PHP in like 4 years and I am struggling to return the entire array back to my calling script. This is what I have so far...

Code:
function getPhotoFilesArray($photoPath) {
    $photoFiles = array();
    <code to find corresponding files>
    // $photoFiles gets populated in a loop
    return $photoFiles;
}
Then in my calling script, I have...

Code:
<?php
require_once('../../../secure_outside_webroot/config.php');
require_once(WEB_ROOT . 'utilities/functions.php');

getPhotoFilesArray($photoPath);
var_dump($photoFiles);
I get this error...

Notice: Undefined variable: photoFiles in photo-gallery.php line 133 (which is my var_dump).
Would appreciate help getting this to work!
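The missing piece in the calling script above is that the function's return value is never captured: the $photoFiles inside the function is local and vanishes when the function returns, so the caller has to assign the result. A sketch of the whole thing, where the glob() body is my own assumed stand-in for the elided file-finding code (note GLOB_BRACE is not available on every platform):

```php
<?php
// Return the photo filenames found under $photoPath.
// The glob() pattern is an assumed stand-in for the original
// "<code to find corresponding files>" section.
function getPhotoFilesArray($photoPath) {
    $photoFiles = array();
    $pattern = rtrim($photoPath, '/') . '/*.{jpg,jpeg,png,gif}';
    foreach ((glob($pattern, GLOB_BRACE) ?: array()) as $file) {
        $photoFiles[] = basename($file);   // keep just the filename
    }
    return $photoFiles;
}

// The missing step in the calling script: capture the return value.
// Calling getPhotoFilesArray(...) without the assignment discards the
// array, which is why $photoFiles was undefined at the var_dump.
$photoFiles = getPhotoFilesArray('photos');
var_dump($photoFiles);
```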
Folks, I want to extract a certain portion from a URL. Example:

Quote: http://abc.com/this-is-test.html
Output should be: this-is-test

Another example:

Quote: http://abc.com/this-is-yet-another-test.html
Output should be: this-is-yet-another-test

I am not sure how it can be done with preg_match() and regex or something like that... Can someone help me with this please? Cheers, Natasha

I have this checkstatus.php page where the user enters a random number that was sent to him by SMS, but after that he should be able to see the reply given by the admin. The problem here is that the user only sees half of the field called 'anything' (my database is named "proj"). Here's my code; I just can't understand why this is happening. Please help.

This is my preg_match code:
Code:
preg_match("/(\d+)|(T)|(A)/", $link, $matches, PREG_OFFSET_CAPTURE, ($off - 30));
It works and matches the number I want, except it only returns the first character in the $matches array. For instance, if the number it finds is 10, it only returns 1. Now I understand that is what preg_match does, but how would I make that regex ungreedy? I tried adding a *, but it just caused an error, and preg_match_all would keep searching for strings after the first one. All I want is for the first number, regardless of its length, to be returned in full. Thanks for any help.
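On the greediness point above: \d+ is already greedy, so preg_match returns the whole digit run once the scan reaches its first digit. Getting 1 out of 10 usually means the match started somewhere else, e.g. the fifth offset argument landed inside the number, leaving only part of it ahead of the scan position. A small demonstration:

```php
<?php
$subject = 'abc 10 def';

// Greedy \d+ grabs the whole run of digits in one match.
preg_match('/(\d+)|(T)|(A)/', $subject, $m, PREG_OFFSET_CAPTURE);
echo $m[0][0], "\n";   // "10" -- the full number, not just its first digit
echo $m[0][1], "\n";   // 4   -- its byte offset in $subject

// But if the offset argument starts the scan inside the number,
// only the digits at or after that position can match:
preg_match('/\d+/', $subject, $m2, PREG_OFFSET_CAPTURE, 5);
echo $m2[0][0], "\n";  // "0"
```

So the fix is usually to check the computed offset (the `$off - 30` above), not to change the greediness of the pattern.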
I have two domains. I'd like to replace a word depending on the URL the user has typed in. For example: the original site is all about red pens. I want to make another domain, bluepens.com, and when users enter via bluepens.com it serves the same pages as the red-pens site, except wherever the word 'red' is found it is changed to 'blue', making it a whole new site. I know I can use str_replace() for certain instances, but how would you do it for an entire website based on the URL? Thanks!

I actually asked a question here yesterday and decided to try a different route with it. What I am doing is passing an email variable entered from my home page on to www.dollapal.com/offerlist.php. I'm wanting this page to be a complete list of all of my entries in my surveys table. The email variable needs to be appended to the end of every link. I got that to work, but what I want to do now is display that information for every record in my 'surveys' table. Right now I am using the random function, which I'm sure is wrong, but I'm not sure what function to be looking for. Is it possible to use a foreach function here to echo each record? If so, I'm not sure how exactly to call the foreach() function in this case. I believe my two problems lie in the random and foreach functions, but I'm not sure how to correct them. I've attached a little chunk of code that I'm working with. I'm not sure if I'm completely off base here, or if I'm close to achieving my desired result. Please let me know if you require more information. This forum has been amazing to me so far. Thank you all for your help!
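For the two-domain red/blue pens question above, one common approach is to key off $_SERVER['HTTP_HOST'] and pass the rendered page through str_ireplace, so every existing page is rewritten site-wide without editing each instance. A sketch; the domain names and the word map here are made-up assumptions:

```php
<?php
// Map each domain (hypothetical names) to the word it should display.
$words = [
    'redpens.com'  => 'red',
    'bluepens.com' => 'blue',
];

// Rewrite the canonical word for whichever host served the page.
function rebrand($html, $host, array $words, $canonical = 'red') {
    $host = strtolower(preg_replace('/^www\./', '', $host));
    $word = isset($words[$host]) ? $words[$host] : $canonical;
    return str_ireplace($canonical, $word, $html);
}

echo rebrand('<p>Our red pens are the best red pens.</p>', 'www.bluepens.com', $words);
// prints: <p>Our blue pens are the best blue pens.</p>
```

Site-wide, this could wrap every page via the output buffer from a common include: `ob_start(function ($html) use ($words) { return rebrand($html, $_SERVER['HTTP_HOST'], $words); });`. Beware that a blind replace also hits substrings ('hundred', 'ordered'); a word-boundary regex such as `preg_replace('/\bred\b/i', $word, $html)` is safer.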