PHP - Need Help Parsing / Extracting Links From Log Files

I am looking for some help with extracting links from log files, as it is a pain to do this manually (which is what I do right now). I have some log files that I need to check for ERROR messages, and I copy and paste the URLs I find into another text file.

My log file format looks like this:

Code:
INFO <11 Feb 2012 00:00:23,822> <index> <D2> <Processing URL : http://www.domain1.com/>
INFO <11 Feb 2012 00:00:23,842> <index> <D4> <Indexed: http://www.domain2.com/> <Time:146 msecs>
INFO <11 Feb 2012 00:00:23,842> <index> <D4> <Processing URL : http://www.domain3.com/>
ERROR <11 Feb 2012 00:00:23,924> <index> <D1> <http://www.domain4.org/operas/2003-2004/mourning/composer.aspx: >
org.apache.commons.httpclient.HttpRecoverableException: org.apache.commons.httpclient.HttpRecoverableException: Error in parsing
at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1965)
at org.apache.commons.httpclient.HttpMethodBase.processRequest(HttpMethodBase.java:2659)
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1093)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:674)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:529)
at com.searchblox.scanner.http.HTTPScanner.b(Unknown Source)
at com.searchblox.scanner.http.HTTPScanner.scan(Unknown Source)
at com.searchblox.scanner.http.HTTPScanner.work(Unknown Source)
at com.searchblox.scanner.Scanner.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
INFO <11 Feb 2012 00:00:23,968> <index> <D5> <Indexed: http://domain6.com/~cdobie/kearnsindex.htm>
INFO <11 Feb 2012 00:00:23,968> <index> <D5> <Indexed: http://domain7.com/~cdobie/kearnsindex.htm>
INFO <11 Feb 2012 00:00:32,988> <index> <D1> <Processing URL : http://www.domain8.com/>
INFO <11 Feb 2012 00:00:33,072> <index> <D5> <Indexed: http://www.domain9.com/> <Time:128 msecs>
INFO <11 Feb 2012 00:00:33,072> <index> <D5> <Processing URL : http://www.domain10.com/>
ERROR <11 Feb 2012 00:00:33,116> <index> <D2> <http://www.domain11.com/: Connection timeout>
org.apache.commons.httpclient.HttpConnection$ConnectionTimeoutException
at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:736)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:661)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:529)
at com.searchblox.scanner.http.HTTPScanner.b(Unknown Source)
at com.searchblox.scanner.http.HTTPScanner.scan(Unknown Source)
at com.searchblox.scanner.http.HTTPScanner.work(Unknown Source)
at com.searchblox.scanner.Scanner.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
INFO <11 Feb 2012 00:00:33,154> <index> <D1> <Indexing http://www.domain12.com/ ...>
INFO <11 Feb 2012 00:00:33,159> <index> <D1> <http://www.domain13.com/ - Last-Modified date: Sat Feb 11 00:00:33 CET 2012>
ERROR <11 Feb 2012 00:00:33,207> <index> <D6> <http://www.domain14.com/: Connection timeout>

What I am after is a piece of code that saves the http://domain.com/ part to a text file if the line starts with ERROR. There are many different error reasons, so the text before and after the URL varies from line to line. Maybe you know a way to open a log file, look for the word ERROR at the beginning of a line and, if it is there, save either the whole line to another text file or, if possible, just the domain part (which would be even better).

If possible, please post a fully functional code block, as I am extremely bad with anything that has to do with regex, opening and closing files, etc.

Your help would be greatly appreciated.

I attached a sample log file to this post in case it helps (same as the lines above).
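
A minimal PHP sketch of one way to do what is described above, assuming every ERROR line has the same shape as in the sample (the URL sits inside angle brackets and is followed by a colon and the error reason). The function name extractErrorUrls and the file names mentioned afterwards are made up for illustration; they are not from the original post.

```php
<?php
// Sketch: collect the URL from every log line that starts with "ERROR".
// extractErrorUrls() is a hypothetical helper name, not part of PHP.

/**
 * Return the URLs found on lines beginning with "ERROR".
 *
 * @param iterable $lines Log lines (an array, or an SplFileObject for large files).
 * @return string[] URLs in the order they appear in the log.
 */
function extractErrorUrls(iterable $lines): array
{
    $urls = [];
    foreach ($lines as $line) {
        // Only lines that begin with the literal word ERROR are of interest;
        // INFO lines and stack-trace lines ("at org.apache...") are skipped.
        if (strncmp($line, 'ERROR', 5) !== 0) {
            continue;
        }
        // Match "http://" or "https://" followed by everything up to the
        // first whitespace or ">" character.
        if (preg_match('~https?://[^\s>]+~', $line, $m)) {
            // The log writes ": reason" right after the URL, so the match
            // ends with a stray colon (e.g. "composer.aspx:"); trim it.
            $urls[] = rtrim($m[0], ':');
        }
    }
    return $urls;
}
```

To run it against a real file, something like `$urls = extractErrorUrls(file('crawler.log', FILE_IGNORE_NEW_LINES));` followed by `file_put_contents('error_urls.txt', implode(PHP_EOL, $urls) . PHP_EOL);` should work, where crawler.log and error_urls.txt are placeholder names. If only the host is wanted rather than the full URL, `parse_url($url, PHP_URL_HOST)` returns it. To save the whole ERROR line instead, append `$line` rather than the regex match.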
