PHP - Extracting Http Urls From A Text File
I have html files in which, there are lines of urls starting with http:// (simple text, not hyperlink) without a tag. What is the simplest way to extract them?
Similar TutorialsDear all, is there any library that supports text extraction from docx,doc, excel, pdf, etc formats like Apache POI does on Java? Or should I port Apache POI classes to PHP code? best regards, ethereal1m I have a large text file that I need to search and extract text from. I have some code that somewhat works but is not good for what I need because it only reads one line at a time. I need to be able to echo all code between two strings and continue scanning the entire document. I am attaching the TXT file that is being read by the script: Here is the script: Code: [Select] <? $searchthis = "Problem:"; $search="Check:"; $matches = array(); $handle = @fopen("1numbers.txt", "r")or die("can't open file"); if ($handle) { while (!feof($handle)) { $buffer = fgets($handle); if(strpos($buffer, $searchthis) !== FALSE) echo "<br>". $buffer."<br>"; if(strpos($buffer, $search) !== FALSE) echo "<br>". $buffer."<br>"; } fclose($handle); } ?> you can see what this script outputs by visiting this link: http://yourautofix.com/data/data.php but my problem is it only outputs one line of text that finds the search match. I need it to output all lines of text between two matches for example any text between "Problem:" and "Check:" should be Echo'd and any text between "Check:" and "Likely:" should be echo'd there may be 1 line or 20 lines of text between the tags... I need to print all lines between the 2 determined search strings and then continue through the text file displaying all matches between the search strings in a large file. any thoughts on how I can get this done or point me in the right direction? Thanks for any input on this Paul Hi I'm learning php and trying to write a script to extract registration information from a large text file. Sadly my meagre knowledge of php is letting me down a bit. It's a case of knowing what you want the script to do but not having the knowlege of how to 'say it'. So i was hoping that if I posted my code here someone could either give me a few pointers on where i am going wrong or suggest a better way. The text file data luckily has a recurring format as follows (for brevity i've only included one entry, which contains made up information): From: bella_done@yahoo.co.uk Sent: 02 February 2011 22:50 To: Jonny tum, patsy fells, dingly bongo Subject: Subject: Fun Run 2010 Categories: Fun Run Name: Bella Donna Address: 14 brondle avenue Postcode: cd83 1rg Phone: 0287343510 Email: bella_don@yahoo.co.uk DOB: 15/11/1945 Half or Full: Full fun run How did you hear: Took part in 2010 As you can see the data has a convenient boundary at the 'from' field and the colon (or so it occurred to me) so I created my script as follows: // the string being analysed $the_string = " From: bella_done@yahoo.co.uk Sent: 02 February 2011 22:50 To: Jonny tum, patsy fells, dingly bongo Subject: Subject: Fun Run 2010 Categories: Fun Run Name: Bella Donna Address: 14 brondle avenue Postcode: cd83 1rg Phone: 0287343510 Email: bella_don@yahoo.co.uk DOB: 15/11/1945 Half or Full: Full fun run How did you hear: Took part in 2010"; // remove all formatting to work with a clean string $clean_string = strip_tags($the_string); // remove form field entries from the data and replace with commas and a ZZZ boundary $remove_fields = array("Categories:" => "","Name:" => ",","Address:" => ",","Postcode:" => ",","Phone:" => ",","Email:" => ",","DOB:" => ",","Half or Full:" => ",","How did you hear:" => ",","From:" => "ZZZ","Sent:" => ",","To:" => ",", ); $new_string = strtr("$clean_string",$remove_fields); // split the data at the boundary ZZZ $string_to_array = explode("ZZZ", $new_string); $new_string2 = implode("</br>",$string_to_array); echo $new_string2; $myFile = "address_list.csv"; $fh = fopen($myFile, 'w') or die("can't open file"); $stringData = $new_string2; fwrite($fh, $stringData); fclose($fh); One major problem is when i write the new data to a csv file the csv contains spacings that cause it to be reproduced in a column form rather than as separate fields for each comma boundary. So can anyone suggest either a) a better way of extracting the data from the text file (doesn't need to be 100% clean and perfect) b) How can i stop the spaces in the csv (i thought i would have fixed this when i stripped the tags from the string at the start??). Any help would be greatly received by a newbie phper. It's my first shot at performing anything moderately taxing so if I've made some blaring oversites I would very much welcome your wisdom! Thank you Drongo Hi! I have .swf files that have images in them. I can view them with swf decompiler on my computer but i need to have this functionality on my website. Is there any way to extract images from .swf file with php? I'm normally fairly proficient with PHP, but I haven't done any coding in quite a while, so I'm a little rusty. I have an entire page of text from which I need to extract a single value. Here is a small portion of the page in question: Code: [Select] Total Rank: 128 Total Points: 4,978 Next Rank: 20 For instance, I need to extract the values "128" "4978" and "20" and store them in variables. These values change all the time, so I'm not sure what the best way to go about this is... maybe a regular expression ? If that's the case, I've never been too good with them, so any help would be appreciated. I have managed to get this to work but it seems like it is a very long and messy solution. I was wondering if anyone had an idea of how this can be done better. I am new to php and don't know a lot. It shows the text between the tags <h1> and </h1> from the content of a different file Basically I had to start the substr() from the fourth position so it would actually skip the "<h1>" being included, and because I started on the fourth postion I then had to finish four places back to skip the "</h1>" being included. Code: [Select] <?php $id = $_GET['id']; $homepage = file_get_contents("./".$id.".php"); $title = stristr($homepage,"<h1>"); $titlepos = strpos($homepage,"</h1>"); $endpos = $titlepos - 4; echo "Title " . substr($title,4,$endpos); ?> Folks, I tired all my PHP skills to extract domain name strings from a RSS Feed and put each domain name as an Array element, but all in vain: Here is the RSS: http://bulliesatwork.co.uk/master/dev/domp/expdom/domains.php() What i want to extract: Quote Do you see a list of domain names, which are Anchored, all i need is to extract these domain names llik "abc.co uk" (observe there is a space between .co and uk, which can be removed with str_replace()) Here is my first try: (Using SimpleHTMLDomParser) Code: [Select] require_once('simple_html_dom.php'); $html = file_get_html('http://bulliesatwork.co.uk/master/dev/domp/expdom/domains.php'); $domains = $html->find('div[class="entry"] a', 0); foreach($domains as $dom) { echo str_replace(' ', '.', $dom->plaintext); } $html->clear(); unset($html); Here is my another try with DOM Document: Code: [Select] $scrapeurl = 'http://bulliesatwork.co.uk/master/dev/domp/expdom/domains.php'; $keywords = file_get_contents($scrapeurl); $keywords = json_decode($keywords); foreach( $keywords->responseData->results as $keyword) { echo str_replace("...",".",$keyword->title).'<br/>'; } In both the cases, DOM document is created but it seems the Document has all information except the Domain names i want to extract. Please help me out to extract the doamin names. Cheers Hi there, In my attached PHP script, I extract text between two strings in the input file and write the extracted text to an output file. Everything seems to work fine, except I can't figure out how to include the row that says "Richland" (after the row that says "Creighton") in the extracted text. If someone could guide me how to do this, I'd greatly appreciate it. The PHP script is attached. The input file is in htm format and I can't attach that here so I will provide a link to the file I'm calling: http://www.afws.net/data/pa/savedata/109/06/2009060920.pa.htm Many thanks!!! I have a bunch of pdf's and I want to extract text from the last page of every pdf. I have a function to count the number of pages in each pdf. Does anyone know of a way that I could extract file from a specified file and page number. example: getData('example.pdf', 54); RewriteCond %{HTTP_HOST} !^www.example.com.au$ [NC] RewriteCond %{HTTP_HOST} !^https://www.example.com.au$ [NC] RewriteRule ^(.*)$ http://www.example.com.au/$1 [R,L]Hi All I have my website 90% working on https, fully working on http and I have managed to redirect from none www to www What I am looking to do it if the user happens to enter https then make them go to my www version I have manage to get them to go to my www version if they do not put in www but I am lost in how to redirect them from https to http here is what I have so far and I just cannot figure it out thanks Alan Hello, I am new to the forum and somewhat new to php, nice to meet you all. This is the first time I have ever really scripted with PHP so I'm still learning about all the tools I have and what I have to call things in PHP. I have a list of urls and as I loop through each one, I'd like to be able to get information from the webpage. The <title> would be a good start. I also want to know the best way for me to compare data I have. I'll show the basic code below, but I successfully go through each url in this text file. I put it in a <ul><li> list just fine. So if $url == http://www.youtube.com/file how is the normal way to check and see if the word "youtube" is in $url? I found preg_match() but I think I'm approaching the whole thing wrong because I get no output. I am an intermediate to somewhat advanced scripter in other languages similar to php, I just need to learn how you do the normal things in PHP. So I'd like to compare a string "youtube" to a variable '$url'. And I would like to be able to grab the title or other info from the file $url. Here is what I have so far. (Recent research showed me how I should do this with an XML file so I will probably change the .txt to .xml) Can you please tell me what to look for as I have been searching and can't really find a comprehensive answer. I changed the whole page to an echo trying to fix something last night. Before it was written like.. Code: [Select] <?php if ($true) { $var = value ?> <html code>The value is <?php $var ?> .</html code> <?php } ?> Code: [Select] /index.php <?php include 'include/header.html'; echo "<div id='wrapper'> <div id='left'> <div class='article'> <br /> <p>"; echo "Today is " . date("l") . ", the " . date("jS") . " of " . date("F") . "."; $lines = file('data/news.txt'); if ($lines){ foreach ($lines as $line_num => $line) { $url = htmlspecialchars($line); //Now I have url. I want to check the url and get the <title> & misc. data. //if youtube is in $url {html code to embed youtube}; //my attempt was $x = file($url); but I got a lot of 404 and 403 errors. //now I fill html. echo "<ul id='menu1' class='auroramenu'> <li><a href='#'>Story ".$line_num."</a> <a style='display: none;' class='aurorashow' href='#'></a> <a style='display: inline;' class='aurorahide' href='#'></a> <ul> <br /> <p>".$url."</p><br /> <li style='text-align:right;'><a href='".$url."' target='_blank'>Read the story.</a> </li> </ul> </li> </ul>"; } } echo "</p><br /> </div> </div> <div id='right'>"; include 'include/sidebar.html'; echo "</div><br class='clr' /></div><br />"; include 'include/footer.html'; ?> Thank you for your help. I need to check (from a Debian server to a Mac CLIENT (not server) machine) to see if a file exists on the file system...not an http server. Here's what I've tried (based on reading at http://www.php.net/manual/en/function.ssh2-exec.php) Code: [Select] <?php function check_remote_files($username, $hostname, $remote_file) { $connection = ssh2_connect($hostname, 22, array('hostkey'=>'ssh-rsa')); if (ssh2_auth_pubkey_file($connection, $username, '/root/.ssh/id_rsa.pub', '/root/.ssh/id_rsa', '')) { echo "Public Key Authentication On Server #2 Successful\n"; } $url = "if ssh $hostname stat $remote_file \> /dev/null 2\>\&1"; $stream=ssh2_exec($connection,$url); stream_set_blocking( $stream, true ); $cmd=fread($stream,4096); fclose($stream); $stream=ssh2_exec($connection,"then"); stream_set_blocking( $stream, true ); $cmd=fread($stream,4096); fclose($stream); $stream=ssh2_exec($connection,"echo File exists"); stream_set_blocking( $stream, true ); $cmd=fread($stream,4096); fclose($stream); $stream=ssh2_exec($connection,"else"); stream_set_blocking( $stream, true ); $cmd=fread($stream,4096); fclose($stream); $stream=ssh2_exec($connection,"echo Not found"); stream_set_blocking( $stream, true ); $cmd=fread($stream,4096); fclose($stream); $stream=ssh2_exec($connection,"fi"); stream_set_blocking( $stream, true ); $cmd=fread($stream,4096); fclose($stream); } ?> NOTHING happens after the echo of Public Key Authentication On Server #2 Successful. I'm running this script from a bash shell. Any ideas what might be wrong? Thanks. First thanks for any help that you can provide. I am somewhat new to PHP and this is the first "Web Service" I have had to create. THE GOAL: I need to pull XML data from another server. This company's API is setup so that you must give an IP and so you can only pull data from server to server instead of client to server. The data is pulled from the API using HTTP requests...very similar to YQL. (in essence the structured query is located in the URL). This API also requires that my server only ping's their server every 10-15 mins, in order to cut down on server requests. The logical thought in my head was to setup: a cron job to run a PHP script every 10 mins. The PHP scripts would then do the following: 1. Make the HTTP request 2. Open an existing file or create one (on my server) 3. Take the returned XML data from the API and write to the newly opened file. 4. Convert that XML to JSON. 5. Save the JSON 6. Cache the JSON file 7. STOP My thought was to use curl and fopen for the first 3 steps. I found a basic script that will do this on PHP.net (as shown below). After that I am pretty much lost on how to proceed. Code: [Select] <?php $ch = curl_init("http://www.example.com/"); $fp = fopen("example_homepage.txt", "w"); curl_setopt($ch, CURLOPT_FILE, $fp); curl_setopt($ch, CURLOPT_HEADER, 0); curl_exec($ch); curl_close($ch); fclose($fp); ?> I would REALLY appreciate any help on this. Also, if you have the time to please comment and explain what you are doing in any code examples and why....that would HELP out a lot. I truly want to learn not just grab a snippet and run. So, your comments are vital for me. thank!!! Hi all, I am looking a developing a script that instead of using something like mysite.com/showproduct.php?id=1 I would like it to appear as mysite.com/php-editor Is their a simple way of doing this? I would still need to pull info from a database so the page would still need to be dynamic, I just want it to appear static! I am used to using the get id function of php, what would the workaround be? would the id be hidden from the url but still usable in a query? I have had a look at the apache mod_rewrite, but quite frankly, I dont understand it! Cheers Hey! I am trying to get a database table of users but am running into the error: Warning: file_get_contents(http://protege-ontology-editor-knowledge-acquisition-system.136.n4.nabble.com/template/NamlServlet.jtp?macro=user_nodes&user=68583) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 500 in get_user_from_parameter(nabble:utilities.naml:890) - <n.get_user_from_parameter.as_user_page.do/> - public v in C:\xampp\htdocs\website4js\stanford\loadUsernames.php on line 32 I have looked it up and it may be a security thing...The urls I am getting are http://protege-ontology-editor-knowledge-acquisition-system.136.n4.nabble.com/template/NamlServlet.jtp?macro=user_nodes&user=68583 but the error adds an nodes& part that screws it up. Any way around this? Code: [Select] <?PHP $maxPage = 0; $mainPage = "http://protege-ontology-editor-knowledge-acquisition-system.136.n4.nabble.com/template/NamlServlet.jtp?macro=app_people&node=136"; $mainContent = file_get_contents($mainPage); $pattern = "/(?<=\"Page )\d\d+/"; preg_match_all($pattern, $mainContent, $pageNumb); //find the max page for($i=0;$i<sizeof($pageNumb[0]);$i++) { if($pageNumb[0][$i] > $maxPage) { $maxPage = $pageNumb[0][$i]; } } //echo('Max page is: '.$maxPage.'\n'); //Get an array of all the pages $pages = array(); for($i=1;$i<$maxPage;$i++) { $pages[$i] = "http://protege-ontology-editor-knowledge-acquisition-system.136.n4.nabble.com/template/NamlServlet.jtp?macro=app_people&node=136&i=".($i*20); } //print_r($userPages); //Get personal page urls and add to MySQL mysql_connect("localhost" , "root") or die("Cant connect"); mysql_select_db("protegeusers") or die("Cant find database"); foreach($pages as $url) { $urlContents = file_get_contents($url); $pattern = "/http:\/\/protege-ontology-editor-knowledge-acquisition-system\.136\.n4\.nabble\.com\/template\/NamlServlet\.jtp\?macro=.+;user=\d+/"; preg_match_all($pattern, $urlContents, $personalPage); foreach($personalPage as $user) { for($i=0; $i<sizeof($user);$i++) { [color=green]$userContents = file_get_contents($user[$i]);[/color] $pattern1 = "/user\/SendEmail\.jtp\?type=user.+;user=\d+/"; $pattern2 = "/(?<=\">Send Email to ).+(?=<)/"; preg_match_all($pattern1, $userContents, $userEmail); preg_match_all($pattern2, $userContents, $username); [color=green]print_r($username); print_r($email);[/color] //$query = "INSERT INTO users (username, userurl) values ('$userName','$userUrl')"; //mysql_query($query); } } } ?> Trying to run a simple program that, when submitted, stores the username and password as cookies. When clicking Submit, I get the error "HTTP Error 405 - The HTTP verb used to access this page is not allowed". If the username and password fields are left blank when submitting it's suppose to give a message to enter a username and password, but, I still get that error message. HTML form: Code: [Select] <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <title>Week 1 Project--Cookies</title> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> </head> <body> <form action="cookie1.php" method="post"> <h2 align="center">Cookies</h2> <br /> <div> <p>Enter your username and password and click "Submit":</p><br /> <p>Username:<input type="text" name="username" size="20"></p> <p>Password:<input type="text" name="password" size="20"></p> </div> <br /> <div><input type="submit" name="submit" value="Submit" /></div> <br /> <div> <input type="reset" name="Reset" value="Start Over" /> </div> </form> </body> </html> PHP file: Code: [Select] <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <title>Cookie File</title> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> </head> <body> <div> <?php if ($_SERVER['REQUEST_METHOD'] == 'POST') { setcookie('username', $_POST['username'], time() + 2592000); setcookie('password', $_POST['password'], time() + 2592000); } if(($_POST['username'] == "") || ($_POST['password'] == "")) { print "You must enter both a username and password. Press the Back button on your browser and try again."; } else if (isset($_COOKIE['username'])) { print "Welcome, " .$_COOKIE['username']; } ?> </div> </body> </html> How do I only redirect the page when index.php is present? I currently am working on a project where I code a "simple" telephone directory. There are three main tasks that it needs to do: 1. Directory.php(index page) has a "First Name" and "Last Name" field and a search button. When a name is searched from the directory.txt file, it displays First Name, Last Name, Address, City, State, Zip and phone in findinfo.php in designated text boxes...first name, last name, etc. 2. From the findinfo.php, like previously stated, the users information is listed in the appropriate text boxes. From there, there is an update button that will overwrite the user's information to directory.txt if that button is selected. It will then say the write was sucessful. 3. (completed this step) From the index page, there is a link that will take you to addnew.php where you enter First Name, Last Name, Address, City, State, Zip and phone in a web form and write it to directory.txt. This is the php code for the third step: <?php $newentryfile = fopen("directory.txt", "a+"); $firstname = $_POST['fname']; $lastname = $_POST['lname']; $address = $_POST['address']; $city = $_POST['city']; $state = $_POST['state']; $zip = $_POST['zip']; $phone = $_POST['phone']; $newentry = "$firstname $lastname\n\r $address\n\r $city, $state $zip\n\r $phone\n\r"; if (flock($newentryfile, LOCK_EX)) { if (fwrite($newentryfile, $newentry) > 0) echo "<p>" . stripslashes($firstname) . " " . stripslashes($lastname) . " has been added to the directory.</p>"; else echo "<p>Registration error!</p>"; flock($newentryfile, LOCK_UN); } else echo "<p>Cannot write to the file. Please try again later</p>"; fclose($newentryfile); if(empty($firstname) || empty($lastname) || empty($address) || empty($city) || empty ($state) || empty($zip) || empty($phone)) { echo "<p>Please go back and fill out all fields.</p>"; } ?> So to sum it all up, what would be my best approach? I am totally stumped and not sure which function to use. Should I work my way from step 1 to step 2? I see it as when I do the search for the name from directory.php, it takes me to findinfo.php, listing the users information in the text boxes. From there, if I needed to, having the user's information already listed I could hit the update button to overwrite the new information to directory.txt. Doing the update when then tell me that the write was successful. I have literally been scouring the internet for hours. What would be the best function to do this? I hope I was clear enough. Please help me out and thank you for your time. Ok i have been working on this for a day+ now. here is my delema simple .ini text file. when a user makes a change (via html form) it makes the correct adjustments. problem is the newline issue 1. if i put a "\n" at the end (when using fputs) works great, except everytime they edit the file it keeps adding a new line (i.e. 10 edits there are now 10 blank lines!!!!) 2. if i leave off the "\n" it appends the next "fgets" to that lilne making a mess Code: [Select] ##-- Loop thruoght the ORIGINAL file while( ! feof($old)) { ##-- Get a line of text $aline = fgets($old); ##-- We only need to check for "=" if(strpos($aline,"=") > 0 ) { ##-- Write NEW data to tmp file fputs($tmp,$info[$i]." = ".$rslt[$i]."\n"); $i++; } ##-- No Match else { fputs($tmp,$aline."\n"); }//Checking for match }//while eof(old) what in the world is making this such a big deal. i dont remember having this issue in the past I tried opening with w+, and just w on the temp file a typical text line would be some fieldname = some value the scipt cycles through the file ignoring comments that are "#" ps the tmp file will overwrite the origianl once complete all i really want to know is WHY i cant get the newline to work, and what is the suggested fix EDIT: i just tried PHP_EOL and it still appends another newline Hi, I am writing several scripts and some are used to amend extra information to a text file. However, I added a hyperlink to the text file so that the user can go back to a page where they can add extra information. However, since I have done this every time I amend more text to the text file, the extra text appears below the hyperlink rather than above it, and I was wondering if there was a way around this. My amend code is as follows: Code: [Select] <html> <head> <title>Amend File</title> <link rel="stylesheet" type="text/css" a href="rcm/stylesheet.css"> </head> <?php if($_POST['append'] !=null) { $filename="C:/xampp/htdocs/rcm/denman2.txt"; $file=fopen($filename, "a"); $msg="<p>Updated Information: " .$_POST['append']. "</p><br>"; fputs ($file, $msg); fclose($file); } ?> <body> <h1>Do you want to append to a document?</h1> Enter Updated Information: <form action="amendfile2.php" method="post"> <input type="text" size="40" name="append"><br><br> <input type="submit" value="Add updated information to report"> </form> <form action="viewfile3.php" method="post"> <input type="submit" size="40" value="View Web Blog"> </form> <form action="loginform.php" method="post"> <input type="submit" value="Click here to go to the Log In Screen"> </form> </body></html> And my text file is as follows: Code: [Select] <h1>Accident Report</h1> <p>First Name: Andrew Last Name: Denman Age: 18 Complete Weeks Since Accident: 2<br> <a href="amendfile2.php">Amend to this file</a> Any help would be appreciated |