PHP - Extracting Text From Different File Formats
Dear all,
is there any library that supports text extraction from docx,doc, excel, pdf, etc formats like Apache POI does on Java? Or should I port Apache POI classes to PHP code? best regards, ethereal1m Similar TutorialsI have a large text file that I need to search and extract text from. I have some code that somewhat works but is not good for what I need because it only reads one line at a time. I need to be able to echo all code between two strings and continue scanning the entire document. I am attaching the TXT file that is being read by the script: Here is the script: Code: [Select] <? $searchthis = "Problem:"; $search="Check:"; $matches = array(); $handle = @fopen("1numbers.txt", "r")or die("can't open file"); if ($handle) { while (!feof($handle)) { $buffer = fgets($handle); if(strpos($buffer, $searchthis) !== FALSE) echo "<br>". $buffer."<br>"; if(strpos($buffer, $search) !== FALSE) echo "<br>". $buffer."<br>"; } fclose($handle); } ?> you can see what this script outputs by visiting this link: http://yourautofix.com/data/data.php but my problem is it only outputs one line of text that finds the search match. I need it to output all lines of text between two matches for example any text between "Problem:" and "Check:" should be Echo'd and any text between "Check:" and "Likely:" should be echo'd there may be 1 line or 20 lines of text between the tags... I need to print all lines between the 2 determined search strings and then continue through the text file displaying all matches between the search strings in a large file. any thoughts on how I can get this done or point me in the right direction? Thanks for any input on this Paul I have html files in which, there are lines of urls starting with http:// (simple text, not hyperlink) without a tag. What is the simplest way to extract them? Hi I'm learning php and trying to write a script to extract registration information from a large text file. Sadly my meagre knowledge of php is letting me down a bit. It's a case of knowing what you want the script to do but not having the knowlege of how to 'say it'. So i was hoping that if I posted my code here someone could either give me a few pointers on where i am going wrong or suggest a better way. The text file data luckily has a recurring format as follows (for brevity i've only included one entry, which contains made up information): From: bella_done@yahoo.co.uk Sent: 02 February 2011 22:50 To: Jonny tum, patsy fells, dingly bongo Subject: Subject: Fun Run 2010 Categories: Fun Run Name: Bella Donna Address: 14 brondle avenue Postcode: cd83 1rg Phone: 0287343510 Email: bella_don@yahoo.co.uk DOB: 15/11/1945 Half or Full: Full fun run How did you hear: Took part in 2010 As you can see the data has a convenient boundary at the 'from' field and the colon (or so it occurred to me) so I created my script as follows: // the string being analysed $the_string = " From: bella_done@yahoo.co.uk Sent: 02 February 2011 22:50 To: Jonny tum, patsy fells, dingly bongo Subject: Subject: Fun Run 2010 Categories: Fun Run Name: Bella Donna Address: 14 brondle avenue Postcode: cd83 1rg Phone: 0287343510 Email: bella_don@yahoo.co.uk DOB: 15/11/1945 Half or Full: Full fun run How did you hear: Took part in 2010"; // remove all formatting to work with a clean string $clean_string = strip_tags($the_string); // remove form field entries from the data and replace with commas and a ZZZ boundary $remove_fields = array("Categories:" => "","Name:" => ",","Address:" => ",","Postcode:" => ",","Phone:" => ",","Email:" => ",","DOB:" => ",","Half or Full:" => ",","How did you hear:" => ",","From:" => "ZZZ","Sent:" => ",","To:" => ",", ); $new_string = strtr("$clean_string",$remove_fields); // split the data at the boundary ZZZ $string_to_array = explode("ZZZ", $new_string); $new_string2 = implode("</br>",$string_to_array); echo $new_string2; $myFile = "address_list.csv"; $fh = fopen($myFile, 'w') or die("can't open file"); $stringData = $new_string2; fwrite($fh, $stringData); fclose($fh); One major problem is when i write the new data to a csv file the csv contains spacings that cause it to be reproduced in a column form rather than as separate fields for each comma boundary. So can anyone suggest either a) a better way of extracting the data from the text file (doesn't need to be 100% clean and perfect) b) How can i stop the spaces in the csv (i thought i would have fixed this when i stripped the tags from the string at the start??). Any help would be greatly received by a newbie phper. It's my first shot at performing anything moderately taxing so if I've made some blaring oversites I would very much welcome your wisdom! Thank you Drongo I am looking at using PHP (or whatever) to compress images/video/music when it is uploaded to be stored on the server and decompressed when required on the webpage. I know I will have to make a decompress function so it knows when to decompress to its full size, but any ideas on how to do the compression. Requirements: biggest, best and efficient compression of music/image/video files. Thanks in advance. Hi! I have .swf files that have images in them. I can view them with swf decompiler on my computer but i need to have this functionality on my website. Is there any way to extract images from .swf file with php? I'm normally fairly proficient with PHP, but I haven't done any coding in quite a while, so I'm a little rusty. I have an entire page of text from which I need to extract a single value. Here is a small portion of the page in question: Code: [Select] Total Rank: 128 Total Points: 4,978 Next Rank: 20 For instance, I need to extract the values "128" "4978" and "20" and store them in variables. These values change all the time, so I'm not sure what the best way to go about this is... maybe a regular expression ? If that's the case, I've never been too good with them, so any help would be appreciated. Hi there, In my attached PHP script, I extract text between two strings in the input file and write the extracted text to an output file. Everything seems to work fine, except I can't figure out how to include the row that says "Richland" (after the row that says "Creighton") in the extracted text. If someone could guide me how to do this, I'd greatly appreciate it. The PHP script is attached. The input file is in htm format and I can't attach that here so I will provide a link to the file I'm calling: http://www.afws.net/data/pa/savedata/109/06/2009060920.pa.htm Many thanks!!! I have managed to get this to work but it seems like it is a very long and messy solution. I was wondering if anyone had an idea of how this can be done better. I am new to php and don't know a lot. It shows the text between the tags <h1> and </h1> from the content of a different file Basically I had to start the substr() from the fourth position so it would actually skip the "<h1>" being included, and because I started on the fourth postion I then had to finish four places back to skip the "</h1>" being included. Code: [Select] <?php $id = $_GET['id']; $homepage = file_get_contents("./".$id.".php"); $title = stristr($homepage,"<h1>"); $titlepos = strpos($homepage,"</h1>"); $endpos = $titlepos - 4; echo "Title " . substr($title,4,$endpos); ?> Folks, I tired all my PHP skills to extract domain name strings from a RSS Feed and put each domain name as an Array element, but all in vain: Here is the RSS: http://bulliesatwork.co.uk/master/dev/domp/expdom/domains.php() What i want to extract: Quote Do you see a list of domain names, which are Anchored, all i need is to extract these domain names llik "abc.co uk" (observe there is a space between .co and uk, which can be removed with str_replace()) Here is my first try: (Using SimpleHTMLDomParser) Code: [Select] require_once('simple_html_dom.php'); $html = file_get_html('http://bulliesatwork.co.uk/master/dev/domp/expdom/domains.php'); $domains = $html->find('div[class="entry"] a', 0); foreach($domains as $dom) { echo str_replace(' ', '.', $dom->plaintext); } $html->clear(); unset($html); Here is my another try with DOM Document: Code: [Select] $scrapeurl = 'http://bulliesatwork.co.uk/master/dev/domp/expdom/domains.php'; $keywords = file_get_contents($scrapeurl); $keywords = json_decode($keywords); foreach( $keywords->responseData->results as $keyword) { echo str_replace("...",".",$keyword->title).'<br/>'; } In both the cases, DOM document is created but it seems the Document has all information except the Domain names i want to extract. Please help me out to extract the doamin names. Cheers I have a bunch of pdf's and I want to extract text from the last page of every pdf. I have a function to count the number of pages in each pdf. Does anyone know of a way that I could extract file from a specified file and page number. example: getData('example.pdf', 54); I m printing the prices of the items in my db. I want to print the prices like ; Price --- The format I want 75.000 --- 75 75.500 --- 75.5 100 --- 100 100.5 --- 100.5 1234.654 --- 1,234.654 1234.200 --- 1,234.2 1123456.789 --- 1,123,456.789 Simply I want to cancel the zeros after dot, and want to put comma every 3 digits before the dot. How can I do this ? Hi all I hope I have come to the right place. My system reads files stored on a drive and lists them to users through plain HTML, I made it 11 years ago and have to refresh my memory. My problem is that filenames seem to come in different formats, to how to decode/encode them is an issue...
My users use Scandinavian letters (æøåäöüõ) and it seems like one filename is in one format and another in another format. There is no logic to what format the filenames comes it. All files are ok and downloads as they should, they just dont list well. Any idea how I can handle this issue?
Ok i have been working on this for a day+ now. here is my delema simple .ini text file. when a user makes a change (via html form) it makes the correct adjustments. problem is the newline issue 1. if i put a "\n" at the end (when using fputs) works great, except everytime they edit the file it keeps adding a new line (i.e. 10 edits there are now 10 blank lines!!!!) 2. if i leave off the "\n" it appends the next "fgets" to that lilne making a mess Code: [Select] ##-- Loop thruoght the ORIGINAL file while( ! feof($old)) { ##-- Get a line of text $aline = fgets($old); ##-- We only need to check for "=" if(strpos($aline,"=") > 0 ) { ##-- Write NEW data to tmp file fputs($tmp,$info[$i]." = ".$rslt[$i]."\n"); $i++; } ##-- No Match else { fputs($tmp,$aline."\n"); }//Checking for match }//while eof(old) what in the world is making this such a big deal. i dont remember having this issue in the past I tried opening with w+, and just w on the temp file a typical text line would be some fieldname = some value the scipt cycles through the file ignoring comments that are "#" ps the tmp file will overwrite the origianl once complete all i really want to know is WHY i cant get the newline to work, and what is the suggested fix EDIT: i just tried PHP_EOL and it still appends another newline I currently am working on a project where I code a "simple" telephone directory. There are three main tasks that it needs to do: 1. Directory.php(index page) has a "First Name" and "Last Name" field and a search button. When a name is searched from the directory.txt file, it displays First Name, Last Name, Address, City, State, Zip and phone in findinfo.php in designated text boxes...first name, last name, etc. 2. From the findinfo.php, like previously stated, the users information is listed in the appropriate text boxes. From there, there is an update button that will overwrite the user's information to directory.txt if that button is selected. It will then say the write was sucessful. 3. (completed this step) From the index page, there is a link that will take you to addnew.php where you enter First Name, Last Name, Address, City, State, Zip and phone in a web form and write it to directory.txt. This is the php code for the third step: <?php $newentryfile = fopen("directory.txt", "a+"); $firstname = $_POST['fname']; $lastname = $_POST['lname']; $address = $_POST['address']; $city = $_POST['city']; $state = $_POST['state']; $zip = $_POST['zip']; $phone = $_POST['phone']; $newentry = "$firstname $lastname\n\r $address\n\r $city, $state $zip\n\r $phone\n\r"; if (flock($newentryfile, LOCK_EX)) { if (fwrite($newentryfile, $newentry) > 0) echo "<p>" . stripslashes($firstname) . " " . stripslashes($lastname) . " has been added to the directory.</p>"; else echo "<p>Registration error!</p>"; flock($newentryfile, LOCK_UN); } else echo "<p>Cannot write to the file. Please try again later</p>"; fclose($newentryfile); if(empty($firstname) || empty($lastname) || empty($address) || empty($city) || empty ($state) || empty($zip) || empty($phone)) { echo "<p>Please go back and fill out all fields.</p>"; } ?> So to sum it all up, what would be my best approach? I am totally stumped and not sure which function to use. Should I work my way from step 1 to step 2? I see it as when I do the search for the name from directory.php, it takes me to findinfo.php, listing the users information in the text boxes. From there, if I needed to, having the user's information already listed I could hit the update button to overwrite the new information to directory.txt. Doing the update when then tell me that the write was successful. I have literally been scouring the internet for hours. What would be the best function to do this? I hope I was clear enough. Please help me out and thank you for your time. Hi, I am writing several scripts and some are used to amend extra information to a text file. However, I added a hyperlink to the text file so that the user can go back to a page where they can add extra information. However, since I have done this every time I amend more text to the text file, the extra text appears below the hyperlink rather than above it, and I was wondering if there was a way around this. My amend code is as follows: Code: [Select] <html> <head> <title>Amend File</title> <link rel="stylesheet" type="text/css" a href="rcm/stylesheet.css"> </head> <?php if($_POST['append'] !=null) { $filename="C:/xampp/htdocs/rcm/denman2.txt"; $file=fopen($filename, "a"); $msg="<p>Updated Information: " .$_POST['append']. "</p><br>"; fputs ($file, $msg); fclose($file); } ?> <body> <h1>Do you want to append to a document?</h1> Enter Updated Information: <form action="amendfile2.php" method="post"> <input type="text" size="40" name="append"><br><br> <input type="submit" value="Add updated information to report"> </form> <form action="viewfile3.php" method="post"> <input type="submit" size="40" value="View Web Blog"> </form> <form action="loginform.php" method="post"> <input type="submit" value="Click here to go to the Log In Screen"> </form> </body></html> And my text file is as follows: Code: [Select] <h1>Accident Report</h1> <p>First Name: Andrew Last Name: Denman Age: 18 Complete Weeks Since Accident: 2<br> <a href="amendfile2.php">Amend to this file</a> Any help would be appreciated Hello, i am currently getting an Microsoft Excel formatted text file whose save type is .Txt from a URL.I used to open it and will change the save type as excel file. Please suggest whether we can do this with php code. currently my code is like this, <? php copy("http://www.faa.gov/airports/airport_safety/airportdata_5010/menu/emergencyplanexport.cfm?Region=&District=&State=&County=&City=LAS%20VEGAS&Use=&Certification=","./contactsexport.xls"); ?> where as the contactsexport.xls type is .Txt which i need it in .xls Thanks in Advance. Folks, I want to extract certain portion form URL. Exmaple: Quote http://abc.com/this-is-test.html Output should be Quote this-is-test Another Example Quote http://abc.com/this-is-yet-another-test.html Output should be Quote this-is-yet-another-test I am not sure how it can be done with preg_match() and regex or something like that... Can someone help me with this please? Cheers Natasha Ok, I have a log file that is written to. I only want the text file to have the last fifty lines (\n) delimited So, how do I make it delete the excess from the TOP of the file? So if it had 60 lines, I would want it to only have lines 10-60 to net 50 total. Thanks. I am trying to add a txt file to a mysql database. Is there any one that can help? Here are the details TEST.TXT contains the following A001200910019999999900000200000000000000000000000 00000000000000000000000000 A002200910019999999900000610000000000000000000000 00000000000000000000000000 A003200910019999999900000687500000000000000000000 00000000000000000000000000 A004200910019999999900000335000000000000000000000 00000000000000000000000000 A005200910019999999900000626500000000000000000000 00000000000000000000000000 A006200704019999999900000423500000000000000000000 00000000000000000000000000 A007200910019999999900000323500000000000000000000 00000000000000000000000000 A008200704019999999900000102500000000000000000000 00000000000000000000000000 And so on up to 200 rows long I need to split each row the same way and add them to a table I will use the first row as an example Character 1 to 4 (A001) entered into a field called name Character 5 to 12 (20091001) entered into a field called efdate Character 13 to 20 (99999999) entered into a field called exdate Character 21 to 31 (00000200000) entered into a field called prfee Character 32 to 42 (00000000000) entered into a field called assfee Character 43 to 53(00000000000) entered into a field called spfee Character 54 to 64 (00000000000) entered into a field called anfee Character 65 to 75 (00000000000) entered into a field called nanfee Thanks in advance for the help |