PHP - Extracting Text Between Two Strings
Hi there,
In my attached PHP script, I extract the text between two strings in an input file and write the extracted text to an output file. Everything seems to work fine, except that I can't figure out how to include the row that says "Richland" (after the row that says "Creighton") in the extracted text. If someone could guide me on how to do this, I'd greatly appreciate it. The PHP script is attached. The input file is in htm format and I can't attach it here, so here is a link to the file I'm calling: http://www.afws.net/data/pa/savedata/109/06/2009060920.pa.htm

Many thanks!!!

Similar Tutorials

I'm trying to work on my PHP skills by creating an online content "finder". I'm using cURL to grab a webpage's HTML and store it in a variable. I then want my program to find the article title, keywords and content. I'm using articles from Ezine. Here is an example webpage:

Code:
http://ezinearticles.com/?8-Ways-to-Reduce-Stock-Investment-Risks&id=954266

I don't have any problems grabbing the webpage HTML. But when I try to grab the title, keywords and content using this code:

Code:
$title = getbetween($html, "<title>", "</title>");
$keywords = getbetween($html, "<meta name=\"keywords\" content=\"", "\">");
$content = getbetween($html, "<div id=\"body\">", "</div>");
$content = "<P>" . getbetween($content . "<E>", "<P>", "<E>");
echo "Title: " . $title . "<P>";
echo "Keywords: " . $keywords . "<P>";
echo "Content: " . $content . "<P>";

only the title comes out of the page; the rest of the values are blank. I'm not sure why getbetween only manages to extract the title from the HTML. Does anyone have any ideas why? I tested whether it was grabbing the proper webpage, and when I used 'echo $html;' the webpage came out fine on my page. I am totally lost on this; I just want to practice using cURL and my web request coding. Thanks for any help, I appreciate it.
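The getbetween() function used in the post above was not shown, so the following is only a minimal sketch of how such a helper might look. The name get_between and the sample markup are assumptions made for illustration, not the poster's code or the real Ezine page. It returns null when either marker is missing, which makes it easier to tell "marker not found" apart from "empty content".

```php
<?php
// Hypothetical helper: the original getbetween() was not shown, so this is
// only one plausible way such a function could be written.
function get_between(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null; // start marker does not appear verbatim
    }
    $from += strlen($start); // skip past the start marker itself
    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null; // no end marker after the start marker
    }
    return substr($haystack, $from, $to - $from);
}

// Made-up sample markup, not the real page.
$html = '<html><head><title>Demo</title>'
      . '<meta name="keywords" content="stocks, risk"></head></html>';

var_dump(get_between($html, '<title>', '</title>'));
var_dump(get_between($html, '<meta name="keywords" content="', '">'));
var_dump(get_between($html, '<div id="body">', '</div>'));
```

With a helper like this, a null result means the start or end marker did not match character for character, so it may be worth comparing the markers against the fetched HTML (attribute order, quote style, whitespace, a self-closing "/>").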
Folks, I tried all my PHP skills to extract domain name strings from an RSS feed and put each domain name into an array element, but all in vain. Here is the RSS: http://bulliesatwork.co.uk/master/dev/domp/expdom/domains.php()

What I want to extract:

Quote:
Do you see the list of domain names, which are anchored? All I need is to extract these domain names, like "abc.co uk" (observe there is a space between .co and uk, which can be removed with str_replace()).

Here is my first try (using SimpleHTMLDomParser):

Code:
require_once('simple_html_dom.php');
$html = file_get_html('http://bulliesatwork.co.uk/master/dev/domp/expdom/domains.php');
$domains = $html->find('div[class="entry"] a', 0);
foreach($domains as $dom) {
    echo str_replace(' ', '.', $dom->plaintext);
}
$html->clear();
unset($html);

Here is another try with DOM Document:

Code:
$scrapeurl = 'http://bulliesatwork.co.uk/master/dev/domp/expdom/domains.php';
$keywords = file_get_contents($scrapeurl);
$keywords = json_decode($keywords);
foreach( $keywords->responseData->results as $keyword) {
    echo str_replace("...",".",$keyword->title).'<br/>';
}

In both cases the DOM document is created, but it seems the document has all the information except the domain names I want to extract. Please help me extract the domain names. Cheers

I'm normally fairly proficient with PHP, but I haven't done any coding in quite a while, so I'm a little rusty. I have an entire page of text from which I need to extract a few values. Here is a small portion of the page in question:

Code:
Total Rank: 128
Total Points: 4,978
Next Rank: 20

For instance, I need to extract the values "128", "4978" and "20" and store them in variables. These values change all the time, so I'm not sure what the best way to go about this is... maybe a regular expression? If that's the case, I've never been too good with them, so any help would be appreciated.
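For the "Total Rank" question above, a regular expression is one reasonable option. The sketch below assumes the labels appear in the page exactly as in the sample and that each is followed by a colon and a number that may contain thousands separators. The helper name extract_stat is made up for this example.

```php
<?php
// Sample text copied from the post; a real script would load the page first.
$page = "Total Rank: 128\nTotal Points: 4,978\nNext Rank: 20";

// Hypothetical helper: returns the number that follows "$label:" as an int,
// or null when the label is not present in the text.
function extract_stat(string $text, string $label): ?int
{
    // preg_quote() keeps any special characters in the label literal.
    $pattern = '/' . preg_quote($label, '/') . ':\s*([\d,]+)/';
    if (preg_match($pattern, $text, $m) !== 1) {
        return null;
    }
    // Drop thousands separators such as the comma in "4,978".
    return (int) str_replace(',', '', $m[1]);
}

$totalRank   = extract_stat($page, 'Total Rank');
$totalPoints = extract_stat($page, 'Total Points');
$nextRank    = extract_stat($page, 'Next Rank');

var_dump($totalRank, $totalPoints, $nextRank);
```

Because the values change all the time, matching on the fixed label and capturing whatever digits follow tends to be more robust than relying on character positions.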
I have managed to get this to work, but it seems like a very long and messy solution. I was wondering if anyone had an idea of how this can be done better. I am new to PHP and don't know a lot. It shows the text between the tags <h1> and </h1> from the content of a different file. Basically I had to start the substr() from the fourth position so it would skip the "<h1>", and because I started on the fourth position I then had to finish four places back to skip the "</h1>".

Code:
<?php
$id = $_GET['id'];
$homepage = file_get_contents("./".$id.".php");
$title = stristr($homepage,"<h1>");
$titlepos = strpos($homepage,"</h1>");
$endpos = $titlepos - 4;
echo "Title " . substr($title,4,$endpos);
?>

I have a large text file that I need to search and extract text from. I have some code that somewhat works but is not good for what I need because it only reads one line at a time. I need to be able to echo all text between two strings and continue scanning the entire document. I am attaching the TXT file that is being read by the script. Here is the script:

Code:
<?
$searchthis = "Problem:";
$search="Check:";
$matches = array();
$handle = @fopen("1numbers.txt", "r")or die("can't open file");
if ($handle) {
    while (!feof($handle)) {
        $buffer = fgets($handle);
        if(strpos($buffer, $searchthis) !== FALSE)
            echo "<br>". $buffer."<br>";
        if(strpos($buffer, $search) !== FALSE)
            echo "<br>". $buffer."<br>";
    }
    fclose($handle);
}
?>

You can see what this script outputs by visiting this link: http://yourautofix.com/data/data.php but my problem is that it only outputs the one line of text that contains the search match. I need it to output all lines of text between two matches. For example, any text between "Problem:" and "Check:" should be echoed, and any text between "Check:" and "Likely:" should be echoed. There may be 1 line or 20 lines of text between the tags...
I need to print all lines between the 2 determined search strings and then continue through the text file, displaying all matches between the search strings in a large file. Any thoughts on how I can get this done, or can someone point me in the right direction? Thanks for any input on this.
Paul

Dear all, is there any library that supports text extraction from docx, doc, excel, pdf, etc. formats like Apache POI does in Java? Or should I port the Apache POI classes to PHP code?
best regards, ethereal1m

I am trying to make a little script that allows a user to search for blocks of text within strings. The user enters data into form fields and can enter text into another form field (needle) to search the data fields (haystack). When the search string matches something in the data fields, the associated data fields are highlighted with a yellow background color. Right now the search string is acting funny. When I enter a search string I get no highlighting unless the first character(s) of the search string are the same as the first character(s) of the items. For instance, if I search for the text "at" in the word "bat" I will not get any yellow highlighting. But I would get highlighting for "bat" if I search for "ba". How would I change the code so that any data field is highlighted if the search string exists anywhere in the text of the data field?

Also, I figured out how to stop the form fields from being yellow if they and the search field are empty/NULL, but I did this part in another file (as an IF statement) and can't seem to get it to work in the other file. How would I make it do the highlighting if and only if there is a search string in the search field (i.e. only highlighting when the search field is not NULL/empty)? The code from my 2 files is here: http://pastie.org/1095526 , http://pastie.org/1095528 Thanks very much to anyone who can help me.

I have HTML files in which there are lines of URLs starting with http:// (simple text, not hyperlinks) without a tag.
What is the simplest way to extract them?

Hi, I'm learning PHP and trying to write a script to extract registration information from a large text file. Sadly my meagre knowledge of PHP is letting me down a bit. It's a case of knowing what you want the script to do but not having the knowledge of how to 'say it'. So I was hoping that if I posted my code here, someone could either give me a few pointers on where I am going wrong or suggest a better way. The text file data luckily has a recurring format as follows (for brevity I've only included one entry, which contains made-up information):

From: bella_done@yahoo.co.uk
Sent: 02 February 2011 22:50
To: Jonny tum, patsy fells, dingly bongo
Subject: Subject: Fun Run 2010
Categories: Fun Run
Name: Bella Donna
Address: 14 brondle avenue
Postcode: cd83 1rg
Phone: 0287343510
Email: bella_don@yahoo.co.uk
DOB: 15/11/1945
Half or Full: Full fun run
How did you hear: Took part in 2010

As you can see, the data has a convenient boundary at the 'From' field and at the colon (or so it occurred to me), so I created my script as follows:

// the string being analysed
$the_string = "
From: bella_done@yahoo.co.uk
Sent: 02 February 2011 22:50
To: Jonny tum, patsy fells, dingly bongo
Subject: Subject: Fun Run 2010
Categories: Fun Run
Name: Bella Donna
Address: 14 brondle avenue
Postcode: cd83 1rg
Phone: 0287343510
Email: bella_don@yahoo.co.uk
DOB: 15/11/1945
Half or Full: Full fun run
How did you hear: Took part in 2010";
// remove all formatting to work with a clean string
$clean_string = strip_tags($the_string);
// remove form field entries from the data and replace with commas and a ZZZ boundary
$remove_fields = array("Categories:" => "","Name:" => ",","Address:" => ",","Postcode:" => ",",
    "Phone:" => ",","Email:" => ",","DOB:" => ",","Half or Full:" => ",","How did you hear:" => ",",
    "From:" => "ZZZ","Sent:" => ",","To:" => ",", );
$new_string = strtr("$clean_string",$remove_fields);
// split the data at the boundary ZZZ
$string_to_array = explode("ZZZ", $new_string);
$new_string2 = implode("</br>",$string_to_array);
echo $new_string2;
$myFile = "address_list.csv";
$fh = fopen($myFile, 'w') or die("can't open file");
$stringData = $new_string2;
fwrite($fh, $stringData);
fclose($fh);

One major problem is that when I write the new data to a CSV file, the CSV contains spacings that cause it to be reproduced in column form rather than as separate fields for each comma boundary. So can anyone suggest either a) a better way of extracting the data from the text file (it doesn't need to be 100% clean and perfect), or b) how I can stop the spaces in the CSV (I thought I would have fixed this when I stripped the tags from the string at the start)? Any help would be gratefully received by a newbie PHPer. It's my first shot at performing anything moderately taxing, so if I've made some glaring oversights I would very much welcome your wisdom! Thank you
Drongo

Folks, I want to extract a certain portion from a URL. Example:

Quote:
http://abc.com/this-is-test.html

Output should be:

Quote:
this-is-test

Another example:

Quote:
http://abc.com/this-is-yet-another-test.html

Output should be:

Quote:
this-is-yet-another-test

I am not sure how it can be done with preg_match() and regex or something like that... Can someone help me with this please? Cheers
Natasha

$example = "Q: Example? A:";

Let's assume that I'd like to display $example but exclude "Q:" and "A:". How would I do that?

I'm *attempting* to write a script that will take a paste of data from a user and, when it's submitted, drop certain parts of the data into databases so they are available for recall later. I have the MySQL sorta worked out, I think, but I can't test until I get this part done. Basically people will paste copied data (CTRL A, CTRL C, then go to my form and just hit CTRL V).
It will contain a bunch of data I want to drop in the database, but in different areas. In the following example, the items in bold are the items I want to grab.

First Name: Lincoln
Last Name: Coe
Address: 1234 Easy St, Perfectville, PV 00000
Telephone: 0000000000

Hi guys, I'm quite new to PHP, and I'm really struggling to get this right. I just can't get it to work properly. I'm trying to extract emails from a list of URLs. I have currently got it to work with 1 URL at a time, but I need to know how I can pass multiple URLs at once, either from a CSV or by just pasting them into the input. Any help would be much appreciated. Here is my current code:

Quote:
<?php
$the_url = isset($_REQUEST['url']) ? htmlspecialchars($_REQUEST['url']) : '';
?>
<form method="post">
Please enter full URL of the page to parse (including http://):<br />
<textarea name="url" cols="100" rows="10"><?php echo $the_url; ?></textarea> <br />
<input type="submit" value="Get Emails" />
</form>
<?php
if (isset($_REQUEST['url']) && !empty($_REQUEST['url'])) {
    // fetch data from specified url
    $text = file_get_contents($_REQUEST['url']);
} elseif (isset($_REQUEST['text']) && !empty($_REQUEST['text'])) {
    // get text from text area
    $text = $_REQUEST['text'];
}
// parse emails
if (!empty($text)) {
    $res = preg_match_all( "/[a-z0-9]+([_\\.-][a-z0-9]+)*@([a-z0-9]+([\.-][a-z0-9]+)*)+\\.[a-z]{2,}/i", $text, $matches );
    if ($res) {
        foreach(array_unique($matches[0]) as $email) {
            echo $email . "<br />";
        }
    } else {
        echo "No emails found.";
    }
}
?>

I usually only do design work, but a client wanted a login system for his website, so I decided to do it. I set everything up correctly, and users can sign up, log in, and log out. However, he wants to be informed when a user logs into his site. So say (user x) logs in; my client wants to receive an email with all of (user x)'s account information. How do I pull a row from MySQL based on the login information provided?
My database has 7 columns: user, pass, name, address, state, phone, email

I have an RSS feed with a lot of lottery number data. The feed itself is: http://www.alllotto.com/rss/NY/latest.rss The item I am interested in is:

Quote:
<item>
<title>Take 5</title>
<description>2011-08-27: 4, 10, 16, 17, 31</description>
<guid isPermaLink="false">USA-New-York-Take-5-2011-08-27</guid>
</item>

How could I output those to variables? Thanks
Joe

Hey guys, I have an array I made and I want to grab a random value out of the array. The only way I can see to do this, looking at the list of array functions, is to pick a random key first and then return the value assigned to the random key I just generated. I only see a function that lets me do the opposite, array_search(). So now that I have generated a random key, how do I grab just the value from the array using the random key? Or is there a function that can just return a random value?

$explode_names = explode(" ", $planet_names);
$random_key_name = (array_rand($explode_names, 1));
echo $random_key_name;

Hey guys, I need to extract all the IPs from a string and loop over them for more operations, but for some reason I only get the first one:

<?php
$string = '80.37.14.13 80.37.14.14 80.37.14.15';
preg_match("/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/", $string, $matches);
foreach ($matches as $ip){
    echo "$ip<br>";
}
?>

The string is not really separated by spaces... it can actually be messy and have IPs wrapped in a lot of code. The regex works, because I do get the first one... What did I miss?

Hi there, I need to extract data from some XML. I have found a few sites that explain that part to me; however, there is a section of the data which I need to extract and I am not sure how to go about it.
Below is an extract of the XML. The section I am trying to extract is highlighted in red (it is essentially a text message being sent, and I am trying to get the contents of the SMS/text message):

<usa_smpp_host>THTTP</usa_smpp_host>
<short_message>id:660352946 sub:001 dlvrd:001 submit date:1102081032 done date:1102081035 stat:DELIVRD err:000 text:Test SMS message</short_message>
<dest_addr_npi>1</dest_addr_npi>

Any assistance would be appreciated. I have played around quite a bit but have not managed to figure it out yet. Thanks
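For the short_message question above, one possible approach is to read the element's value with whatever XML method is already in use and then cut out everything after the "text:" marker. This sketch assumes that "text:" is always the last field in the value, as in the sample. The helper name extract_sms_text is hypothetical.

```php
<?php
// Value of <short_message> copied from the post, assumed to have been read
// from the XML already.
$shortMessage = 'id:660352946 sub:001 dlvrd:001 submit date:1102081032 '
              . 'done date:1102081035 stat:DELIVRD err:000 text:Test SMS message';

// Hypothetical helper: returns everything after the first "text:" marker,
// or null when the marker is absent. Assumes "text:" is the last field.
function extract_sms_text(string $receipt): ?string
{
    // The "s" modifier lets "." match newlines in multi-line messages.
    if (preg_match('/\btext:(.*)$/s', $receipt, $m) !== 1) {
        return null;
    }
    return trim($m[1]);
}

var_dump(extract_sms_text($shortMessage));
```

If other fields can follow the message text in real data, the pattern would need an explicit end marker instead of running to the end of the string.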