PHP - Parsing HTML To Strip Required Data
So I have an interesting one for you guys this morning.
I first want to make it very clear that I am not scraping code; rather, I am scraping data that needs to be imported into a shopping cart system for someone. I have a URL that I am trying to scrape the required data from, but it is not returning all the data that I want. I have written a function that uses preg_match_all() with a regex, and I am still having trouble stripping out what I want. Here is a link to the test page I want to strip the data from: http://visualrealityink.com/dev/clients/rug_src/scrapeing/Rugsource/www.vendio.com/stores/Rugsource1/item/other/tribal-wool-3x5-shiraz-persian/lid=10363581.html
I want to grab all of this data:
Quote
Item Number: K-686
Style : Shiraz
Province : Fars
Made In : Iran
Foundation : Wool
Pile : 100% Wool
Colors : Red, Navy Blue, Ivory, Forest Green, Light Blue, Orange
Size (feet) : 4' 11" x 3' 4"
Size (Centimeter) : 155 x 103
Age : 20-25 Years Old
Condition : Very Good
KPSI (knots per sq. inch) : 130 knots per square inch
Woven : Hand Knotted
Shipping and Handling : Free Shipping(For Mainland USA)
Est. Retail Value : $2,700.00
Here is the code; note that $url holds the link above.
Code: [Select]
$html = file_get_contents($url);
$newlines = array("\t","\n","\r","\x20\x20","\0","\x0B");
$html = str_replace($newlinews, "", html_entity_decode($html));
preg_match_all('/<tr><td width="50%" align="right"><font color="#800000"><b>[^\s ](.*?)<\/b><\/font><\/td><td width="50%" align="left">[^\s ](.*?)<\/td><\/tr>/', $html, $matches, PREG_SET_ORDER);
foreach($matches_label as $match){
    $count = 0;
    echo $match[$count];
    echo "<br>";
    $count++;
}
echo $count;
This returns the following:
Quote
Style : Shiraz
Province : Fars
Foundation : Wool
Colors : Red, Navy Blue, Ivory, Forest Green, Light Blue, Orange
Size (feet) : 4' 11" x 3' 4"
Size (Centimeter) : 155 x 103
Age : 20-25 Years Old
Condition : Very Good
Est. Retail Value : $2,700.00
1
It is missing:
Quote
Inventory Number : xxxxxxx
Made In: xxxxxxxx
Pile : xxxxxxxxxx
KPSI(Knots Per Inch) : xxxxxxxxxx
Woven : xxxxxxxxx
Shopping : xxxxxxxxxxx
You can see the script in action here -> http://visualrealityink.com/dev/clients/rug_src/scrapeing/scrape_tst.php
Thanks in advance for all of your help.

Similar Tutorials

I have been searching on Google for a while, but I couldn't find it, so I thought maybe you could direct me to a tutorial or some steps if you know of any. Basically, I am working on an articles directory. In the big text area where the main article is entered, I want to allow all links (the link tag) but no other HTML tags. Currently I am using strip_tags, so it removes the tags and all the links are displayed as bare text. Can you please tell me how to do this? Thanks.

Hi... I am working to parse an OPF file. I can pull the dc children, but I am having problems getting it to pull other information from the file. Is there a way to do this? What I am currently doing:
$package = simplexml_load_file("$url");
echo $package->metadata->children('dc', true)->creator."<br>";
echo $package->metadata->children('dc', true)->title."<br>";
echo $package->metadata->children('dc', true)->description."<br>";
Is there a way to parse the meta content by name?
File structure:
<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="uuid_id" version="2.0">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:title>The Beginning of the End</dc:title>
<dc:creator opf:file-as="Cassidy, Amanda" opf:role="aut">Amanda Cassidy</dc:creator>
<dc:description>Working with young children can be a very rewarding job... and extremely frustrating at the...</dc:description>
<meta content="Preschool Teachers Anonymous" name="library:series"/>
<meta content="1" name="library:series_index"/>
</metadata>
</package>
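For the OPF question above, one possible approach is to loop over the meta elements and index their content attribute by the name attribute. This is only a sketch: the XML is a shortened copy of the posted file structure loaded from a string, and $metaByName is a made-up variable name.

```php
<?php
// Shortened copy of the OPF structure from the post, loaded from a string
// so the sketch runs without a file on disk.
$opf = <<<'XML'
<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="uuid_id" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>The Beginning of the End</dc:title>
    <meta content="Preschool Teachers Anonymous" name="library:series"/>
    <meta content="1" name="library:series_index"/>
  </metadata>
</package>
XML;

$package = simplexml_load_string($opf);

// Build a name => content lookup from every <meta> child of <metadata>.
$metaByName = [];
foreach ($package->metadata->meta as $meta) {
    $metaByName[(string) $meta['name']] = (string) $meta['content'];
}

echo $metaByName['library:series'], "\n";
echo $metaByName['library:series_index'], "\n";
```

With a real file, simplexml_load_file($url) would take the place of simplexml_load_string($opf); the loop itself should not need to change.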
Hi,
I have a string that looks like this:
{
"cmd": "VarReturn",
"name": "temperature",
"result": 947,
"coreInfo": {
"last_app": "",
"last_heard": "2014-07-14T11:46:17.865Z",
"connected": true,
"deviceID": "234y8172390dfsa"
}
}
which I have fetched from a web page using file_get_contents();
How do I put the data into variables? Either a variable for each piece of data eg $cmd = "VarReturn", $name="temperature" or into an array?
At the moment I'm doing it the very messy way of using strpos() to look for each section, but I'm fairly sure there's a much easier way (using regular expressions?) but I'm a bit stuck on where to start.
Any help would be much appreciated.
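Since the string is JSON, json_decode() is probably the simplest route; neither strpos() nor regular expressions should be needed. A minimal sketch, assuming the string arrives exactly as shown above (here it is hard-coded rather than fetched):

```php
<?php
// The JSON from the post; in practice this would be the return value
// of file_get_contents().
$json = <<<'JSON'
{
  "cmd": "VarReturn",
  "name": "temperature",
  "result": 947,
  "coreInfo": {
    "last_app": "",
    "last_heard": "2014-07-14T11:46:17.865Z",
    "connected": true,
    "deviceID": "234y8172390dfsa"
  }
}
JSON;

// Passing true as the second argument decodes objects into associative arrays.
$data = json_decode($json, true);
if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

// Individual variables, if that is preferred over the array.
$cmd      = $data['cmd'];
$name     = $data['name'];
$result   = $data['result'];
$deviceId = $data['coreInfo']['deviceID'];

echo "$cmd $name $result $deviceId\n";
```

Nested values such as coreInfo end up as nested arrays, so $data['coreInfo']['connected'] holds the boolean from the response.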
Hi there everyone...

Hiya, I was cruising along with SOAP nicely until I needed to pass more complex XML in a SOAP call. I am using non-WSDL mode, if that makes any difference. I need to create XML in the following structure:
Code: [Select]
<Body>
<SOAP_updateCreateContactMarketingRule>
<Contact_ID xsi:type="xsd:int">1</Contact_ID>
<MarketingRule_ID xsi:type="xsd:int">1</MarketingRule_ID>
<MarketingQuestions>
<MarketingQuestion>
<MarketingQuestion_ID xsi:type="xsd:int">1</MarketingQuestion_ID>
<Answer xsi:type="xsd:string">No</Answer>
</MarketingQuestion>
<MarketingQuestion>
<MarketingQuestion_ID xsi:type="xsd:int">2</MarketingQuestion_ID>
<Answer xsi:type="xsd:string">Yes</Answer>
</MarketingQuestion>
</MarketingQuestions>
</SOAP_updateCreateContactMarketingRule>
</Body>
So, I set my parameters up as follows:
$params[] = new SoapParam("1", "Contact_ID");
$params[] = new SoapParam("1", "MarketingRule_ID");
$params[] = new SoapParam(array("MarketingQuestion"=>array("MarketingQuestion_ID"=>1, "Answer"=>"Yes")), "MarketingQuestions");
And I am getting the following XML generated when I use __getLastRequest():
Code: [Select]
<SOAP-ENV:Body>
<ns1:SOAP_updateCreateContactMarketingRule>
<Contact_ID xsi:type="xsd:string">1</Contact_ID>
<MarketingRule_ID xsi:type="xsd:string">1</MarketingRule_ID>
<MarketingQuestions SOAP-ENC:arrayType="ns2:Map[1]" xsi:type="SOAP-ENC:Array">
<item xsi:type="ns2:Map">
<item>
<key xsi:type="xsd:string">MarketingQuestion_ID</key>
<value xsi:type="xsd:int">8</value>
</item>
<item>
<key xsi:type="xsd:string">Answer</key>
<value xsi:type="xsd:string">Yes</value>
</item>
</item>
</MarketingQuestions>
</ns1:SOAP_updateCreateContactMarketingRule>
</SOAP-ENV:Body>
Clearly my third parameter is not set up correctly - can anyone help with this? Thanks

I think what I'd like to do is very simple, but I just can't figure out how to accomplish the task. Fair warning -- I'm pretty new to PHP and any help is appreciated.
I have a form that takes in multiple fields of data and stores them in a MySQL database. I'm trying to output the data on another page of my website, and I'm able to do that, but I'm really looking to output it in a way that's easily formatted using CSS. This is on a WordPress site and I'm using the following code to output my data:
Code: [Select]
<?php
$result = $wpdb->get_results ("SELECT field_val FROM wp_cformsdata WHERE field_name <> 'page' OR 'Fieldset1'", OBJECT);
foreach ($result as $teaminfo) {
    echo $teaminfo->field_val . "<br/>";
}
?>
Obviously, this just outputs the data with a <br> after each field value. I'd like to see if there's a way to echo each field individually, such as "echo $teaminfo->name;" or "echo $teaminfo->address;", so that I can wrap each echo in a CSS class. Alternatively, would there be a way to echo the field_val wrapped in <span class="field_name"></span>? This would also suit my needs, but I'm not sure how to accomplish it. Also, unfortunately I don't have remote access to MySQL with my hosting company, but I've included the output from phpMyAdmin when I run describe wp_cformsdata; :
Thank you in advance for your help.

Good day, dear phpfreaks.
I am new to PHP's SimpleXML. I want to work with SimpleXML on OSM files. The original version of this question was derived from the Stack Overflow question "OSM Data parsing to get the nodes with child": https://stackoverflow.com/questions/16129184/osm-data-parsing-to-get-the-nodes-with-child
I am thankful that hakre offered a great example in the comments, which leads to a bigger goal: how to get more out of it. I want to filter the data to get the nodes of a particular category. Here is a sample of the OSM data; I want to get all the schools within an area. The first script runs well, but now I want to refine the search and add more tags. Finally, I want to store everything in MySQL. So we need to do some XML parsing with PHP:
The following is a little OSM Overp
Quote
Since I am learning, I break the code down into pieces. For my question, the second part is the more interesting one: querying the XML data we already have. As mentioned above, this is most easily done with XPath; the PHP XML library used here is based on libxml, which supports XPath 1.0 and covers the various querying needs very well. The following example lists all schools and tries to obtain their names as well.
# get all school nodes with xpath
//node[tag[@k = "amenity" and @v = "school"]]
This line says: give me all node elements that contain a tag element whose k attribute has the value "amenity" and whose v attribute has the value "school". This is the condition that selects only the nodes tagged with amenity school. XPath is then used a second time, now relative to those school nodes, to see if there is a name and, if so, to fetch it. For that we use the foreach syntax:
foreach ($schools as $index => $school)
tag[@k = "name"]/@v'
Because not all school nodes have a name, a default string is provided for display purposes by adding it to the (then empty) result array:
list($name) = $school->xpath('tag[@k = "name"]/@v') + ['(unnamed)'];
Query returned 907 node(s) and took 1.10735 seconds.
Goal: to get out even more important data; see Key:contact - OpenStreetMap Wiki.
Well, we are already extracting the name. If we want more data, we just have to run a few more xpath queries inside our loop for all the address keys and the website. In addition, we must not forget to look for the website key as well as contact:website; cf. https://wiki.openstreetmap.org/wiki/Key:website
Conclusion: I think that I need to extend the xpath requests within the loop, where xpath is used again, relative to those school nodes, to see if there is a name and, if so, to fetch it:
tag[@k = "name"]/@v'
I did some further tests and found out some very interesting things:
$query = 'node
$context = stream_context_create(['http' => [
# please do not stress this service, this example is for demonstration purposes only.
$result = simplexml_load_file($endpoint);
//
# get all school nodes with xpath
$query = 'node
$context = stream_context_create(['http' => [
$endpoint = 'http://overpass-api.de/api/interpreter';
$result = simplexml_load_file($endpoint);
me/martin/dev/php/o1.php on line 68
33 School(s) found:
So far so good. If I add some lines in part 2, I run into errors (see below). I want to get more data out of it, and coded it like so:
{ note - within the part 2 that works with the XML-Result.
//
# get all school nodes with xpath
contact:phone
I will dig into all the documents and come back later this weekend to report all the findings. I think that I need to extend the xpath requests within the loop, where xpath is used again, relative to those school nodes, to see if there is a name and, if so, to fetch it:
tag[@k = "name"]/@v'
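To illustrate the idea of running a few more relative xpath queries per school node, here is a minimal sketch. The XML is a tiny hand-made stand-in for an Overpass response (not real data), and tagValue() is a hypothetical helper, not part of SimpleXML. It tries several keys in order, so contact:website can fall back to website.

```php
<?php
// Hand-made sample standing in for the Overpass API result (assumption).
$xml = <<<'XML'
<osm>
  <node id="1" lat="50.1" lon="8.6">
    <tag k="amenity" v="school"/>
    <tag k="name" v="Example School"/>
    <tag k="contact:phone" v="+49 000 000"/>
  </node>
  <node id="2" lat="50.2" lon="8.7">
    <tag k="amenity" v="school"/>
    <tag k="website" v="http://example.org"/>
  </node>
  <node id="3" lat="50.3" lon="8.8">
    <tag k="amenity" v="cafe"/>
  </node>
</osm>
XML;

// Hypothetical helper: return the v attribute of the first matching tag key,
// or $default when none of the keys is present on the node.
// The keys are hard-coded below, so they are not escaped for XPath here.
function tagValue(SimpleXMLElement $node, array $keys, string $default = ''): string
{
    foreach ($keys as $key) {
        $hits = $node->xpath('tag[@k = "' . $key . '"]/@v');
        if ($hits) {
            return (string) $hits[0];
        }
    }
    return $default;
}

$result  = simplexml_load_string($xml);
$schools = $result->xpath('//node[tag[@k = "amenity" and @v = "school"]]');

$rows = [];
foreach ($schools as $school) {
    $rows[] = [
        'id'      => (string) $school['id'],
        'name'    => tagValue($school, ['name'], '(unnamed)'),
        'website' => tagValue($school, ['contact:website', 'website']),
        'phone'   => tagValue($school, ['contact:phone', 'phone']),
    ];
}

print_r($rows);
```

Collecting the values into one array per school also keeps the later MySQL step simple, since each row maps onto one insert.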
I am having trouble parsing data that is separated by commas in an XLS file. The upload and parsing scripts work beautifully, but the problem is that the data is read in from one cell (all 5 fields for the row are in column A). I am exploding it by "," but some of the cells contain commas. For example, a cell might contain "jim,jones,12345678,jim@jones.com,More, Data,192.168.1.1" but the next one might be "Dave,Thomas,98765432,dave@wendys.com,something else, 255.255.255.0". The problem is that "More, Data" should be one cell. Not every position 4 will have a comma, so I can't just add it back, because the IP address would then be appended to "More"... Any ideas? I hope that's clear enough...

Hello,
I've tried to get a dynamic table from an external page and search for entries in it, so I used a dynamic XLS file with PHP Excel Reader. I was only able to export the file; I couldn't search for data.
Can I get some help, please?
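For the comma-separated cell problem described earlier, one possible approach is to take a fixed number of fields from the left and from the right and to rejoin whatever is left in the middle. The sketch below assumes the layout from the two sample cells (four leading fields, one free-text field, then the IP address); splitRow() is a made-up name.

```php
<?php
// Hypothetical helper: split a cell on commas, keep $leading fields from the
// start and $trailing fields from the end (at least one), and glue the rest
// back together as a single free-text field.
function splitRow(string $cell, int $leading = 4, int $trailing = 1): array
{
    $parts = array_map('trim', explode(',', $cell));
    $middleCount = count($parts) - $leading - $trailing;
    if ($middleCount < 1) {
        throw new InvalidArgumentException("Too few fields in: $cell");
    }

    $head   = array_slice($parts, 0, $leading);
    $middle = array_slice($parts, $leading, $middleCount);
    $tail   = array_slice($parts, -$trailing);

    return array_merge($head, [implode(', ', $middle)], $tail);
}

print_r(splitRow('jim,jones,12345678,jim@jones.com,More, Data,192.168.1.1'));
print_r(splitRow('Dave,Thomas,98765432,dave@wendys.com,something else, 255.255.255.0'));
```

This only works while a single field can contain commas; if more than one free-text column may contain them, the export would need quoted fields and a real CSV parser such as str_getcsv().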
Hi guys. I have been using the Wikipedia API to retrieve information about a topic. I've managed to get a response and retrieve the first section of the topic (in this case football) using this method:
http://en.wikipedia.org/w/api.php?action=parse&page='.$search.'&redirects=1&format=json&prop=text&section=0');
However, the first section that is retrieved includes the pictures, and I just want the main text from the introduction. The code that is sent back from Wikipedia is this:
Code: [Select]
Array ( [parse] => Array ( [text] => Array ( [*] =>
<div class="dablink">This article is about sports known as football. For the ball used in these sports, see <a href="/wiki/Football_(ball)">Football (ball)</a>.</div>
<div class="thumb tright">
<div class="thumbinner" style="width:227px;"><a href="/wiki/File:Football4.png" class="image"><img alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/d/d2/Football4.png/225px-Football4.png" width="225" height="274" class="thumbimage" /></a>
<div class="thumbcaption">
<div class="magnify"><a href="/wiki/File:Football4.png" class="internal" title="Enlarge"><img src="http://bits.wikimedia.org/skins-1.17/common/images/magnify-clip.png" width="15" height="11" alt="" /></a></div>
Some of the many different games known as football. From top left to bottom right: <a href="/wiki/Association_football">Association football</a> or soccer, <a href="/wiki/Australian_rules_football">Australian rules football</a>, <a href="/wiki/International_rules_football">International rules football</a>, <a href="/wiki/Rugby_Union" class="mw-redirect" title="Rugby Union">Rugby Union</a>, <a href="/wiki/Rugby_League" class="mw-redirect" title="Rugby League">Rugby League</a>, and <a href="/wiki/American_Football" class="mw-redirect" title="American Football">American Football</a>.</div>
</div>
</div>
<p>The game of <b>football</b> is any of several similar <a href="/wiki/Team_sport" title="Team sport">team sports</a>, of similar origins which involve advancing a ball into a goal area in an attempt to score. Many of these involve <a href="/wiki/Kick_(football)" title="Kick (football)">kicking</a> a ball with the foot to score a <a href="/wiki/Goal_(sport)" title="Goal (sport)">goal</a>, though not all codes of football using kicking as a primary means of advancing the ball or scoring. The most popular of these sports worldwide is <a href="/wiki/Association_football">association football</a>, more commonly known as just "football" or "soccer". Unqualified, the word <i><a href="/wiki/Football_(word)" title="Football (word)">football</a></i> applies to whichever form of football is the most popular in the regional context in which the word appears, including <a href="/wiki/American_football">American football</a>, <a href="/wiki/Australian_rules_football">Australian rules football</a>, <a href="/wiki/Canadian_football">Canadian football</a>, <a href="/wiki/Gaelic_football">Gaelic football</a>, <a href="/wiki/Rugby_league">rugby league</a>, <a href="/wiki/Rugby_union">rugby union</a> and other related games. These variations are known as "codes".</p>
I want the code that resides in the <p> tags. How would I go about parsing this and removing the rest? I've tried to get Simple HTML DOM Parser to work, but with no luck.
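One way to keep only the paragraphs is PHP's bundled DOM extension rather than a third-party parser. The following is a sketch under assumptions: $html is a shortened, hand-written stand-in for the string found at $response['parse']['text']['*'], and the variable names are illustrative.

```php
<?php
// Shortened stand-in for the HTML string returned by the API (assumption).
$html = '<div class="dablink">This article is about sports known as football.</div>'
      . '<div class="thumb tright"><div class="thumbcaption">Some of the many different games.</div></div>'
      . '<p>The game of <b>football</b> is any of several similar '
      . '<a href="/wiki/Team_sport">team sports</a>.</p>';

$doc = new DOMDocument();
// Real-world HTML fragments often trigger parser warnings; collect them
// internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Keep the text of every <p> element and ignore everything else.
$paragraphs = [];
foreach ($doc->getElementsByTagName('p') as $p) {
    $paragraphs[] = trim($p->textContent);
}

echo implode("\n\n", $paragraphs), "\n";
```

If the links and bold markup inside the paragraphs should be kept, $doc->saveHTML($p) returns the element's HTML instead of its plain text.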
Any help would be greatly appreciated. Thanks, DIM3NSION

Hi, I am trying to make a web interface for a robot. I have written PHP to send/receive values via a serial port to my robot, and that works. I am now trying to develop my web interface. I'm using Java to generate HTTP requests client-side in the form of:
Code: [Select]
/request?command=Forward&param1=254
I was wondering how I can parse the command and param1 in PHP server-side? Or is there a better alternative?

Hi guys, I'm trying to parse an HTML table from an existing website into my own. However, I've run into a few problems. Does anyone know how to parse HTML tables? I'm using the PHP DOM Parser, but at the moment I am only able to return all the data on the website rather than the specific table. Thanks for any help!

This topic has been moved to PHP Regex. http://www.phpfreaks.com/forums/index.php?topic=308636.0

Hello again, I'm trying to scrape a table from another website using preg_match, specifically using this code:
Code: [Select]
<?php
$data = file_get_contents('http://tvcountdown.com/index.php');
$regex = '/<table class="episode_list_table"> (.+?) </table>/';
preg_match($regex,$data,$match);
var_dump($match);
echo $match[0];
?>
Here's the thing: it doesn't work. I think it's because the first and second anchors are HTML tags, because if I parse some other stuff without any tags, there's no problem. Any hints, mates? Thanks

Hello dear Community, I have a document that I need to parse so that it spits out only this part of the table; see http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=67003&lschb=
How do I parse this? With Perl or PHP? Note that I have the XPaths (see below). Sadly, I cannot apply them with Simple DOM Parser, since it does not work with XPath but with CSS selectors. I want to get all the data within the table with class="fliess". How do I dump all the results?
BTW, thinking about the most elegant way, I think the prettiest would be to do it with Perl, so I can try it with HTML::TableExtract or... Well, what do you suggest? Which way should I choose to do this [very] simple thing? Look forward to hearing from you! See the XPaths:
Schule: /html/body/center/table/tbody/tr[2]/td[1]
Stasse: /html/body/center/table/tbody/tr[3]/td[1]
Ort: /html/body/center/table/tbody/tr[4]/td[1]
Tel: /html/body/center/table/tbody/tr[5]/td[1]
Schulgliederungen: /html/body/center/table/tbody/tr[6]/td[1]
Besonderheite: /html/body/center/table/tbody/tr[7]/td[1]
E-Mail: /html/body/center/table/tbody/tr[8]/td[1]
Schulnummer: /html/body/center/table/tbody/tr[9]/td[1]

Hey guys, So when I put the following line of PHP on an HTML page:
Code: [Select]
echo '&reg';
I get the 'Registered' symbol. How do I turn this off? What is happening is that it is part of a longer string that represents a URL, and the URL is not rendering correctly due to the special character. Thanks

Hello to all, I have a problem figuring out how to properly display data fetched from a MySQL database in an HTML table. In the example below I am using two while loops, with the second nested inside the first, that check two different expressions fetching data from tables in a MySQL database. The second expression compares the IDs of the two tables and, after they match, displays the email of the account holder in each column of the HTML table. The main problem is that the 'email' row is displayed properly when its while expression is not nested and stands alone (meaning the other data is omitted or commented out), but when it is either nested in or placed next to the first while loop, it is displayed horizontally and the other data ('validity', 'valid_from', 'valid_to') is not displayed.
Can someone help me with this? I guess the problem lies in the while loop?
<thead>
<tr>
<th data-column-id="id" data-type="numeric">ID</th>
<th data-column-id="email">Subscriber's Email</th>
<th data-column-id="validity">Validity</th>
<th data-column-id="valid_from">Valid From</th>
<th data-column-id="valid_to">Valid To</th>
</tr>
</thead>
Here is part of the PHP code:
<?php
while($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    echo ' <tr> <td>'.$row["id"].'</td> ';
    while ($row1 = $stmt1->fetch(PDO::FETCH_ASSOC)) {
        echo ' <td>'.$row1["email"].'</td> ';
    }
    if($row["validity"] == 1) {
        echo '<td>'.$row["validity"].' month</td>';
    }else{
        echo '<td>'.$row["validity"].' months</td>';
    }
    echo ' <td>'.$row["valid_from"].'</td> <td>'.$row["valid_to"].'</td> </tr>';
}
?>
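A likely cause is that the inner while loop consumes every row of $stmt1 during the first outer iteration, so all emails land in the first table row and nothing is left for later rows. One way around this is to read the emails into a lookup array keyed by ID first and then run a single loop. The sketch below uses plain arrays as hypothetical stand-ins for the results of $stmt->fetchAll(PDO::FETCH_ASSOC) and $stmt1->fetchAll(PDO::FETCH_ASSOC); the column names follow the post, the data is invented.

```php
<?php
// Invented rows standing in for the two fetchAll() results (assumption).
$subscriptions = [
    ['id' => 1, 'validity' => 1, 'valid_from' => '2015-01-01', 'valid_to' => '2015-02-01'],
    ['id' => 2, 'validity' => 3, 'valid_from' => '2015-01-15', 'valid_to' => '2015-04-15'],
];
$accounts = [
    ['id' => 1, 'email' => 'a@example.com'],
    ['id' => 2, 'email' => 'b@example.com'],
];

// id => email lookup, built once before the table is rendered.
$emailById = array_column($accounts, 'email', 'id');

$html = '';
foreach ($subscriptions as $row) {
    $email = $emailById[$row['id']] ?? '';
    $unit  = ((int) $row['validity'] === 1) ? 'month' : 'months';

    // Exactly one <td> per column, in the same order as the <thead>.
    $html .= '<tr>'
        . '<td>' . htmlspecialchars((string) $row['id']) . '</td>'
        . '<td>' . htmlspecialchars($email) . '</td>'
        . '<td>' . htmlspecialchars($row['validity'] . ' ' . $unit) . '</td>'
        . '<td>' . htmlspecialchars($row['valid_from']) . '</td>'
        . '<td>' . htmlspecialchars($row['valid_to']) . '</td>'
        . "</tr>\n";
}

echo $html;
```

If both tables live in the same database, a single query with a JOIN on the ID column would avoid the second statement altogether.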
Thank you.

I have a paragraph which has tags that need to be stripped off. The paragraph looks like this:
Quote
<div id="ctl00_placeholderMain_pnlInTheBox" class="tabitem"> <p> HP LaserJet 9050 printer<br/> Power cord<br/> Parallel cable<br/> HP LaserJet Q8543X Smart print cartridge<br/> Printer documentation<br/> Printer software CD<br/> Control panel overlay<br/> Face-up output bin<br/> Two 500-sheet input tray<br/> 100 Sheet Multipurpose Tray<br/> HP JetDirect Fast</p> </div>
I want it to look like this:
Quote
HP LaserJet 9050 printer
Power cord
Parallel cable
HP LaserJet Q8543X Smart print cartridge
Printer documentation
Printer software CD
Control panel overlay
Face-up output bin
Two 500-sheet input tray
100 Sheet Multipurpose Tray
HP JetDirect Fast
How would I go about doing this? Currently I use:
Code: [Select]
$inbox = $html->find( "#ctl00_placeholderMain_pnlInTheBox" );
if ( isset( $inbox[ 0 ] ) ) {
    $box =( $inbox[0] );
    $box = strpos($box, ';') !== FALSE ? substr( $box, strpos( $box, ";" ) + 1 ) : $box;
} else {
    $box = "0";
}

$name = "D'Angelo"
OK, I'm running a MySQL query as
$query = " INSERT INTO TEST (ID, NAME) VALUES ('NULL','$NAME')";
If the name is "D'Angelo", the apostrophe causes it to fail. Is there a way to do this without stripping the characters?

Hi All, I am confused. I would like to put info into a database, but it needs to be secure. I have some code shown below. The problem is that I would like to allow ' in the input but keep the data secure. When it comes back, I do not want to show \'. I think you might know what I am trying to do. Here is the code; I would like to know how to stop the \' from showing.
Code: [Select]
$password = mysql_real_escape_string(stripslashes(trim($_POST['password'])));
Any help would be great, thank you.
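For the last two questions (the apostrophe in "D'Angelo" and the stray \' on output), a prepared statement is one common way to store the value unchanged without escaping it by hand. This is a sketch under assumptions: it uses PDO with an in-memory SQLite database only so that it runs without a server, and the table and column names are taken loosely from the post.

```php
<?php
// Assumption: in-memory SQLite via PDO, so the sketch is self-contained.
// With MySQL, only the DSN and the credentials would differ.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT)');

$name = "D'Angelo";

// The value travels separately from the SQL text, so the apostrophe needs
// no escaping and no backslash is ever stored.
$insert = $pdo->prepare('INSERT INTO test (name) VALUES (:name)');
$insert->execute([':name' => $name]);

$stored = $pdo->query('SELECT name FROM test')->fetchColumn();

// Escape for HTML only at output time.
echo htmlspecialchars($stored, ENT_QUOTES), "\n";
```

Because the stored value is identical to the input, no stripslashes() call should be needed when reading it back.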