PHP - Need Help With Simple_html_dom

I am using simple_html_dom.php and am stuck on how to parse the content below:

<div id="entry_4" class="entry clearfix "><div class="entry_title clearfix"><h1 class=" ">Smith J</h1></div><div class="full_listing"><div class="blocks"><div id="entry_4_block_0" class="block indent-level-0"><div class="share_link" wpol:entryId="719183066N00W" wpol:contactPointId="719183066N00W"><div class="save_menu"><div class="icon"></div></div><div class="share_menu"><div class="icon"></div></div><a class="screen_reader_only" rel="nofollow"
href="/mobile/send-to-mobile-accessible?entryId=719183066N00W&listingId=719183066N00W&searchType=R&channel=WP"
name="Smith">Send this listing to your mobile</a></div><span class="phone_number ">0457 599 539</span>
<div class="address"><span class="street_line">1 Martin Pl</span><span class="locality">Sydney</span><span class="state">NSW</span><span class="postcode">2000</span></div><a rel="nofollow"
class="show_map"
name="Smith"
href="/search/where-is?locality=Sydney&streetNumber=1&streetName=Martin&streetType=Pl&state=NSW&product=N00W%23719183066N00W%23Smith+J&channel=WP"
onclick="return false;">Show map...</a></div></div></div></div>

I am trying:
if(!$html->find('div[id=entry_' .$i.']',0)==""){
    echo "inside0000";
    foreach($html->find('div[id=entry_' .$i.']') as $result){
        $resultdata[]=array(
            'name' => $result->find('h[class=" "]',0)->innertext,
            'streetLine' => $result->find('span[class=street_line]',0)->innertext,
            'locality' => $result->find('span[class=locality]',0)->innertext,
            'state' => $result->find('span[class=state]',0)->innertext,
            'postcode' => $result->find('span[class=postcode]',0)->innertext,
            'phone' => $result->find('span[phone_number ]',0)->innertext
        );
    }
}

It gets into the if block and prints

inside0000

but it doesn't parse the data.

Can anyone help me, please?
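
One possible reading of the code above, offered as a sketch rather than a confirmed fix: 'h[class=" "]' asks for a tag named h, but the heading in the quoted markup is an h1, and 'span[phone_number ]' tests for an attribute called phone_number instead of a class. The helper names firstText and parseEntry are made up for this example, and the library path is an assumption. A missing match yields null here instead of a notice.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script, as in the post.
if (is_file(__DIR__ . '/simple_html_dom.php')) {
    require_once __DIR__ . '/simple_html_dom.php';
}

// Hypothetical helper: trimmed text of the first match, or null when nothing matches.
function firstText($node, string $selector): ?string
{
    $match = $node->find($selector, 0);
    return $match === null ? null : trim($match->plaintext);
}

// Hypothetical helper: one listing entry as an associative array.
function parseEntry($entry): array
{
    return [
        'name'       => firstText($entry, 'h1'),
        'streetLine' => firstText($entry, 'span[class=street_line]'),
        'locality'   => firstText($entry, 'span[class=locality]'),
        'state'      => firstText($entry, 'span[class=state]'),
        'postcode'   => firstText($entry, 'span[class=postcode]'),
        // "contains" match, so the trailing space in class="phone_number " does not matter
        'phone'      => firstText($entry, 'span[class*=phone_number]'),
    ];
}

// Only runs when the library was actually loaded.
if (function_exists('str_get_html')) {
    // Cut-down copy of the quoted markup.
    $html = str_get_html(
        '<div id="entry_4" class="entry clearfix "><h1 class=" ">Smith J</h1>'
        . '<span class="phone_number ">0457 599 539</span>'
        . '<span class="street_line">1 Martin Pl</span><span class="locality">Sydney</span>'
        . '<span class="state">NSW</span><span class="postcode">2000</span></div>'
    );
    $i = 4;
    $resultdata = [];
    foreach ($html->find('div[id=entry_' . $i . ']') as $entry) {
        $resultdata[] = parseEntry($entry);
    }
    print_r($resultdata);
}
```

Note that the original snippet never prints $resultdata, so even correct selectors would show nothing until the array is dumped with print_r or similar.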

High Cpu Load On Simple_html_dom

I successfully load a page with simple_html_dom.php (from simplehtmldom.sourceforge.net) like this:
$html = file_get_html('externalpage');

But sometimes this puts a high load on the CPU and the page takes a long time to load (probably because of the external site's server). How can I skip the request when it is not behaving normally, to avoid the high CPU usage?
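
One way to approach this, sketched under assumptions: fetch the page yourself with a read timeout and a size cap, and only hand the body to the parser when the fetch succeeded. The function name fetchWithTimeout and the limits of 5 seconds and 500000 bytes are arbitrary choices for illustration. The http timeout option limits how long a read may wait, so it bounds stalls rather than the total download time.

```php
<?php
// Hypothetical helper: returns the body, or null when the fetch fails,
// stalls past the read timeout, or the page is larger than $maxBytes.
function fetchWithTimeout(string $url, float $timeoutSeconds = 5.0, int $maxBytes = 500000): ?string
{
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds], // read timeout in seconds
    ]);

    // Read one byte more than allowed so an oversized page can be detected.
    $body = @file_get_contents($url, false, $context, 0, $maxBytes + 1);

    if ($body === false || strlen($body) > $maxBytes) {
        return null;
    }
    return $body;
}

// 'externalpage' is the placeholder from the post, not a real address.
$body = fetchWithTimeout('externalpage');

if ($body === null) {
    echo "Skipped: page unavailable, too slow, or too large\n";
} elseif (function_exists('str_get_html')) {
    // Assumption: simple_html_dom.php has already been included elsewhere.
    $html = str_get_html($body);
}
```

Capping the size also matters for CPU, since parsing a very large document is usually the expensive part once the download has finished.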

Help: Simple_html_dom.php Select First Table Row Only

Gidday all,

My ultimate goal is to parse the data in the first row of the first table and the first row of the second table on http://www.bom.gov.au/products/IDQ60901/IDQ60901.94580.shtml

At present I can only parse the data in the last row of the last table.

I got to this point about 2 days ago and have been unable to find any info on what I need to do to achieve what I want.
Some of the info I've found I don't understand.

Need newbie help.

What do I need to add/change to parse the data in at least the first table row?

<?php
error_reporting(E_ALL);
include_once('htmldom/simple_html_dom.php');
$url = 'http://www.bom.gov.au/products/IDQ60901/IDQ60901.94580.shtml';

// Create DOM from URL
$html = file_get_html($url);

foreach($html->find('table tr') as $weather) {
    if($weather->find('th')) {continue;} //apparently this needs to be added because there is a bug in simple_html_dom.php
    if(!$weather->find('td ', 0)) {continue;}

    $datetime = $weather->find('td', 0)->plaintext;
    $currentTemp = $weather->find('td', 1)->plaintext;

}

print_r('updated:' . '&nbsp' .$datetime);
print_r ('<br>');
print_r('CurrentTmp:' . '&nbsp' .$currentTemp);
print_r ('<br>');
?>
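
The loop above overwrites $datetime and $currentTemp on every row, which would explain why only the last row of the last table survives. A minimal sketch of one alternative: walk the tables one by one and stop at the first row that has data cells. The function name firstDataRow and the sample values are made up for this example; the column meaning (cell 0 is the time, cell 1 the temperature) is taken from the code above.

```php
<?php
// Assumption: same library location as in the post.
if (is_file(__DIR__ . '/htmldom/simple_html_dom.php')) {
    require_once __DIR__ . '/htmldom/simple_html_dom.php';
}

// Hypothetical helper: first row of $table with at least two <td> cells, or null.
function firstDataRow($table): ?array
{
    foreach ($table->find('tr') as $row) {
        $cells = $row->find('td');
        if (count($cells) < 2) {
            continue; // header rows contain only <th> cells
        }
        return [
            'datetime' => trim($cells[0]->plaintext),
            'temp'     => trim($cells[1]->plaintext),
        ];
    }
    return null;
}

// Only runs when the library was actually loaded.
if (function_exists('str_get_html')) {
    // Made-up two-table sample standing in for the real page.
    $html = str_get_html(
        '<table><tr><th>Time</th><th>Temp</th></tr>'
        . '<tr><td>A1</td><td>B1</td></tr><tr><td>A2</td><td>B2</td></tr></table>'
        . '<table><tr><th>Time</th><th>Temp</th></tr>'
        . '<tr><td>C1</td><td>D1</td></tr></table>'
    );
    foreach ($html->find('table') as $n => $table) {
        $row = firstDataRow($table);
        if ($row !== null) {
            echo 'table ' . $n . ': ' . $row['datetime'] . ' / ' . $row['temp'] . "\n";
        }
    }
}
```

For the real page, replacing str_get_html with file_get_html($url) should give the same loop shape; whether the first two tables are the ones wanted is something to check against the page itself.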

Simple Html Dom Parser 'simple_html_dom.php' Problem

Remove Empty Paragraphs From Html File Using Simple_html_dom

I want to remove empty paragraphs from an HTML document using simple_html_dom.php. I know how to do it using the DOMDocument class, but, because the HTML files I work with are prepared in MS Word, the DOMDocument's loadHTMLFile() function gives this exception "Namespaces are not defined".

This is the code I use with the DOMDocument object for HTML files not prepared in MS Word:
<?php
/* Using the DOMDocument class */

/* Create a new DOMDocument object. */
$html = new DOMDocument("1.0", "UTF-8");

/* Load HTML code from an HTML file into the DOMDocument. */
$html->loadHTMLFile("HTML File With Empty Paragraphs.html");

/* Assign all the <p> elements into the $pars DOMNodeList object. */
$pars = $html->getElementsByTagName("p");

echo "The initial number of paragraphs is " . $pars->length . ".<br />";

/* The trim() function is used to remove leading and trailing spaces as well as
* newline characters. */
for ($i = 0; $i < $pars->length; $i++){
    if (trim($pars->item($i)->textContent) == ""){
        $pars->item($i)->parentNode->removeChild($pars->item($i));
        $i--;
    }
}

echo "The final number of paragraphs is " . $pars->length . ".<br />";

// Write the HTML code back into an HTML file.
$html->saveHTMLFile("HTML File WithOut Empty Paragraphs.html");
?>

This is the code I use with the simple_html_dom.php module for HTML files prepared in MS Word:
<?php
/* Using simple_html_dom.php */

include("simple_html_dom.php");

$html = file_get_html("HTML File With Empty Paragraphs.html");

$pars = $html->find("p");

for ($i = 0; $i < count($pars); $i++) {
    if (trim($pars[$i]->plaintext) == "") {
        unset($pars[$i]);
        $i--;
    }
}

$html->save("HTML File without Empty Paragraphs.html");
?>

It is almost the same, except that the $pars variable is a DOMNodeList when using DOMDocument and an array when using simple_html_dom.php. But this code does not work. First it runs for two minutes and then reports these errors: "Undefined offset: 1" and "Trying to get property of non-object" for this line: "if (trim($pars[$i]->plaintext == "")) {".

Does anyone know how I can fix this?

Thank you.

I also asked on Stack Overflow.
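
A sketch of one likely cause and a possible fix: unset($pars[$i]) only drops the entry from the PHP array returned by find(). It does not touch the document, and it leaves a hole in the array, so the next pass through the loop hits a missing index. With simple_html_dom an element is removed by setting its outertext to an empty string. The function name removeEmptyParagraphs is made up for this example, and treating a paragraph that holds only &nbsp; as empty is an assumption about what Word exports.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script, as in the post.
if (is_file(__DIR__ . '/simple_html_dom.php')) {
    require_once __DIR__ . '/simple_html_dom.php';
}

// Hypothetical helper: blanks out every <p> whose text is empty and
// returns how many paragraphs were removed.
function removeEmptyParagraphs($html): int
{
    $removed = 0;
    foreach ($html->find('p') as $p) {
        // Turn entities such as &nbsp; into characters, then treat the
        // non-breaking space (UTF-8 bytes C2 A0) like an ordinary space.
        $text = html_entity_decode($p->plaintext, ENT_QUOTES, 'UTF-8');
        $text = str_replace("\xC2\xA0", ' ', $text);

        if (trim($text) === '') {
            $p->outertext = ''; // removes the element from the saved output
            $removed++;
        }
    }
    return $removed;
}

$source = 'HTML File With Empty Paragraphs.html';

// Only runs when the library was loaded and the input file exists.
if (function_exists('file_get_html') && is_file($source)) {
    $html = file_get_html($source);
    if ($html !== false) {
        echo 'Removed ' . removeEmptyParagraphs($html) . " paragraphs\n";
        $html->save('HTML File without Empty Paragraphs.html');
    }
}
```

As in the DOMDocument version, a paragraph that contains only an image has no text and would be removed as well; add a check for child elements if that matters.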

Simple_html_dom: Simple Use-case - To Get Back Data For Storing In Sqlite Db

Hello dear PHP experts,

I am fairly new to simple_html_dom and its methods, and I only know the parser a little.

I want to gather some information from this site:

https://europa.eu/youth/volunteering/organisations_en#open

Is it possible to get the content of, let us say, the last 10 or 20 records on that page and subsequently store it in my MySQL database?

<?php
// Report all PHP errors (see changelog)
error_reporting(E_ALL);

include('inc/simple_html_dom.php');

    //base url
    $base = 'https://europa.eu/youth/volunteering/organisations_en#open';

    //home page HTML
    $html_base = file_get_html( $base );

    //get all category links
    foreach($html_base->find('a') as $element) {
        echo "<pre>";
        print_r( $element->href );
        echo "</pre>";
    }

    $html_base->clear(); 
    unset($html_base);

?>

I have the above code and I'm trying to get certain elements of the page but it isn't returning anything.

Is it possible that certain PHP functions might be disabled on the server to stop that?

The above code works perfectly on other sites.

Is there any workaround?

By the way, I have created a small snippet as a proof of concept that does this with Python and BeautifulSoup:


import requests
from bs4 import BeautifulSoup
 
url = 'https://europa.eu/youth/volunteering/organisations_en#open'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
print(soup.find('title').text)
block = soup.find('div', class_="eyp-card block-is-flex")

and this is the output:

European Youth Portal
>>> block.a
<a href="/youth/volunteering/organisation/48592_en" target="_blank">"Academy for Peace and Development" Union</a>
>>> block.a.text
'"Academy for Peace and Development" Union'
 
>>> block.select_one('div > div > p:nth-child(9)')
<p><strong>PIC:</strong> 948417016</p>
>>> block.select_one('div > div > p:nth-child(9)').text
'PIC: 948417016'

What I am aiming for in the end: I want to gather the first 20 results from the page and put them into an SQL database, or alternatively show the information in a small widget.
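
For the storage half of the question, a minimal sketch using PDO with an in-memory SQLite database (as in the thread title; for MySQL the DSN and the SQL dialect would need adjusting). The table name organisations, its columns, and the function name storeOrganisations are assumptions for illustration. The sample record is the one shown in the BeautifulSoup output above. The sketch assumes the pdo_sqlite extension is enabled and that the scraping step has already produced an array of rows.

```php
<?php
// Hypothetical helper: stores at most the first 20 rows and returns how many were written.
// Each row is expected to carry the keys 'pic', 'name' and 'url'.
function storeOrganisations(PDO $db, array $rows): int
{
    $db->exec(
        'CREATE TABLE IF NOT EXISTS organisations (
            pic  TEXT PRIMARY KEY,
            name TEXT NOT NULL,
            url  TEXT NOT NULL
        )'
    );

    // INSERT OR REPLACE is SQLite syntax: re-running the scraper updates existing rows.
    $stmt = $db->prepare(
        'INSERT OR REPLACE INTO organisations (pic, name, url) VALUES (:pic, :name, :url)'
    );

    $batch = array_slice($rows, 0, 20);
    foreach ($batch as $row) {
        $stmt->execute([
            ':pic'  => $row['pic'],
            ':name' => $row['name'],
            ':url'  => $row['url'],
        ]);
    }
    return count($batch);
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stored = storeOrganisations($db, [
    [
        'pic'  => '948417016',
        'name' => '"Academy for Peace and Development" Union',
        'url'  => '/youth/volunteering/organisation/48592_en',
    ],
]);
echo "Stored $stored record(s)\n";
```

Using a prepared statement keeps the quotes in organisation names from breaking the SQL, and keying on the PIC avoids duplicates when the same page is scraped twice.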