PHP - Porting Over A Parser From Bs4 To Simplehtmldom-parser

Hello dear Freaks,

I am currently musing about porting a Python bs4 parser to PHP, working with the simplehtmldom-parser or the DOM selectors (see below).

The project: collect the meta-data of WordPress plugins. About 50 plugins are of interest, but the challenge is that I want to fetch the meta-data of all existing plugins. After the fetch I want to filter for the plugins with the newest timestamp, i.e. those that were updated most recently. It is all about being up to date. Some example plugin pages:

https://wordpress.org/plugins/wp-job-manager
https://wordpress.org/plugins/ninja-forms
https://wordpress.org/plugins/participants-database ... and so on and so forth.

We have the following set of meta-data for each WordPress plugin:

Version: 1.9.5.12
Active installations: 10,000+
WordPress Version: 5.0 or higher
Tested up to: 5.4
PHP Version: 5.6 or higher
Tags: database, member, sign-up form, volunteer
Last updated: 19 hours ago

The project consists of two parts: the looping part, which seems to be pretty straightforward, and the parser part, where I have some issues (see below). I'm trying to loop through an array of URLs and scrape the data listed above from a list of WordPress plugins. See my loop below.

As a base, I think a good starting point is the following target URL:

wordpress.org/plugins/browse/popular with 99 pages of content, cf. ...
wordpress.org/plugins/browse/popular/page/1
wordpress.org/plugins/browse/popular/page/2
wordpress.org/plugins/browse/popular/page/99
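
Since the listing pages follow one pattern, the URL list can be generated instead of typed out. This is only a minimal sketch: it assumes the page count of 99 quoted above still holds and that the pages are served over https like the plugin pages; the names `BASE` and `listing_urls` are made up for this example.

```python
# Base URL from the post (https scheme assumed); names are hypothetical.
BASE = "https://wordpress.org/plugins/browse/popular"


def listing_urls(last_page=99):
    """Return the listing-page URLs for pages 1..last_page."""
    return [f"{BASE}/page/{n}" for n in range(1, last_page + 1)]


urls = listing_urls()
print(len(urls))   # number of listing pages
print(urls[0])     # first page
print(urls[-1])    # last page
```

The page count is a parameter because the number of listing pages will probably change over time.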

The output of text_nodes:

['Version: 1.9.5.12', 'Active installations: 10,000+', 'Tested up to: 5.6 ']
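
Strings of the form 'Label: value' are easier to filter and sort once they are split into a dictionary. A small sketch, using the text_nodes output above as input; the helper name `to_record` is made up for this example.

```python
def to_record(text_nodes):
    """Split each 'Label: value' string at the first colon into a dict entry."""
    record = {}
    for line in text_nodes:
        label, sep, value = line.partition(":")
        if sep:  # skip strings that contain no colon at all
            record[label.strip()] = value.strip()
    return record


text_nodes = ['Version: 1.9.5.12', 'Active installations: 10,000+', 'Tested up to: 5.6 ']
record = to_record(text_nodes)
print(record)
# {'Version': '1.9.5.12', 'Active installations': '10,000+', 'Tested up to': '5.6'}
```

The same shape (label as key, value as string) should carry over to a PHP associative array when the parser is ported.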

But what if we want to fetch the data of all the WordPress plugins and subsequently sort them to show, let us say, the 50 most recently updated plugins? This would be an interesting task:

First of all we need to fetch the URLs.

Then we fetch the information and sort by timestamp, newest first, i.e. the plugin that was updated most recently comes first.

Finally we list the 50 newest items, that is, the 50 plugins that were updated most recently.
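
The sorting step could look roughly like the sketch below. It assumes that every row carries a 'Last updated: N unit(s) ago' string worded like the examples in this post; the rows themselves (Plugin A, B, C) are invented sample data, and `UNIT`, `age` and `newest` are names made up for this example. Months and years are approximated as 30 and 365 days.

```python
import re
from datetime import timedelta

# Approximate unit lengths. Relative timestamps are coarse, so the order
# inside one bucket (e.g. several rows saying "5 months ago") is not meaningful.
UNIT = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}
PATTERN = re.compile(r"(\d+)\s+(minute|hour|day|week|month|year)s?\s+ago")


def age(row):
    """Return the approximate age of a scraped row, or timedelta.max if unknown."""
    for field in row:
        if field.startswith("Last updated:"):
            match = PATTERN.search(field)
            if match:
                return int(match.group(1)) * UNIT[match.group(2)]
    return timedelta.max  # rows without a parsable timestamp sort last


def newest(rows, limit=50):
    """Return up to `limit` rows, most recently updated first."""
    return sorted(rows, key=age)[:limit]


# Invented sample rows in the shape the parser returns.
rows = [
    ["Plugin A", "Version: 2.34.1", "Last updated: 5 months ago"],
    ["Plugin B", "Version: 1.9.5.12", "Last updated: 19 hours ago"],
    ["Plugin C", "Version: 7.16.1", "Last updated: 2 weeks ago"],
]
for row in newest(rows, limit=2):
    print(row[0], "|", row[-1])
```

Because the page only shows relative times, this gives an approximate ranking; an exact ranking would need an absolute date from some other source.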

Here is what we have so far in Python, see the soup:

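import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor

# allin is not defined in the post; assumed here to be the list of plugin URLs
allin = ["https://wordpress.org/plugins/wp-job-manager",
         "https://wordpress.org/plugins/ninja-forms",
         "https://wordpress.org/plugins/participants-database"]

def parser(url):
    # the post omits the function header and the request; requests.get is assumed
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    # first 8 <li> items of the meta-data list that follows the hidden <h3>
    target = [item.get_text(strip=True, separator=" ") for item in soup.find(
        "h3", class_="screen-reader-text").find_next("ul").find_all("li")[:8]]
    head = [soup.find("h1", class_="plugin-title").text]
    new = [x for x in target if x.startswith(
        ("V", "Las", "Ac", "W", "T", "P"))]
    return head + new


with ThreadPoolExecutor(max_workers=50) as executor1:
    futures1 = [executor1.submit(parser, url) for url in allin]

for future in futures1:
    print(future.result())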

The output has the following form:

['lorem ipsum dolor sit amet', 'Version: 2.34.1', 'Last updated: 5 months ago', 'Tags: magna aliquyam erat, sed diam voluptua. At vero eos et accusam']
['consetetur sadipscing elitr', 'Version: 6.54.1', 'Last updated: 5 months ago', 'Tags: lorem ipsum dolor sit amet']
['sed diam nonumy eirmod tempor invidunt ut labore', 'Version: 7.16.1', 'Last updated: 5 months ago', 'Tags: tarifa, sevilla lisabin invidunt ut labore et dolore magna aliquyam erat']
['tempor invidunt ut taria malaga jerusalem labore', 'Version: 9.58.1', 'Last updated: 5 months ago', 'Tags: ilabore et lissabon dolore magna aliquyam erat']

Background: https://stackoverflow.com/questions/61106309/fetching-multiple-urls-with-beautifulsoup-gathering-meta-data-in-wp-plugins

Well, I guess that we can do this with the Simple HTML DOM Parser. Here is the selector reference:

https://stackoverflow.com/questions/1390568/how-can-i-match-on-an-attribute-that-contains-a-certain-string

I look forward to any hints and help.

Have a great day

Edited May 3, 2020 by dil_bert
