Generic Scraper

This is a generic web scraper written in PHP, with dependencies managed by Composer.

The purpose

For now, the main purpose is to locate the main content of a web page.

To start:

composer install
php src/index.php

How it works

After the page is downloaded, processing runs in four steps:

Parsing

This phase builds the tree structure of the page.
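As a rough illustration of what building the tree can look like, here is a minimal sketch that uses PHP's bundled DOM extension. It is an assumption that the project parses pages this way; the helper name `parseHtml` and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: parse an HTML string into a DOM tree.
function parseHtml(string $html): DOMDocument
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect libxml warnings
    // internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

// Sample markup, made up for this example.
$html = '<html><body>'
    . '<div id="nav"><a href="/">Home</a></div>'
    . '<div id="post"><p>First paragraph.</p><p>Second paragraph.</p></div>'
    . '</body></html>';

$doc = parseHtml($html);
$paragraphs = $doc->getElementsByTagName('p');
echo $paragraphs->length, "\n"; // prints 2
```

The resulting `DOMDocument` can then be walked node by node, which is what the later steps rely on in these sketches.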

Advanced parsing

This phase augments the tree with additional information.
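The README does not say which information is added, so the following is only a sketch under an assumption: each element gets its total text length and the length of text that sits inside links, two figures often used to tell content from navigation. The function name `annotate` and the `data-*` attribute names are hypothetical.

```php
<?php
// Hypothetical augmentation pass: store text length and link-text length
// on every element as data-* attributes, so the tree itself carries the info.
// Lengths are counted in bytes with strlen().
function annotate(DOMNode $node): array
{
    $text = 0;
    $linkText = 0;
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $text += strlen(trim($child->nodeValue));
        } elseif ($child instanceof DOMElement) {
            [$childText, $childLinkText] = annotate($child);
            $text += $childText;
            $linkText += $childLinkText;
        }
    }
    if ($node instanceof DOMElement) {
        if ($node->nodeName === 'a') {
            $linkText = $text; // all text inside a link counts as link text
        }
        $node->setAttribute('data-text-length', (string) $text);
        $node->setAttribute('data-link-text-length', (string) $linkText);
    }
    return [$text, $linkText];
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
// Sample markup, made up for this example.
$doc->loadHTML('<div id="nav"><a href="/">Home</a></div>'
    . '<div id="post"><p>Some longer article text.</p></div>');
annotate($doc);

$xpath = new DOMXPath($doc);
$post = $xpath->query('//div[@id="post"]')->item(0);
$nav = $xpath->query('//div[@id="nav"]')->item(0);
echo $post->getAttribute('data-text-length'), "\n"; // prints 25
```

In this sample the navigation block consists entirely of link text, while the article block has none, which is the kind of signal a later step could use.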

Scan

The scan looks for patterns in the tree.
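Which patterns the scan looks for is not documented here. Purely as an illustration, the sketch below assumes one simple pattern, a container that directly holds several `<p>` elements, and collects every element that matches. The function name `scanForParagraphBlocks` and its `$minParagraphs` parameter are hypothetical.

```php
<?php
// Hypothetical scan: the pattern is "an element with several <p> children".
function scanForParagraphBlocks(DOMDocument $doc, int $minParagraphs = 2): array
{
    $xpath = new DOMXPath($doc);
    $candidates = [];
    foreach ($xpath->query('//*[count(p) >= ' . $minParagraphs . ']') as $element) {
        $candidates[] = [
            'node' => $element,
            // evaluate() returns a float for count(), so cast it.
            'paragraphs' => (int) $xpath->evaluate('count(p)', $element),
        ];
    }
    return $candidates;
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
// Sample markup, made up for this example.
$doc->loadHTML('<div id="nav"><a href="/">Home</a><a href="/about">About</a></div>'
    . '<div id="post"><p>One.</p><p>Two.</p><p>Three.</p></div>');

$candidates = scanForParagraphBlocks($doc);
foreach ($candidates as $candidate) {
    echo $candidate['node']->getAttribute('id'), ': ', $candidate['paragraphs'], "\n";
}
```

With the sample markup only the `post` block matches, because the navigation block contains links but no paragraphs.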

Tweak

A tweak can be written ad hoc for each site, but for now there is only one, and it is generic. Its main purpose is to choose a threshold for identifying the results.
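How the threshold is chosen is not described, so the following sketch assumes one possible generic rule: take a fixed fraction of the best score and keep every candidate at or above it. The function names, the `0.5` ratio, and the sample scores are all hypothetical.

```php
<?php
// Hypothetical generic tweak: the threshold is a fixed fraction of the best score.
function chooseThreshold(array $scores, float $ratio = 0.5): float
{
    return $scores === [] ? 0.0 : max($scores) * $ratio;
}

// Keep only the candidates whose score reaches the threshold (keys are preserved).
function selectResults(array $scores, float $threshold): array
{
    return array_filter($scores, fn ($score) => $score >= $threshold);
}

// Sample scores, made up for this example.
$scores = ['nav' => 4, 'sidebar' => 12, 'post' => 40];
$threshold = chooseThreshold($scores); // 40 * 0.5 = 20.0
$results = selectResults($scores, $threshold);
print_r($results); // only 'post' => 40 remains
```

A site-specific tweak could swap in a different ratio or a different rule altogether without touching the earlier steps.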

After the code runs, the (raw and messy) results are stored, for now, in a file called "priority".

Repository: wufe/Scraper