wufe/Scraper

Generic Scraper

This is a scraper powered by PHP and Composer.

The purpose

For now, its main purpose is to locate the main content of a web page.

To get started, install the dependencies and run the entry script:

composer install
php src/index.php

How it works

After the page is downloaded, processing goes through four steps:

Parsing

In this phase the tree structure of the page is generated.
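As an illustration of what building a tree from a downloaded page can look like, here is a minimal sketch using PHP's bundled DOM extension. The function names `buildTree` and `parseHtml` and the array layout are hypothetical and are not taken from this repository; the project may represent its tree differently.

```php
<?php
// Hypothetical helper (not the project's actual code): converts a DOM node
// into a nested array holding the tag name and the element children.
function buildTree(DOMNode $node): array
{
    $children = [];
    foreach ($node->childNodes as $child) {
        // Keep element nodes only; text and comment nodes are skipped here.
        if ($child instanceof DOMElement) {
            $children[] = buildTree($child);
        }
    }
    return ['tag' => $node->nodeName, 'children' => $children];
}

// Hypothetical entry point: parses an HTML string and returns the tree.
function parseHtml(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally, since real pages are often malformed.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    return buildTree($doc->documentElement);
}

$tree = parseHtml('<html><body><div><p>Hello</p></div></body></html>');
echo $tree['tag'], "\n"; // prints "html"
```

A nested array keeps the sketch short; a dedicated node class would be a reasonable alternative if later phases need to attach more data.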

Advanced parsing

The information carried by the tree is augmented.
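The README does not say which information is added. The sketch below assumes two common content-extraction signals, text length and link density, purely as an example; the function `annotate` and the field names are hypothetical.

```php
<?php
// Hypothetical augmentation (assumed metrics, not the project's own):
// annotate each element with its text length and the share of that text
// that sits inside <a> elements.
function annotate(DOMElement $el): array
{
    $textLength = strlen(trim($el->textContent));

    $linkLength = 0;
    foreach ($el->getElementsByTagName('a') as $a) {
        $linkLength += strlen(trim($a->textContent));
    }

    $children = [];
    foreach ($el->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $children[] = annotate($child);
        }
    }

    return [
        'tag'         => $el->nodeName,
        'textLength'  => $textLength,
        'linkDensity' => $textLength > 0 ? $linkLength / $textLength : 0.0,
        'children'    => $children,
    ];
}

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate malformed markup
$doc->loadHTML(
    '<html><body><div><p>Some long article text here.</p></div>'
    . '<ul><li><a href="#">Home</a></li></ul></body></html>'
);
libxml_clear_errors();

$info = annotate($doc->documentElement);
printf("%s: %d bytes of text, link density %.3f\n",
    $info['tag'], $info['textLength'], $info['linkDensity']);
```

Blocks with a lot of text and a low link density tend to be article content, while navigation blocks tend to show the opposite profile.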

Scan

The scan looks for patterns.

Tweak

The tweak can be ad hoc for each site, but at present there is only one, and it is generic. Its main purpose is to choose a threshold for identifying the results.

After the code runs, the (raw and messy) results are stored, for now, in a file called "priority".
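One simple way a generic tweak might pick a threshold is to keep every candidate whose score reaches a fixed fraction of the best score. This is only a sketch under that assumption; the functions `pickThreshold` and `selectResults`, the `0.5` ratio, and the sample scores are all hypothetical and do not come from the project.

```php
<?php
// Hypothetical generic tweak: the threshold is a fraction of the best score.
function pickThreshold(array $scores, float $ratio = 0.5): float
{
    return $scores === [] ? 0.0 : max($scores) * $ratio;
}

// Keeps the candidates whose score reaches the threshold (keys are preserved).
function selectResults(array $scores, float $ratio = 0.5): array
{
    $threshold = pickThreshold($scores, $ratio);
    return array_filter(
        $scores,
        fn (float $score): bool => $score >= $threshold
    );
}

// Made-up candidate scores, keyed by an illustrative node label.
$scores = [
    'div#content' => 120.0,
    'div#sidebar' => 30.0,
    'footer'      => 10.0,
    'article'     => 90.0,
];

$results = selectResults($scores);
print_r(array_keys($results)); // div#content and article remain
```

A site-specific tweak could replace the fixed ratio with a value tuned for that site.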
