highestgoodlikewater/spider-5

Scalable Web Spider (Crawler) in PHP

This application lets you scan the Web using many workers connected by ZeroMQ.

For example, you can have 10 000 URIs fetched and stored in a folder by 20 workers downloading concurrently.

./run_test.sh 
Giving workers some time to connect
Total count of Tasks put in the Queue: 173
Waiting for acknowledgement from Task Result Collector
Info from Task Result Collector: 173
Informing all workers to stop

Setup for Ubuntu 14.04

  • If you don't have ZeroMQ and the PHP ZMQ module installed, you can install them by running setup_zeromq.sh
  • Run setup.sh

Usage

To see how it works, read through and run ./sandbox/run_test.sh

Architecture

It uses ZeroMQ's Parallel Pipeline pattern: tasks are pushed to a pool of workers, and the workers push their results to a collector.
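
The Parallel Pipeline pattern can be sketched in PHP roughly as follows. This is a minimal illustration, not this project's actual code: the endpoints, the function names (ventilator, worker, sink, formatResult) and the result format are all assumptions made for the example. It needs the PHP ZMQ extension, and it leaves out the step where workers are told to stop.

```php
<?php
// Sketch of ZeroMQ's Parallel Pipeline: ventilator -> workers -> sink.
// Hypothetical names and ports; not taken from this repository.

const TASKS_ENDPOINT   = 'tcp://127.0.0.1:5557'; // assumed port for tasks
const RESULTS_ENDPOINT = 'tcp://127.0.0.1:5558'; // assumed port for results

// Builds the one-line result a worker reports: "<uri> <bytes>", or -1 on failure.
function formatResult($uri, $body)
{
    return sprintf('%s %d', $uri, $body === false ? -1 : strlen($body));
}

// Ventilator: pushes one task per URI; ZeroMQ spreads them over connected workers.
function ventilator(array $uris)
{
    $context = new ZMQContext();
    $sender = new ZMQSocket($context, ZMQ::SOCKET_PUSH);
    $sender->bind(TASKS_ENDPOINT);
    sleep(1); // give workers some time to connect
    foreach ($uris as $uri) {
        $sender->send($uri);
    }
    sleep(1); // give ZeroMQ time to deliver before the script exits
}

// Worker: pulls a URI, downloads it, pushes a result line to the sink.
// Runs until the process is killed (no stop signal in this sketch).
function worker()
{
    $context = new ZMQContext();
    $receiver = new ZMQSocket($context, ZMQ::SOCKET_PULL);
    $receiver->connect(TASKS_ENDPOINT);
    $sender = new ZMQSocket($context, ZMQ::SOCKET_PUSH);
    $sender->connect(RESULTS_ENDPOINT);
    while (true) {
        $uri = $receiver->recv();
        $body = @file_get_contents($uri); // needs allow_url_fopen for http URIs
        $sender->send(formatResult($uri, $body));
    }
}

// Sink: collects the expected number of results and prints them.
function sink($expected)
{
    $context = new ZMQContext();
    $receiver = new ZMQSocket($context, ZMQ::SOCKET_PULL);
    $receiver->bind(RESULTS_ENDPOINT);
    for ($done = 0; $done < $expected; $done++) {
        echo $receiver->recv(), PHP_EOL;
    }
    echo "Results collected: $done", PHP_EOL;
}

// Pick a role from the command line; do nothing when no role is given.
$args = isset($argv) ? $argv : array();
$role = isset($args[1]) ? $args[1] : '';
if ($role === 'ventilator') {
    ventilator(array_slice($args, 2));
} elseif ($role === 'worker') {
    worker();
} elseif ($role === 'sink') {
    sink(isset($args[2]) ? (int) $args[2] : 0);
}
```

Assuming the sketch is saved as pipeline.php (a hypothetical file name), start `php pipeline.php sink 2` first, then one or more `php pipeline.php worker` processes, and finally `php pipeline.php ventilator http://example.com/ http://example.org/`. Adding more worker processes raises the number of concurrent downloads without changing the ventilator or the sink.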

Plans

See TODO

Authors

Damian Sromek

License

Spider is licensed under the MIT License - see the LICENSE file for details
