Crawler

A standalone PHP script for crawling a site.

This script makes it relatively easy to

  • find broken links (see the sketch after this list)
  • find slow pages
  • simulate semi-real usage (well, that's a stretch - but it's better than pinging the same page 20 times)
  • check any browser sniffing you might be doing
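
The first two checks boil down to timing each request and inspecting its status code, and sending a custom user agent is also how you would exercise browser sniffing. A minimal sketch of such a check in plain PHP with curl - the function name, threshold, and user agent are illustrative, not taken from the script:

<?php
// Hypothetical per-page check: fetch a URL, time the request, and flag
// broken links and slow pages. Not the script's actual internals.
function checkPage($url, $slowThreshold = 1.0, $userAgent = 'crawl')
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); // exercise browser sniffing

    $start = microtime(true);
    $body = curl_exec($ch);
    $time = microtime(true) - $start;

    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return array(
        'status' => $status,                           // e.g. 404 => broken link
        'time'   => $time,                             // seconds per request
        'broken' => $body === false || $status >= 400,
        'slow'   => $time > $slowThreshold,
    );
}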

Usage

It's a pretty simple script; to see the full help, call it with no parameters:

$ ./crawl

To crawl a site, specify where to start and the script will crawl it:

$ ./crawl http://ad7six.com
/ (0) 1.2671s
		writing cache
/ » 1 » /contact (1) 1.2357s
		writing cache
/ » 1 » /entries/index/2006 (2) 1.2564s
		writing cache
/ » 1 » /entries/index/2007 (3) 1.2598s
		writing cache
/ » 1 » /entries/index/2008 (4) 1.0801s
		writing cache
/ » 1 » /entries/index/2009/11 (5) 1.0758s
		writing cache
...

The script will continue until one of the following conditions is met (see the loop sketch after this list):

  • There are no more links to crawl
  • The maximum number of pages has been requested (by default there is no limit)
  • There are no more links within the depth specified
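
Put together, that behaviour amounts to a breadth-first crawl with a page budget and a depth cut-off. A rough sketch of such a loop, assuming hypothetical fetch() and extractLinks() helpers and made-up $maxPages / $maxDepth settings - not the script's actual internals:

<?php
// Illustrative breadth-first crawl honouring the three stop conditions.
// fetch() and extractLinks() are hypothetical helpers; $maxPages <= 0
// means "no limit", matching the default described above.
$queue   = array(array('url' => $startUrl, 'depth' => 0));
$seen    = array($startUrl => true);
$crawled = 0;

while (!empty($queue)) {                      // stop: no more links to crawl
    if ($maxPages > 0 && $crawled >= $maxPages) {
        break;                                // stop: page budget exhausted
    }
    $page = array_shift($queue);
    if ($page['depth'] > $maxDepth) {
        continue;                             // stop: beyond the given depth
    }
    $html = fetch($page['url']);
    $crawled++;

    foreach (extractLinks($html) as $link) {
        if (!isset($seen[$link])) {
            $seen[$link] = true;
            $queue[] = array('url' => $link, 'depth' => $page['depth'] + 1);
        }
    }
}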

If the script halts, or you halt it, calling it again with the same parameters will take the page contents from the cache.
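
The "writing cache" lines in the output above correspond to that cache. A minimal sketch of how such a file cache might look, assuming pages are stored on disk keyed by an md5 of the URL - the directory and function names are guesses, not the script's real layout:

<?php
// Hypothetical URL-keyed file cache: re-runs read pages back from disk
// instead of re-requesting them. Directory and helper name are made up.
function cachedFetch($url, $cacheDir = '.crawl-cache')
{
    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0777, true);
    }
    $file = $cacheDir . '/' . md5($url);

    if (file_exists($file)) {           // resuming: take contents from cache
        return file_get_contents($file);
    }

    $html = file_get_contents($url);    // requires allow_url_fopen
    file_put_contents($file, $html);    // "writing cache"
    return $html;
}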
