/**
 * crawl method
 * Create the crawler object and set the options for crawling
 * @param string $u URL to start crawling from
 */
function crawl($u)
{
    $C = new MyCrawler();
    $C->setURL($u);
    $C->addContentTypeReceiveRule("#text/html#");                       // Only receive HTML pages
    $C->addURLFilterRule("#\.(jpg|gif|png|pdf|jpeg|svg|css|js)$# i");   // Don't request non-HTML resources
    $C->setTrafficLimit(2000 * 1024);                                   // Limit traffic for this run
    $C->obeyRobotsTxt(true);                                            // Respect the site's robots.txt
    $C->go();
}
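For completeness, here is one way the crawl() helper could be invoked from the command line. This is a minimal sketch; the fallback URL and the $argv handling are illustrative assumptions, not part of the original code:

    // Use the first CLI argument as the seed URL, or fall back to a placeholder site
    $seed = isset($argv[1]) ? $argv[1] : "http://www.example.com/";
    crawl($seed);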
}

// Now, create an instance of your class, define the behaviour
// of the crawler (see class-reference for more options and details)
// and start the crawling-process.
$crawler = new MyCrawler();

// URL to crawl
$crawler->setURL("www.php.net");

// Only receive content of files with content-type "text/html"
$crawler->addContentTypeReceiveRule("#text/html#");

// Ignore links to pictures, don't even request pictures
$crawler->addURLFilterRule("#\.(jpg|jpeg|gif|png)$# i");

// Store and send cookie-data like a browser does
$crawler->enableCookieHandling(true);

// Set the traffic-limit to 1 MB (in bytes,
// for testing we don't want to "suck" the whole site)
$crawler->setTrafficLimit(1000 * 1024);

// That's enough, now here we go
$crawler->go();

// At the end, after the process is finished, we print a short
// report (see method getProcessReport() for more information)
$report = $crawler->getProcessReport();

if (PHP_SAPI == "cli") {
    $lb = "\n";
} else {
    $lb = "<br />";
}

echo "Summary:" . $lb;
echo "Links followed: " . $report->links_followed . $lb;
echo "Documents received: " . $report->files_received . $lb;
echo "Bytes received: " . $report->bytes_received . " bytes" . $lb;
echo "Process runtime: " . $report->process_runtime . " sec" . $lb;