    } else {
      $lb = "<br />";
    }

    // Print the URL and the HTTP status code
    echo "Page requested: " . $DocInfo->url . " (" . $DocInfo->http_status_code . ")" . $lb;

    flush();
  }
}

$crawler = new MyCrawler();
$crawler->setURL("www.php.net");

// Only receive documents of content-type "text/html"
$crawler->addContentTypeReceiveRule("#text/html#");

// Ignore links to images (don't even request them)
$crawler->addURLFilterRule("#\\.(jpg|jpeg|gif|png)\$# i");

$crawler->setPageLimit(50); // Set the page-limit to 50 for testing

// Important for resumable scripts/processes!
$crawler->enableResumption();

// At the first start of the script, retrieve the crawler-ID and store it
// (in a temporary file in this example).
if (!file_exists("/tmp/mycrawlerid_for_php.net.tmp")) {
  $crawler_ID = $crawler->getCrawlerId();
  file_put_contents("/tmp/mycrawlerid_for_php.net.tmp", $crawler_ID);
}
// Otherwise (the script was restarted after an abort): read the stored
// crawler-ID and resume the crawler.
else {
  $crawler_ID = file_get_contents("/tmp/mycrawlerid_for_php.net.tmp");
  $crawler->resume($crawler_ID);
}

// Start crawling with 5 processes
$crawler->goMultiProcessed(5);

// Delete the stored crawler-ID after the process has finished completely and successfully.
unlink("/tmp/mycrawlerid_for_php.net.tmp");

$report = $crawler->getProcessReport();

if (PHP_SAPI == "cli") {