process() public method

Tidy helps deal with PHP's patchy HTML parsing most of the time, but it has problems of its own, which the $smartTidy option tries to avoid.
public process ( string $html, string $url, SiteConfig $siteConfig = null, boolean $smartTidy = true ) : boolean
$html string
$url string
$siteConfig Graby\SiteConfig\SiteConfig When provided, avoids recalculating the site config
$smartTidy boolean Whether to tidy the HTML
return boolean true on success, false on failure
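The parameters above can be illustrated with a minimal sketch that disables tidying for a single call. It assumes an already constructed extractor object (expected to be a Graby\Extractor\ContentExtractor; that namespace is an assumption, not stated on this page) and uses the getContent() call shown in the example further down. The helper name extractWithoutTidy is hypothetical.

```php
<?php

/**
 * Hypothetical helper: run process() with $smartTidy disabled and return
 * the extracted content serialized as XML, or null on failure.
 *
 * @param object $extractor expected to be a Graby ContentExtractor (assumption)
 */
function extractWithoutTidy($extractor, string $html, string $url): ?string
{
    // null for $siteConfig leaves the site config lookup to process();
    // false for $smartTidy skips the tidy step.
    if (!$extractor->process($html, $url, null, false)) {
        return null;
    }

    // getContent() returns a DOM element; serialize it via its owner document.
    $node = $extractor->getContent();

    return $node->ownerDocument->saveXML($node);
}
```

Passing a prebuilt SiteConfig as the third argument instead of null would skip the site config lookup, as described for $siteConfig above.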
Example #1
 public function testIframeEmbeddedContent()
 {
     $contentExtractor = new ContentExtractor(self::$contentExtractorConfig);
     $config = new SiteConfig();
     // '//header' is a bad pattern (nothing matches it here), so the extractor moves on to the next one
     $config->body = array('//header', '//div');
     // 'toto' is not a valid parser, so it will be replaced by the default one
     $config->parser = 'toto';
     $res = $contentExtractor->process('<div>' . str_repeat('this is the best part of the show', 10) . '</div><div class="video_player"><iframe src="http://www.dailymotion.com/embed/video/x2kjh59" frameborder="0" width="534" height="320"></iframe></div>', 'https://lemonde.io/35941909', $config);
     $this->assertTrue($res, 'Extraction went well');
     $domElement = $contentExtractor->getContent();
     $content = $domElement->ownerDocument->saveXML($domElement);
     $this->assertContains('<iframe src="http://www.dailymotion.com/embed/video/x2kjh59" frameborder="0" width="534" height="320">[embedded content]</iframe>', $content);
 }