/**
 * Adds a rule to the list of rules that decides which types of content should be streamed directly to a temporary file.
 *
 * If the content-type of a page or file matches one of these rules, the content will be streamed directly into a
 * temporary file instead of being held in local memory (RAM).
 *
 * It's recommended to add the content-types of all files that may be large, to prevent running out of memory.
 * By default, the crawler receives all content into memory!
 *
 * The content/source of pages and files that were streamed to a file is not directly accessible within the overridden
 * method {@link handleDocumentInfo()}; instead, you get information about the file the content was stored in
 * (see properties {@link PHPCrawlerDocumentInfo::received_to_file} and {@link PHPCrawlerDocumentInfo::content_tmp_file}).
 *
 * Please note that this setting doesn't affect the link-finding results: content streamed to a file is also checked for links.
 *
 * A common setup may look like this example:
 * <code>
 * // Let the crawler receive all content (the default setting)
 * $crawler->addReceiveContentType("##");
 *
 * // Tell the crawler to stream everything except "text/html" documents to a temporary file
 * $crawler->addStreamToFileContentType("#^((?!text/html).)*$#");
 * </code>
 *
 * @param string $regex The rule as a regular expression
 * @return bool TRUE if the rule was added to the list and the regex is valid.
 * @section 10 Other settings
 */
public function addStreamToFileContentType($regex)
{
  return $this->PageRequest->addStreamToFileContentType($regex);
}