/**
 * Adds a rule to the list of rules that decide which pages or files - regarding their content-type - should be received.
 *
 * After receiving the HTTP header of a followed URL, the crawler checks - based on the given rules - whether the content
 * of that URL should be received.
 * If no rule matches the content-type of the document, the content won't be received.
 *
 * Example:
 * <code>
 * $crawler->addContentTypeReceiveRule("#text/html#");
 * $crawler->addContentTypeReceiveRule("#text/css#");
 * </code>
 * These rules let the crawler receive the content/source of pages with the content-type "text/html" AND "text/css".
 * Pages or files with other content-types (e.g. "image/gif") won't be received (if these are the only rules added to the list).
 *
 * <b>IMPORTANT:</b> By default, if no rule was added to the list, the crawler receives every content.
 *
 * Note: To reduce the traffic the crawler causes, you should only add content-types of pages/files you really want to receive.
 * At the very least you should add the content-type "text/html" to this list, otherwise the crawler can't find any links.
 *
 * @param string $regex The rule as a regular expression
 * @return bool TRUE if the rule was added to the list.
 *              FALSE if the given regex is not valid.
 * @section 2 Filter-settings
 */
public function addContentTypeReceiveRule($regex)
{
  return $this->PageRequest->addReceiveContentType($regex);
}