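PHP PhraseParser::extractWordStringPageSummary Examples

Programming language: PHP

Class/Type: PhraseParser

Method/Function: extractWordStringPageSummary

Examples at hotexamples.com: 1

PHP PhraseParser::extractWordStringPageSummary - 1 example found. These are the top-rated real-world PHP examples of PhraseParser::extractWordStringPageSummary extracted from open source projects. You can rate examples to help us improve their quality.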

Frequently Used Methods

extractPhrasesInLists(6)

getTokenizer(5)

computeSafeSearchScore(4)

calculateMetas(3)

calculateLinkMetas(2)

canonicalizePunctuatedTerms(1)

extractPhrases(1)

extractPhrasesAndCount(1)

extractWordStringPageSummary(1)

getCharGramsTerm(1)

getCosineRank(1)

getIntersection(1)

reverseMaximalMatch(1)

segmentSegment(1)

stemCharGramSegment(1)

stemTerms(1)

Example #1

File: search_controller.php Project: yakar/yioop

 /**
  * Given a page summary, extract the words from it and try to find documents
  * that match the most relevant words. The measure of "relevant" is
  * fairly weak: for now we pick the $num words whose ratio
  * (number of occurrences in the crawl item) / (number of occurrences in
  * all documents) is the largest.
  *
  * @param array $crawl_item a page summary
  * @param int $num number of key phrases to return
  * @param int $crawl_time the timestamp of an index to use; if 0, the
  *     default index is used
  * @return array an array of the most selective key phrases
  */
 function getTopPhrases($crawl_item, $num, $crawl_time = 0)
 {
     $crawl_model = $this->model("crawl");
     $queue_servers = $this->model("machine")->getQueueServerUrls();
     if ($crawl_time == 0) {
         $crawl_time = $crawl_model->getCurrentIndexDatabaseName();
     }
     $this->model("phrase")->index_name = $crawl_time;
     $crawl_model->index_name = $crawl_time;
     $phrase_string = PhraseParser::extractWordStringPageSummary($crawl_item);
     $crawl_item[self::LANG] = isset($crawl_item[self::LANG]) ? $crawl_item[self::LANG] : DEFAULT_LOCALE;
     $page_word_counts = PhraseParser::extractPhrasesAndCount($phrase_string, $crawl_item[self::LANG]);
     $words = array_keys($page_word_counts);
     $word_counts = $crawl_model->countWords($words, $queue_servers);
     $word_ratios = array();
     foreach ($page_word_counts as $word => $count) {
         $word_ratios[$word] = isset($word_counts[$word]) && $word_counts[$word] > 0 ? $count / $word_counts[$word] : 0;
         /* Discard words whose ratio is exactly 1, that is, every counted
            occurrence is in this item, since we want to find other
            related documents. */
         if ($word_ratios[$word] == 1) {
             $word_ratios[$word] = 0;
         }
     }
     uasort($word_ratios, "greaterThan");
     $top_phrases = array_keys($word_ratios);
     $top_phrases = array_slice($top_phrases, 0, $num);
     return $top_phrases;
 }
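
The ranking step in getTopPhrases can be tried in isolation. The following is a minimal sketch, not Yioop code: rankBySelectivity is a hypothetical helper, the counts are made up for illustration, and PHP's built-in arsort stands in for the uasort call with Yioop's "greaterThan" callback, on the assumption that the callback produces a descending order.

```php
<?php
// Hypothetical helper, not part of Yioop: ranks words by
// (count in page) / (count in whole index), highest ratio first.
function rankBySelectivity(array $page_counts, array $index_counts, $num)
{
    $ratios = array();
    foreach ($page_counts as $word => $count) {
        $total = isset($index_counts[$word]) ? $index_counts[$word] : 0;
        $ratio = $total > 0 ? $count / $total : 0;
        // A ratio of 1 means every counted occurrence is in this page,
        // so the word cannot lead to other documents; score it as 0.
        if ($ratio == 1) {
            $ratio = 0;
        }
        $ratios[$word] = $ratio;
    }
    arsort($ratios); // sort descending by ratio, keeping the word keys
    return array_slice(array_keys($ratios), 0, $num);
}

// Made-up counts, for illustration only.
$page_counts = array("yioop" => 3, "the" => 10, "unique" => 2);
$index_counts = array("yioop" => 30, "the" => 100000, "unique" => 2);
$top = rankBySelectivity($page_counts, $index_counts, 2);
```

With these sample counts, "unique" is zeroed out because both of its occurrences are in the page itself, and a very common word such as "the" scores far below "yioop".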