PHP TextAnalysis\Documents TokensDocument::getDocumentData Examples

Programming language: PHP

Namespace/package name: TextAnalysis\Documents

Class/type: TokensDocument

Method/function: getDocumentData

Examples at hotexamples.com: 4

PHP TextAnalysis\Documents TokensDocument::getDocumentData - 4 examples found. These are the top-rated real-world PHP examples of TextAnalysis\Documents\TokensDocument::getDocumentData, extracted from open source projects.

Frequently used methods

getDocumentData(4)

applyTransformation(3)

getId(2)

toArray(1)

getDocumentData() public method

Returns an array of tokens.

public getDocumentData ( ) : array
Returns	array
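As a minimal sketch of the contract described above, the following uses a hypothetical stand-in class, `SketchTokensDocument`, rather than the library's own `TokensDocument`, so that it runs without any dependencies. It assumes only what the excerpts on this page show: the constructor takes an array of tokens and `getDocumentData()` returns that array. The whitespace split with `preg_split` is likewise an assumption standing in for a tokenizer.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for illustration only; this is NOT the library's TokensDocument.
final class SketchTokensDocument
{
    /** @var string[] */
    private array $tokens;

    /** @param string[] $tokens */
    public function __construct(array $tokens)
    {
        $this->tokens = $tokens;
    }

    /** @return string[] the tokens held by this document */
    public function getDocumentData(): array
    {
        return $this->tokens;
    }
}

// Split on runs of whitespace and drop empty strings (assumed tokenization).
$tokens = preg_split('/\s+/', 'Marquette County is in Michigan', -1, PREG_SPLIT_NO_EMPTY);
$document = new SketchTokensDocument($tokens);

// Iterate the tokens the same way the examples on this page do.
foreach ($document->getDocumentData() as $position => $token) {
    echo $position, ': ', $token, PHP_EOL;
}
```

With the real library, the equivalent call would presumably be made on a `TextAnalysis\Documents\TokensDocument` instance, as in the examples on this page.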

TokensDocument: 1 document

Example #1

File: InvertedIndex.php Project: yooper/php-text-analysis

 /**
  * Add a document
  * @param TokensDocument $document
  * @return void
  */
 public function addDocument(TokensDocument $document)
 {
     foreach ($document->getDocumentData() as $term) {
         if (isset($this->index[$term])) {
             $this->index[$term][self::FREQ]++;
             $this->index[$term][self::POSTINGS][] = $document->getId();
         } else {
             $this->index[$term] = [self::FREQ => 1, self::POSTINGS => [$document->getId()]];
         }
     }
 }
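The indexing loop above can be sketched on plain token arrays, without the library. The function name `buildInvertedIndex` and the two constants are hypothetical names chosen for illustration. One deliberate variation: this sketch records a document id only once per term, whereas the excerpt appends the id on every occurrence of the term.

```php
<?php
declare(strict_types=1);

// Hypothetical keys mirroring the FREQ and POSTINGS class constants in the excerpt.
const FREQ = 'freq';
const POSTINGS = 'postings';

/**
 * Build a term => [frequency, postings] map from tokenized documents.
 *
 * @param array<int, string[]> $documents document id => tokens
 * @return array<string, array<string, mixed>>
 */
function buildInvertedIndex(array $documents): array
{
    $index = [];
    foreach ($documents as $docId => $tokens) {
        foreach ($tokens as $term) {
            if (!isset($index[$term])) {
                $index[$term] = [FREQ => 0, POSTINGS => []];
            }
            // Count every occurrence of the term across all documents.
            $index[$term][FREQ]++;
            // Variation on the excerpt: list each document id once per term.
            if (!in_array($docId, $index[$term][POSTINGS], true)) {
                $index[$term][POSTINGS][] = $docId;
            }
        }
    }
    return $index;
}

$index = buildInvertedIndex([
    1 => ['the', 'cat', 'sat'],
    2 => ['the', 'dog', 'sat', 'the'],
]);

// Look up which documents contain a term.
echo implode(', ', $index['sat'][POSTINGS]), PHP_EOL;
```

Keeping postings free of duplicates makes document-frequency lookups a simple `count()`, at the cost of an `in_array` scan per token; for large collections a keyed set would likely be a better fit.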

Example #2

File: StanfordPosTaggerTest.php Project: yooper/php-text-analysis

 public function testStanfordPos()
 {
     if (getenv('SKIP_TEST') || !getenv('JAVA_HOME')) {
         return;
     }
     $document = new TokensDocument((new WhitespaceTokenizer())->tokenize($this->text));
     $jarPath = get_storage_path('corpora/stanford_pos_tagger') . 'stanford-postagger-3.6.0.jar';
     $modelPath = get_storage_path('corpora/stanford_pos_tagger' . DIRECTORY_SEPARATOR . "models") . "english-left3words-distsim.tagger";
     $tagger = new StanfordPosTagger($jarPath, $modelPath);
     $output = $tagger->tag($document->getDocumentData());
     $this->assertFileExists($tagger->getTmpFilePath());
     $this->assertEquals(138, filesize($tagger->getTmpFilePath()));
     $this->assertEquals(['Michigan', 'NNP'], $output[15], "Did you set JAVA_HOME env variable?");
 }

Example #3

File: StanfordNerTaggerTest.php Project: yooper/php-text-analysis

 public function testStanfordNer()
 {
     if (getenv('SKIP_TEST') || !getenv('JAVA_HOME')) {
         return;
     }
     $document = new TokensDocument((new WhitespaceTokenizer())->tokenize($this->text));
     $jarPath = get_storage_path('ner') . 'stanford-ner.jar';
     $classifierPath = get_storage_path('ner' . DIRECTORY_SEPARATOR . "classifiers") . "english.all.3class.distsim.crf.ser.gz";
     $tagger = new StanfordNerTagger($jarPath, $classifierPath);
     $output = $tagger->tag($document->getDocumentData());
     $this->assertFileExists($tagger->getTmpFilePath());
     $this->assertEquals(138, filesize($tagger->getTmpFilePath()));
     $this->assertEquals(['Michigan', 'LOCATION'], $output[15], "Did you set JAVA_HOME env variable?");
 }

Example #4

File: StopwordGenerator.php Project: yooper/php-text-analysis

 /**
  * Returns an array of stop words and their frequencies
  * @return string[]
  */
 public function getStopwords()
 {
     if (!empty($this->stopWords)) {
         return $this->stopWords;
     }
     foreach ($this->getFilePaths() as $filePath) {
         $content = $this->getFileContent($filePath);
         $doc = new TokensDocument((new GeneralTokenizer())->tokenize($content));
         $doc->applyTransformation(new LowerCaseFilter())
             ->applyTransformation(new PossessiveNounFilter())
             ->applyTransformation(new PunctuationFilter())
             ->applyTransformation(new CharFilter());
         if ($this->mode === self::MODE_FREQ) {
             $this->computeUsingFreqDist($doc->getDocumentData());
         }
     }
     arsort($this->stopWords);
     return $this->stopWords;
 }
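The excerpt does not show what `computeUsingFreqDist` does with the tokens, so the following is only a guess at the general idea: count how often each token occurs and sort by count, which matches the `arsort` call in the method above. The function name `countTokenFrequencies` is hypothetical, and the sketch uses only built-in PHP functions.

```php
<?php
declare(strict_types=1);

/**
 * Count token occurrences and order them from most to least frequent.
 *
 * @param string[] $tokens
 * @return array<string, int> token => count
 */
function countTokenFrequencies(array $tokens): array
{
    // array_count_values maps each distinct value to its number of occurrences.
    $frequencies = array_count_values($tokens);
    // arsort sorts by value in descending order while keeping the token keys.
    arsort($frequencies);
    return $frequencies;
}

$tokens = ['the', 'cat', 'and', 'the', 'dog', 'and', 'the', 'bird'];
$frequencies = countTokenFrequencies($tokens);

// The most frequent tokens are natural stop word candidates.
$topTwo = array_slice(array_keys($frequencies), 0, 2);
echo implode(', ', $topTwo), PHP_EOL;
```

In a stop word generator the counts would typically be accumulated across many files before sorting, as the loop over file paths in the excerpt suggests.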