PHP TextAnalysis\Tokenizers\GeneralTokenizer Examples

Programming language: PHP

Class/Type: TextAnalysis\Tokenizers\GeneralTokenizer

Examples at hotexamples.com: 2

PHP TextAnalysis\Tokenizers\GeneralTokenizer - 2 examples found. These are the top-rated real-world PHP examples of TextAnalysis\Tokenizers\GeneralTokenizer, extracted from open source projects.

Frequently used methods

tokenize(2)

Example #1

File: example_01_frequency_analysis.php Project: Laradev/php-text-analysis-examples

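// Composer autoloader, needed to resolve the \TextAnalysis classes used below
require_once 'vendor/autoload.php';
// Used to generate a chart from the output of PHP Text Analysis
require_once 'utils/BarPageBuilder.php';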
/**
 *  Get some text from the internet
 *  we will grab tom sawyer from the gutenberg project
 *  http://www.gutenberg.org/cache/epub/74/pg74.txt
 * 
 */
/**
 * @var string $book 
 */
$book = file_get_contents('data/books/pg74.txt');
/**
 *  Create a tokenizer object to parse the book into a set of tokens
 *  
 */
$tokenizer = new \TextAnalysis\Tokenizers\GeneralTokenizer();
/**
 * Get the set of tokens generated by the tokenizer
 */
$tokens = $tokenizer->tokenize($book);
$freqDist = new \TextAnalysis\Analysis\FreqDist($tokens);
/**
 * Get the top 10 most used words in Tom Sawyer 
 */
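// array_slice() is used instead of array_splice(): array_splice() takes its
// first argument by reference, so it cannot be given a function result directly.
$top10 = array_slice($freqDist->getKeyValuesByFrequency(), 0, 10);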
/**
 * Use Highcharts to visualize the data
 */
$pageBuilder = new BarPageBuilder($top10);
$html = $pageBuilder->getHtmlPage();
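
The tokenize, count, and take-the-top-N flow of Example #1 can also be sketched in plain PHP, without the library. This is only an illustrative sketch: `simpleTokenize()` and `topTokens()` are hypothetical helpers, and the delimiter set is an assumption that may not match what `GeneralTokenizer` actually splits on.

```php
<?php
// Hypothetical helper, not part of php-text-analysis: splits text on
// whitespace and a few punctuation characters (delimiter set is assumed).
function simpleTokenize(string $text, string $delimiters = " \n\t\r,-!?."): array
{
    $tokens = [];
    $token = strtok($text, $delimiters);
    while ($token !== false) {
        $tokens[] = $token;
        $token = strtok($delimiters);   // continue scanning the same string
    }
    return $tokens;
}

// Hypothetical helper: returns the $n most frequent tokens as token => count.
function topTokens(array $tokens, int $n): array
{
    $counts = array_count_values($tokens);   // token => number of occurrences
    arsort($counts);                          // highest count first, keys kept
    return array_slice($counts, 0, $n, true); // true keeps numeric-looking keys
}

$sample = "the cat and the dog and the bird";
$top2 = topTokens(simpleTokenize($sample), 2);
print_r($top2);   // 'the' => 3, 'and' => 2
```

Passing `true` as the fourth argument to `array_slice()` matters only when a token looks like a number (for example a year), since PHP would otherwise renumber those keys.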

Example #2

File: example_02_document_collections.php Project: Laradev/php-text-analysis-examples

/**
 * An example of creating a document collection
 * Document Collections allow you to work with a group of documents easily
 */
require_once 'vendor/autoload.php';
// Used to generate a chart from the output of PHP Text Analysis
require_once 'utils/BarPageBuilder.php';
/**
 * @var string $tomSawyerBook
 * @var string $huckFinnBook
 */
$tomSawyerBook = file_get_contents('data/books/pg74.txt');
$huckFinnBook = file_get_contents('data/books/pg76.txt');
/**
 *  Create a tokenizer object to parse the book into a set of tokens
 *  
 */
$tokenizer = new \TextAnalysis\Tokenizers\GeneralTokenizer();
/**
 * Get the set of tokens generated by the tokenize and
 * create a token document from the tokens
 *  
 */
$tomSawyerDocument = new \TextAnalysis\Documents\TokensDocument($tokenizer->tokenize($tomSawyerBook));
$huckFinnDocument = new \TextAnalysis\Documents\TokensDocument($tokenizer->tokenize($huckFinnBook));
/**
 * Create a document collection that can have filters applied or further analysis done
 */
$docCollection = new \TextAnalysis\Collections\DocumentArrayCollection(array($tomSawyerDocument, $huckFinnDocument));
/**
 *  Apply filters to the document collection
 *  lower case the documents, remove quotes and remove stop words
 */
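
The three filters named in the comment above (lower-casing, removing quotes, removing stop words) can be sketched in plain PHP on a flat token array. This does not use the library's filter classes, which are not shown in this excerpt. `filterTokens()` is a hypothetical helper, and the stop-word list is an illustrative assumption.

```php
<?php
// Hypothetical helper, not the php-text-analysis filter API: applies
// lower-casing, quote removal, and stop-word removal to an array of tokens.
function filterTokens(array $tokens, array $stopWords): array
{
    $stopLookup = array_flip($stopWords);   // word => index, for fast isset() checks
    $result = [];
    foreach ($tokens as $token) {
        $token = strtolower($token);                    // lower case (ASCII letters)
        $token = str_replace(['"', "'"], '', $token);   // remove straight quotes
        if ($token === '' || isset($stopLookup[$token])) {
            continue;                                   // skip empties and stop words
        }
        $result[] = $token;
    }
    return $result;
}

$tokens = ['"The', 'Adventures', 'of', 'Tom', "Sawyer'"];
$filtered = filterTokens($tokens, ['the', 'of', 'and']);
print_r($filtered);   // adventures, tom, sawyer
```

Quotes are stripped before the stop-word check so that a token such as `"The` is still recognized as the stop word `the`.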