PHP TYPO3\CMS\Core\Charset CharsetConverter::utf8_strlen examples

Programming language: PHP

Namespace/package: TYPO3\CMS\Core\Charset

Class/type: CharsetConverter

Method/function: utf8_strlen

Examples on hotexamples.com: 2

PHP TYPO3\CMS\Core\Charset CharsetConverter::utf8_strlen: 2 examples found. These are real-world examples of PHP TYPO3\CMS\Core\Charset\CharsetConverter::utf8_strlen extracted from open-source projects, selected from the highest-rated ones.

Frequently used methods (number of examples in parentheses)

conv(9)

conv_case(6)

strlen(5)

utf8_to_numberarray(4)

substr(3)

convArray(3)

entities_to_utf8(3)

parse_charset(3)

utf8_encode(3)

utf8_to_entities(2)

utf8_substr(2)

utf8_strlen(2)

specCharsToASCII(2)

utf8CharToUnumber(2)

strtrunc(2)

initCharset(2)

get_locale_charset(2)

crop(2)

UnumberToChar(1)

Example #1

File: Lexer.php, Project: khanhdeux/typo3test

 /**
  * Adds a word to the word array.
  * This function should be used to make sure CJK sequences are split up in the right way.
  *
  * @param array   $words      Array of accumulated words
  * @param string  $wordString Complete input string from which to extract the word
  * @param integer $start      Start position of the word in the input string (in bytes)
  * @param integer $len        Length of the word from the start position (in bytes)
  * @return void
  * @todo Define visibility
  */
 public function addWords(&$words, &$wordString, $start, $len)
 {
     // Get word out of string:
     $theWord = substr($wordString, $start, $len);
     // Get next chars unicode number and find type:
     $bc = 0;
     $cp = $this->utf8_ord($theWord, $bc);
     list($cType) = $this->charType($cp);
     // If string is a CJK sequence we follow this algorithm:
     /*
         DESCRIPTION OF (CJK) ALGORITHM

         Continuous letters and numbers make up words. Spaces and symbols
         separate letters and numbers into words. This is sufficient for
         all western text.

         CJK doesn't use spaces or separators to separate words, so the only
         way to really find out what constitutes a word would be to have a
         dictionary and advanced heuristics. Instead, we form pairs from
         consecutive characters, in such a way that searches will find only
         characters that appear in more-or-less the right sequence. For example:

             ABCDE => AB BC CD DE

         This works okay since both the index and the search query are split
         in the same manner, and since the set of characters is huge, the
         extra matches are not significant.

         (Hint taken from ZOPE's Chinese user group)

         [Kasper: As far as I can see this will only work well with OR searches!]
     */
     if ($cType == 'cjk') {
         // Find the total length in characters (not bytes):
         $strlen = $this->csObj->utf8_strlen($theWord);
         // Traverse string length and add words as pairs of two chars:
         for ($a = 0; $a < $strlen; $a++) {
             if ($strlen == 1 || $a < $strlen - 1) {
                 $words[] = $this->csObj->utf8_substr($theWord, $a, 2);
             }
         }
     } else {
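         // Non-CJK text (not necessarily single-byte): keep the word whole.
         // Remove each character listed in lexerConf['removeChars'] (given as Unicode code points):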
         foreach ($this->lexerConf['removeChars'] as $skipJoin) {
             $theWord = str_replace($this->csObj->UnumberToChar($skipJoin), '', $theWord);
         }
         // Add word:
         $words[] = $theWord;
     }
 }
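The comment in addWords() describes splitting a CJK run into overlapping two-character pairs. The following is a minimal standalone sketch of that bigram step, assuming valid UTF-8 input. The function name cjkBigrams is hypothetical and not part of TYPO3. To run without TYPO3, it uses preg_split with the /u modifier in place of CharsetConverter::utf8_strlen and utf8_substr.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of TYPO3): split a run of CJK characters
 * into overlapping two-character "words", e.g. ABCDE => AB BC CD DE.
 * A single character is kept as one word; an empty string yields no words.
 */
function cjkBigrams(string $word): array
{
    // Split into characters, not bytes: the /u modifier makes PCRE treat
    // the subject as UTF-8, and PREG_SPLIT_NO_EMPTY drops empty pieces.
    $chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $count = count($chars);
    if ($count <= 1) {
        // Mirrors the "$strlen == 1" case in addWords().
        return $chars;
    }

    // Pair each character with its successor; the last character
    // only appears as the second half of the final pair.
    $pairs = [];
    for ($i = 0; $i < $count - 1; $i++) {
        $pairs[] = $chars[$i] . $chars[$i + 1];
    }
    return $pairs;
}

echo implode(' ', cjkBigrams('東京大学')), PHP_EOL; // 東京 京大 大学
```

Because indexed text and search queries are split the same way, a query for 東京 shares the pair 東京 with the indexed word 東京大学. As the original comment notes, this works best with OR searches.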

Example #2

File: CharsetConverterTest.php, Project: plan2net/TYPO3.CMS

 /**
  * @test
  */
 public function utf8_strlenForNonEmptyAsciiOnlyStringReturnsNumberOfCharacters()
 {
     $this->assertEquals(10, $this->subject->utf8_strlen('good omens'));
 }
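The test checks that utf8_strlen returns the number of characters. For ASCII-only text, the character count equals the byte count, so the distinguishing case is multibyte text. The following is a minimal sketch of one common way to count UTF-8 characters without mbstring, assuming valid UTF-8 input. The function name utf8CharCount is hypothetical, and this is not TYPO3's own implementation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not TYPO3 code): count the characters in a UTF-8
 * string by counting every byte that is NOT a continuation byte.
 * Continuation bytes have the bit pattern 10xxxxxx. Assumes valid UTF-8.
 */
function utf8CharCount(string $str): int
{
    $count = 0;
    $length = strlen($str); // length in bytes
    for ($i = 0; $i < $length; $i++) {
        // Keep only the top two bits; 0x80 (10xxxxxx) marks a continuation byte.
        if ((ord($str[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

// ASCII: bytes and characters coincide, matching the test's expected 10.
echo strlen('good omens'), ' ', utf8CharCount('good omens'), PHP_EOL; // 10 10
// Japanese: each of these characters takes 3 bytes in UTF-8.
echo strlen('日本語'), ' ', utf8CharCount('日本語'), PHP_EOL; // 9 3
```

Counting lead bytes instead of decoding each sequence keeps the loop simple, but it does not validate the input: malformed UTF-8 produces a count rather than an error.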