PHP t3lib_cs::utf8_substr examples

Programming language: PHP

Class / type: t3lib_cs

Method / function: utf8_substr

Examples on hotexamples.com: 1

PHP t3lib_cs::utf8_substr: 1 example found. This is the top-rated PHP example of t3lib_cs::utf8_substr extracted from open-source projects.
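t3lib_cs::utf8_substr returns part of a UTF-8 string counted in characters rather than bytes, which plain substr() cannot do reliably once multibyte characters are involved. The following is a minimal standalone sketch of that idea, not TYPO3's implementation: the function name utf8_substr_sketch is hypothetical, and the sketch assumes PHP 7.1 or later (for the nullable parameter type) and uses preg_split() with the u modifier instead of any t3lib_cs helper.

```php
<?php
// Minimal sketch: substring by UTF-8 character position rather than by byte.
// utf8_substr_sketch() is a hypothetical helper, not TYPO3's t3lib_cs code.
function utf8_substr_sketch(string $str, int $start, ?int $len = null): string
{
    // Split into single UTF-8 characters; the 'u' modifier makes the
    // empty pattern match between characters instead of between bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // array_slice() handles negative offsets/lengths and a null length much
    // like substr(), but it counts array elements (characters), not bytes.
    return implode('', array_slice($chars, $start, $len));
}

$word = '日本語';                              // 3 characters, 9 bytes in UTF-8
echo strlen($word), "\n";                     // 9
// substr($word, 0, 2) would cut the first character in half;
// the sketch returns the first two whole characters instead.
echo utf8_substr_sketch($word, 0, 2), "\n";   // 日本
echo utf8_substr_sketch($word, 1), "\n";      // 本語
```

Splitting into an array keeps the sketch short and free of the mbstring extension; for long strings a byte-scanning approach avoids building the intermediate array.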

Frequently used methods

conv(5)

utf8_encode(4)

utf8_to_numberarray(2)

substr(2)

utf8_strlen(2)

get_locale_charset(2)

parse_charset(2)

strtrunc(2)

utf8_to_entities(1)

utf8_substr(1)

utf8CharToUnumber(1)

UnumberToChar(1)

initCharset(1)

entities_to_utf8(1)

crop(1)

conv_case(1)

convArray(1)

specCharsToASCII(1)

Example #1

File: class.lexer.php Project: zsolt-molnar/TYPO3-4.5-trunk

 /**
  * Add word to word-array
  * This function should be used to make sure CJK sequences are split up in the right way
  *
  * @param	array		Array of accumulated words
  * @param	string		Complete input string from which to extract the word
  * @param	integer		Start position of word in input string
  * @param	integer		The length of the word string from the start position
  * @return	void
  */
 function addWords(&$words, &$wordString, $start, $len)
 {
     // Get word out of string:
     $theWord = substr($wordString, $start, $len);
     // Get the Unicode code point of the word's first character and find its type:
     $bc = 0;
     $cp = $this->utf8_ord($theWord, $bc);
     list($cType) = $this->charType($cp);
     // If string is a CJK sequence we follow this algorithm:
     /*
     	DESCRIPTION OF (CJK) ALGORITHM
     
     	Continuous letters and numbers make up words. Spaces and symbols
     	separate letters and numbers into words. This is sufficient for
     	all western text.
     
     	CJK doesn't use spaces or separators to separate words, so the only
     	way to really find out what constitutes a word would be to have a
     	dictionary and advanced heuristics. Instead, we form pairs from
     	consecutive characters, in such a way that searches will find only
     	characters that appear in more-or-less the right sequence. For example:
     
     		ABCDE => AB BC CD DE
     
     	This works okay since both the index and the search query are split
     	in the same manner, and since the character set is so large, the
     	extra matches are not significant.
     
     	(Hint taken from Zope's Chinese user group)
     
     	[Kasper: As far as I can see this will only work well with or-searches!]
     */
     if ($cType == 'cjk') {
         // Find total string length:
         $strlen = $this->csObj->utf8_strlen($theWord);
         // Traverse string length and add words as pairs of two chars:
         for ($a = 0; $a < $strlen; $a++) {
             if ($strlen == 1 || $a < $strlen - 1) {
                 $words[] = $this->csObj->utf8_substr($theWord, $a, 2);
             }
         }
     } else {
         // Normal "single-byte" chars:
         // Remove chars:
         foreach ($this->lexerConf['removeChars'] as $skipJoin) {
             $theWord = str_replace($this->csObj->UnumberToChar($skipJoin), '', $theWord);
         }
         // Add word:
         $words[] = $theWord;
     }
 }
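The pairing scheme described in the code comment (ABCDE => AB BC CD DE) can be sketched as a standalone PHP function. This is a minimal sketch, not the lexer's own code: the name cjk_bigrams is hypothetical, it assumes the input is already a run of CJK characters in valid UTF-8, and it splits by character with preg_split() and the u modifier instead of the t3lib_cs::utf8_strlen() and utf8_substr() calls used in addWords().

```php
<?php
// Minimal sketch of overlapping two-character pairs ("bigrams") for a CJK run.
// cjk_bigrams() is a hypothetical helper, not part of TYPO3.
function cjk_bigrams(string $word): array
{
    // Split into single UTF-8 characters.
    $chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $n = count($chars);
    // A single character is kept as-is, mirroring the $strlen == 1 case.
    if ($n === 1) {
        return $chars;
    }
    $pairs = [];
    // Every position except the last starts one overlapping pair.
    for ($i = 0; $i < $n - 1; $i++) {
        $pairs[] = $chars[$i] . $chars[$i + 1];
    }
    return $pairs;
}

echo implode(' ', cjk_bigrams('ABCDE')), "\n";   // AB BC CD DE
echo implode(' ', cjk_bigrams('日本語')), "\n";  // 日本 本語
```

Because a search query is split into pairs in the same way, a query such as 日本語 is looked up as 日本 and 本語, which fits the remark in the comment that the scheme mainly works well with OR searches.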