In _quick_ mode it only checks whether any non-ASCII characters are used,
which indicates that some multibyte encoding is present. Don't use quick mode
for integrity validation of UTF-8 encoded strings.
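As a rough illustration of what such a quick check amounts to, the sketch below flags any byte outside tab, line feed, carriage return and printable ASCII. The function name `quickLooksMultibyte()` is hypothetical and is not claimed to be how `Multibyte::is()` is implemented. It only shows why a check of this kind cannot validate integrity: malformed byte sequences are reported the same way as well-formed ones.

```php
<?php
/**
 * Hypothetical helper (an assumption for illustration, not library code).
 * Returns true as soon as one byte falls outside tab, LF, CR and the
 * printable ASCII range, without checking that the bytes form valid UTF-8.
 */
function quickLooksMultibyte($string) {
	// No `u` modifier, so the pattern is matched byte by byte.
	return (bool) preg_match('/[^\x09\x0A\x0D\x20-\x7E]/', $string);
}

var_dump(quickLooksMultibyte("plain ascii")); // pure ASCII input
var_dump(quickLooksMultibyte("\xC3\xA9"));    // well-formed 2-byte sequence
var_dump(quickLooksMultibyte("\xC0\xAF"));    // overlong, malformed sequence
```

The last two calls give the same answer even though only the first of them is well-formed UTF-8, which is the limitation described above.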
Meaning of RegExp:
'[\x09\x0A\x0D\x20-\x7E]'; // ASCII
'|[\xC2-\xDF][\x80-\xBF]'; // non-overlong 2-byte
'|\xE0[\xA0-\xBF][\x80-\xBF]'; // excluding overlongs
'|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'; // straight 3-byte
'|\xED[\x80-\x9F][\x80-\xBF]'; // excluding surrogates
'|\xF0[\x90-\xBF][\x80-\xBF]{2}'; // planes 1-3
'|[\xF1-\xF3][\x80-\xBF]{3}'; // planes 4-15
'|\xF4[\x80-\x8F][\x80-\xBF]{2}'; // plane 16
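The alternatives listed above can be assembled into one anchored pattern. The following is a minimal sketch under stated assumptions: the function name `isValidUtf8()`, the `\A(?:...)*\z` wrapper and the non-capturing group are choices made for this example and are not taken from the library source.

```php
<?php
/**
 * Hypothetical helper (an assumption for illustration, not library code).
 * Returns true only if the whole string consists of the byte sequences
 * described by the alternatives above.
 */
function isValidUtf8($string) {
	$regex  = '/\A(?:';
	$regex .= '[\x09\x0A\x0D\x20-\x7E]';            // ASCII
	$regex .= '|[\xC2-\xDF][\x80-\xBF]';            // non-overlong 2-byte
	$regex .= '|\xE0[\xA0-\xBF][\x80-\xBF]';        // excluding overlongs
	$regex .= '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'; // straight 3-byte
	$regex .= '|\xED[\x80-\x9F][\x80-\xBF]';        // excluding surrogates
	$regex .= '|\xF0[\x90-\xBF][\x80-\xBF]{2}';     // planes 1-3
	$regex .= '|[\xF1-\xF3][\x80-\xBF]{3}';         // planes 4-15
	$regex .= '|\xF4[\x80-\x8F][\x80-\xBF]{2}';     // plane 16
	$regex .= ')*\z/';

	// preg_match() returns 1 on a match, 0 on no match and false on error;
	// the cast treats an error (e.g. a PCRE limit being hit) as "not valid".
	return (bool) preg_match($regex, $string);
}

var_dump(isValidUtf8("\xF0\x9F\x98\x80")); // 4-byte sequence in plane 1
var_dump(isValidUtf8("\xED\xA0\x80"));     // encoded surrogate
```

Note that the ASCII alternative admits only tab, line feed, carriage return and printable characters, so a string containing other control bytes such as `\x00` is rejected by this sketch as well.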
/**
 * Verifies the behavior of `Multibyte::is()` when dealing with valid and
 * invalid UTF-8 strings as well as edge cases. Please see the docblock for
 * `testIsBehavioral` for more contextual information on the type of test
 * and data used here.
 *
 * This test clearly shows and accepts the limitations in which the `quick`
 * mode operates. The `quick` mode will obviously never get as good results
 * as the normal one.
 *
 * These items should be detected as *invalid* UTF-8 (but currently aren't):
 * - lines 101-263 in nearly all remaining sections.
 *
 * @see lithium\tests\cases\g11n\MultibyteTest::testIsBehavioral()
 */
public function testIsQuickBehavioral() {
	$path = LITHIUM_LIBRARY_PATH . '/lithium/tests/resources/utf8_decoder_stress_test.txt';
	$data = file($path);

	// Line number in the stress test file => expected result in quick mode.
	$items = array(
		64 => true, 70 => true, 71 => true, 72 => true, 73 => true, 74 => true,
		75 => true, 79 => true, 80 => true, 81 => true, 82 => true, 83 => true,
		84 => true, 101 => true, 102 => true, 104 => true, 105 => true, 106 => true,
		107 => true, 108 => true, 109 => true, 113 => true, 114 => true, 115 => true,
		116 => true, 123 => true, 124 => true, 129 => true, 134 => true, 139 => true,
		144 => true, 154 => true, 155 => true, 156 => true, 157 => true, 158 => true,
		159 => true, 160 => true, 161 => true, 168 => true, 174 => true, 176 => true,
		206 => true, 207 => true, 208 => true, 209 => true, 210 => true, 219 => true,
		220 => true, 221 => true, 222 => true, 223 => true, 231 => true, 232 => true,
		233 => true, 234 => true, 235 => true, 246 => true, 247 => true, 248 => true,
		249 => true, 250 => true, 251 => true, 252 => true, 256 => true, 257 => true,
		258 => true, 259 => true, 260 => true, 261 => true, 262 => true, 263 => true,
		267 => true, 268 => true
	);

	foreach ($items as $number => $expected) {
		$result = Multibyte::is($data[$number], array('quick' => true));

		$message  = "Expected item on line {$number} to be detected as ";
		$message .= ($expected ? 'valid' : 'invalid') . " UTF-8.\n";

		$this->assertEqual($expected, $result, $message);
	}
}