UTF: A Unicode string manager
The UTF class is a utility class that eases the handling of Unicode strings.
Namespace: \
File location: lib/utf.php
Instantiation
Return class instance
$utf = \UTF::instance();
The UTF class uses the Prefab factory wrapper, so you can grab the same instance of the class anywhere in your code.
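To illustrate what "the same instance" means, here is a minimal sketch of a singleton-style base class. It is not F3's actual Prefab code; `SingletonBase` and `Demo` are hypothetical names used only for this example.

```php
<?php
// Hypothetical stand-in for a Prefab-style base class (not F3's actual code).
abstract class SingletonBase
{
    /** @var array<string, object> one cached object per concrete class */
    private static $instances = [];

    public static function instance()
    {
        // static::class resolves to the class the method was called on.
        $class = static::class;
        if (!isset(self::$instances[$class])) {
            self::$instances[$class] = new $class();
        }
        return self::$instances[$class];
    }
}

// Hypothetical class that inherits the shared-instance behaviour.
final class Demo extends SingletonBase
{
}

$a = Demo::instance();
$b = Demo::instance();
var_dump($a === $b); // both variables point to the same object
```

Under this assumption, repeated calls to `\UTF::instance()` would hand back one shared object rather than creating a new one each time.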
Methods
Like the standard PHP string functions, the UTF methods that return a position use zero-based offsets.
strlen
Get string length
int strlen ( string $str )
This function returns the length of a given string, counted in characters rather than bytes.
Example:
$utf->strlen('나는 유리를 먹을 수 있어요. 그래도'); // returns 20 (while php strlen returns 48)
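The difference between the two counts can be reproduced without F3. The sketch below is only an illustration of byte length versus character length; `count_code_points` is a hypothetical helper, not a method of the UTF class.

```php
<?php
// Hypothetical helper (not part of F3): count UTF-8 code points with PCRE.
function count_code_points(string $str): int
{
    // The /u modifier makes "." match one code point instead of one byte;
    // /s lets "." match newlines as well.
    $count = preg_match_all('/./us', $str);
    // preg_match_all() returns FALSE on invalid UTF-8; report that as 0 here.
    return $count === false ? 0 : $count;
}

$text  = '나는 유리를 먹을 수 있어요. 그래도';
$bytes = strlen($text);            // native strlen() counts bytes
$chars = count_code_points($text); // counts characters (code points)

echo "$bytes bytes, $chars characters\n"; // prints "48 bytes, 20 characters"
```

Each Hangul syllable takes three bytes in UTF-8, which is why the byte count is so much larger than the character count.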
stripos
Find position of first occurrence of a string, case-insensitive
int|FALSE stripos ( string $stack, string $needle [, int $ofs = 0 ] )
This function returns the position of the first occurrence of the $needle string in the $stack string. The search is case-insensitive. Returns FALSE if $needle is not found.
If $ofs is specified, the search starts that many characters from the beginning of the string. The $ofs offset cannot be negative.
Examples:
$utf->stripos('Les Naïfs ægithales hâtifs', 'naïfs'); // returns 4
$utf->stripos('Les Naïfs ægithales hâtifs', 'NAÏFS'); // returns 4
$utf->stripos('Les Naïfs ægithales hâtifs', 'NAÏFS', 10); // returns FALSE
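Positions are counted in characters, not bytes, which matters as soon as a multibyte character precedes the match. The following is a minimal sketch of a character-based, case-insensitive search built on PCRE; `char_stripos` is a hypothetical helper and is not presented as the library's implementation.

```php
<?php
// Hypothetical helper (not part of F3): case-insensitive, character-based search.
function char_stripos(string $stack, string $needle, int $ofs = 0)
{
    // Skip $ofs characters, then lazily scan for the needle.
    // /u = UTF-8 mode, /i = case-insensitive, /s = "." also matches newlines.
    $pattern = '/^(.{' . $ofs . '}.*?)' . preg_quote($needle, '/') . '/uis';
    if (!preg_match($pattern, $stack, $m)) {
        return false;
    }
    // The offset is the number of code points in the captured prefix.
    return (int) preg_match_all('/./us', $m[1]);
}

$stack = 'Les Naïfs ægithales hâtifs';

$byteOfs = strpos($stack, 'ægithales');       // native strpos(): byte offset
$charOfs = char_stripos($stack, 'ÆGITHALES'); // character offset
```

Here the native byte offset is one higher than the character offset, because the `ï` that comes before the match occupies two bytes.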
strpos
Find position of first occurrence of a string, case-sensitive
int|FALSE strpos ( string $stack, string $needle [, int $ofs = 0 [, bool $case = FALSE ]] )
This function returns the position of the first occurrence of the $needle string in the $stack string. Returns FALSE if $needle is not found.
If $ofs is specified, the search starts that many characters from the beginning of the string. The $ofs offset cannot be negative.
If $case is set to TRUE, the search is case-insensitive and the function behaves like stripos().
Examples:
$utf->strpos('Góa ē-tàng Chia̍h Po-lê', 'Góa' ); // returns 0
$utf->strpos('Góa ē-tàng Chia̍h Po-lê', 'Góa', 4 ); // returns FALSE
$utf->strpos('Góa ē-tàng Chia̍h Po-lê', 'chia̍h', 0 ); // returns FALSE (case-sensitive)
$utf->strpos('Góa ē-tàng Chia̍h Po-lê', 'chia̍h', 0, TRUE ); // returns 11 (case-insensitive)
stristr
Returns part of haystack string from the first occurrence of the needle to the end of haystack, case-insensitive
string|FALSE stristr ( string $stack, string $needle [, bool $before = FALSE ] )
This function returns the part of the $stack string starting from and including the first occurrence of $needle to the end of $stack. The search is case-insensitive. Returns FALSE if $needle is not found.
If $before is set to TRUE, stristr() returns the part of $stack before the first occurrence of $needle (excluding the needle).
Examples:
$utf->stristr('Mayia Góa Chàyia̍h Lêh-Pok', 'CHÀYIA̍H' ); // returns 'Chàyia̍h Lêh-Pok'
$utf->stristr('Mayia Góa Chàyia̍h Lêh-Pok', 'GóA', TRUE ); // returns 'Mayia '
strstr
Returns part of haystack string from the first occurrence of the needle to the end of the haystack
string|FALSE strstr ( string $stack, string $needle [, bool $before = FALSE [, bool $case = FALSE ]] )
This function returns the part of the $stack string starting from and including the first occurrence of $needle to the end of $stack. Returns FALSE if $needle is not found.
If $before is set to TRUE, strstr() returns the part of $stack before the first occurrence of $needle (excluding the needle).
If $case is set to TRUE, the search is case-insensitive and the function behaves like stristr().
Example:
$email = 'Mïchaño@example.com';
$domain = $utf->strstr($email, '@'); // returns '@example.com'
$user = $utf->strstr($email, '@', TRUE); // returns 'Mïchaño'
substr
Return part of a string
string|FALSE substr ( string $str, int $start [, int $length = 0 ] )
This function returns the portion of string $str specified by the $start and $length parameters. If $length is omitted, the substring from $start to the end of the string is returned.
If $start is negative, the returned string starts at the $start'th character from the end of $str.
If $str is less than or equal to $start characters long, FALSE is returned.
Examples:
$utf->substr('El pingüino Wenceslao hizo kilómetros', 3,8); // returns 'pingüino'
$utf->substr('El pingüino Wenceslao hizo kilómetros',-10,4); // returns 'kiló'
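The rules above (character-based offsets, negative $start, omitted $length) can be sketched with a single regular expression. This is an illustrative approximation under those stated rules, not the library's code; `char_substr` is a hypothetical helper.

```php
<?php
// Hypothetical helper (not part of F3): character-based substring via PCRE.
function char_substr(string $str, int $start, int $length = 0)
{
    $total = (int) preg_match_all('/./us', $str); // length in code points
    if ($start < 0) {
        // A negative start counts back from the end of the string.
        $start = max(0, $total + $start);
    }
    if ($length <= 0) {
        // No length given: take everything up to the end of the string.
        $length = max(0, $total - $start);
    }
    // Skip $start characters, then capture up to $length characters.
    $pattern = '/^.{' . $start . '}(.{0,' . $length . '})/us';
    return preg_match($pattern, $str, $m) ? $m[1] : false;
}

$phrase = 'El pingüino Wenceslao hizo kilómetros';

$word = char_substr($phrase, 3, 8);   // 'pingüino'
$tail = char_substr($phrase, -10, 4); // 'kiló'
$none = char_substr('abc', 5);        // FALSE: the string is too short
```

Native `substr()` with the same arguments would cut by bytes and could split `ü` or `ó` in the middle of their two-byte sequences.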
substr_count
Count the number of occurrences of a substring
int substr_count ( string $stack, string $needle )
This function counts and returns the number of times the $needle substring occurs in the $stack string. Note that the match is case-sensitive.
Examples:
$utf->substr_count('This is an example as it is', 'is'); // returns 3
$utf->substr_count(implode(array('This','example','as','it')), 'is'); // returns 1 (no separator, so only the 'is' of 'This' is found)
$arr = array('This','example','as','it');
$utf->substr_count(implode('is', $arr), 'is'); // returns 4
ltrim
Strip whitespace from the beginning of a string
string ltrim ( string $str )
This function strips whitespace and other characters (according to the regexp /^[\pZ\pC]+/u) from the beginning of a given string.
Examples:
$utf->ltrim("\xe2\x80\x83\x20 WhatAMana!\xc2\xa0\xe1\x9a\x80"); // returns "WhatAMana!\xc2\xa0\xe1\x9a\x80"
$utf->ltrim(' invisible leading spaces... '); // returns 'invisible leading spaces... '
rtrim
Strip whitespace from the end of a string
string rtrim ( string $str )
This function strips whitespace and other characters (according to the regexp /[\pZ\pC]+$/u) from the end of a given string.
Examples:
$utf->rtrim("\xe2\x80\x83\x20 WhatAMana! \xc2\xa0\xe1\x9a\x80"); // returns "\xe2\x80\x83\x20 WhatAMana!"
$utf->rtrim(' invisible trailing spaces... '); // returns ' invisible trailing spaces...'
trim
Strip whitespace from the beginning and end of a string
string trim ( string $str )
This function strips whitespace and other characters (according to the regexp /^[\pZ\pC]+|[\pZ\pC]+$/u) from the beginning and end of a given string.
Examples:
$utf->trim("\xe2\x80\x83\x20 WhatAMana! \xc2\xa0\xe1\x9a\x80"); // returns "WhatAMana!"
$utf->trim(' invisible spaces... '); // returns 'invisible spaces...'
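To show why a Unicode-aware trim is needed, the sketch below applies the regular expression quoted above and compares it with PHP's native `trim()`. `unicode_trim` is a hypothetical helper written for this comparison only.

```php
<?php
// Hypothetical helper (not part of F3): trim using the pattern quoted above.
function unicode_trim(string $str): string
{
    // \pZ = separators (all kinds of spaces), \pC = control/format characters.
    $result = preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $str);
    // preg_replace() returns NULL on error (e.g. invalid UTF-8); fall back
    // to the untouched input in that case.
    return $result === null ? $str : $result;
}

// EM SPACE + two spaces, the text, then space + NO-BREAK SPACE + OGHAM SPACE MARK
$padded = "\xe2\x80\x83\x20 WhatAMana! \xc2\xa0\xe1\x9a\x80";

$native  = trim($padded);         // only knows ASCII whitespace
$unicode = unicode_trim($padded); // also strips the Unicode spaces
```

Native `trim()` leaves this string unchanged, because its first and last bytes belong to multibyte space characters that are not in its default character list.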
bom
Return UTF-8 byte order mark (BOM)
string bom ( )
Return the byte order mark (BOM) Unicode character used to signal the byte order of a text file or a stream. The BOM character may also indicate which of the several Unicode representations the text is encoded in. BOM use is optional, and, if used, must appear at the start of the text stream.
Example:
$bom = \UTF::instance()->bom(); // $bom = 0xefbbbf
echo '0x'.dechex(ord($bom[0])).dechex(ord($bom[1])).dechex(ord($bom[2])); // displays '0xefbbbf'
// convert/save a file with a BOM at its beginning
$f3->write( $filename, $bom . $f3->read($filename) );
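Prepending the BOM unconditionally adds a second one if the content already starts with a BOM. A minimal sketch of a guard is shown here; the constant and the three helper functions are hypothetical names, not part of F3.

```php
<?php
// The UTF-8 BOM is the three-byte sequence EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: does $data already start with a UTF-8 BOM?
function has_bom(string $data): bool
{
    return strncmp($data, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: prepend a BOM only when none is present.
function ensure_bom(string $data): string
{
    return has_bom($data) ? $data : UTF8_BOM . $data;
}

// Hypothetical helper: remove a leading BOM, if any.
function strip_bom(string $data): string
{
    return has_bom($data) ? substr($data, 3) : $data;
}

$once  = ensure_bom('id;name');
$twice = ensure_bom($once); // unchanged: the BOM is not duplicated
```

In an F3 application the same check could be applied to the result of `$f3->read($filename)` before writing the file back.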
translate
Convert code points to Unicode symbols
string translate ( string $str )
Converts code points (e.g. U+0E8D U+053D) to the equivalent Unicode symbols (e.g. ຍ Խ) and returns the result.
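As an illustration of the idea, the sketch below converts code points into UTF-8 characters. It assumes the code points are written in the `\uXXXX` notation that this page uses for the default emoji tokens; whether `translate()` accepts other notations is not confirmed here. `decode_code_points` is a hypothetical helper, not the library's code.

```php
<?php
// Hypothetical helper (not part of F3): turn \uXXXX tokens into UTF-8 characters.
function decode_code_points(string $str): string
{
    $result = preg_replace_callback(
        // A literal backslash, "u" or "U", then one or more hex digits.
        '/\\\\u([[:xdigit:]]+)/i',
        function (array $m): string {
            // Let html_entity_decode() convert the numeric entity &#xXXXX;
            // into the matching UTF-8 byte sequence.
            return html_entity_decode(
                '&#x' . $m[1] . ';',
                ENT_QUOTES | ENT_HTML5,
                'UTF-8'
            );
        },
        $str
    );
    // preg_replace_callback() returns NULL on error; keep the input then.
    return $result === null ? $str : $result;
}

$symbols = decode_code_points('\u0e8d \u053d');
$heart   = decode_code_points('I \u2665 PHP');
```

Note that the hex digits are matched greedily in this sketch, so a token should be followed by a non-hex character such as a space.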
emojify
Translate emoji tokens to Unicode font-supported symbols
string emojify ( string $str )
Converts and returns the Unicode font-supported symbols equivalent of emoji tokens.
The emoji tokens translated by default are:
':(' => '\u2639', // frown
':)' => '\u263a', // smile
'<3' => '\u2665', // heart
':D' => '\u1f603', // grin
'XD' => '\u1f606', // laugh
';)' => '\u1f609', // wink
':P' => '\u1f60b', // tongue
':,' => '\u1f60f', // think
':/' => '\u1f623', // skeptic
'8O' => '\u1f632', // oops
Example:
echo \UTF::instance()->emojify('Thanks :) I <3'); // displays 'Thanks ☺ I ♥'
You can specify your own additional emoji tokens with the EMOJI system variable. When provided, those emoji tokens are added to the basic set above and will be used when translating a string to Unicode font-supported symbols.
Examples:
$f3->set('EMOJI', array('(c)' => '©', '?' => '¿') );
echo \UTF::instance()->emojify( 'Do you like (c)opyrights ??'); // displays 'Do you like ©opyrights ¿¿'
//
$f3->set('EMOJI', array('@om' => '\U0F00', '&oooooom' => '\U0F02', '%om' => '\U0F00') );
echo \UTF::instance()->emojify( '@om Greets &oooooom from Tibet %om'); // displays 'ༀ Greets ༂ from Tibet ༀ'
As you can see, it's up to you to define your emoji tokens as you fancy. You could even automate the call to the emojify function for the variables you use in your templates.
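The token replacement described in this section can be approximated with PHP's `strtr()`. The sketch below covers only three of the default tokens and assumes that a custom token overrides a default one with the same key, which the text above does not state. `replace_tokens` is a hypothetical helper.

```php
<?php
// Hypothetical helper (not part of F3): replace emoji tokens with symbols.
function replace_tokens(string $str, array $custom = []): string
{
    // A small subset of the default table, written as PHP code point escapes.
    $defaults = [
        ':(' => "\u{2639}", // frown
        ':)' => "\u{263a}", // smile
        '<3' => "\u{2665}", // heart
    ];
    // Custom tokens are added to the defaults; on a key clash the custom
    // entry wins (an assumption of this sketch).
    // strtr() tries the longest token first and never rescans replaced text.
    return strtr($str, $custom + $defaults);
}

$thanks = replace_tokens('Thanks :) I <3');
$rights = replace_tokens(
    'Do you like (c)opyrights ??',
    ['(c)' => '©', '?' => '¿']
);
```

Because `strtr()` does not rescan text it has already replaced, a symbol produced by one token cannot be mistaken for another token.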