ASCII
🇺🇸 To people of Russia
There is a war in Ukraine right now. The forces of the Russian Federation are attacking civilian infrastructure in [Kharkiv][1], [Kyiv][2], [Chernihiv][3], [Sumy][4], [Irpin][5] and dozens of other cities. People are dying – both civilians and military servicemen, including Russian conscripts who were thrown into the fighting. In order to deprive its own people of access to information, the government of the Russian Federation has forbidden calling a war a war, shut down independent media and is passing a number of dictatorial laws. These laws are meant to silence all those who are against war. You can be jailed for multiple years for simply calling for peace. Do not be silent! Silence is a sign that you accept the Russian government's policy. You can choose NOT TO BE SILENT.
- [1] https://cloudfront-us-east-2.images.arcpublishing.com/reuters/P7K2MSZDGFMIJPDD7CI2GIROJI.jpg "Kharkiv under attack"
- [2] https://gdb.voanews.com/01bd0000-0aff-0242-fad0-08d9fc92c5b3_cx0_cy5_cw0_w1023_r1_s.jpg "Kyiv under attack"
- [3] https://ichef.bbci.co.uk/news/976/cpsprodpb/163DD/production/_123510119_hi074310744.jpg "Chernihiv under attack"
- [4] https://www.youtube.com/watch?v=8K-bkqKKf2A "Sumy under attack"
- [5] https://cloudfront-us-east-2.images.arcpublishing.com/reuters/K4MTMLEHTRKGFK3GSKAT4GR3NE.jpg "Irpin under attack"
Full name:
\voku\helper\ASCII
- This class is marked as final and can't be subclassed
Constants
Constant | Visibility | Type | Value |
---|---|---|---|
UZBEK_LANGUAGE_CODE | public | | 'uz' |
TURKMEN_LANGUAGE_CODE | public | | 'tk' |
THAI_LANGUAGE_CODE | public | | 'th' |
PASHTO_LANGUAGE_CODE | public | | 'ps' |
ORIYA_LANGUAGE_CODE | public | | 'or' |
MONGOLIAN_LANGUAGE_CODE | public | | 'mn' |
KOREAN_LANGUAGE_CODE | public | | 'ko' |
KIRGHIZ_LANGUAGE_CODE | public | | 'ky' |
ARMENIAN_LANGUAGE_CODE | public | | 'hy' |
BENGALI_LANGUAGE_CODE | public | | 'bn' |
BELARUSIAN_LANGUAGE_CODE | public | | 'be' |
AMHARIC_LANGUAGE_CODE | public | | 'am' |
JAPANESE_LANGUAGE_CODE | public | | 'ja' |
CHINESE_LANGUAGE_CODE | public | | 'zh' |
DUTCH_LANGUAGE_CODE | public | | 'nl' |
ITALIAN_LANGUAGE_CODE | public | | 'it' |
MACEDONIAN_LANGUAGE_CODE | public | | 'mk' |
PORTUGUESE_LANGUAGE_CODE | public | | 'pt' |
GREEKLISH_LANGUAGE_CODE | public | | 'el__greeklish' |
GREEK_LANGUAGE_CODE | public | | 'el' |
HINDI_LANGUAGE_CODE | public | | 'hi' |
SWEDISH_LANGUAGE_CODE | public | | 'sv' |
TURKISH_LANGUAGE_CODE | public | | 'tr' |
BULGARIAN_LANGUAGE_CODE | public | | 'bg' |
HUNGARIAN_LANGUAGE_CODE | public | | 'hu' |
MYANMAR_LANGUAGE_CODE | public | | 'my' |
CROATIAN_LANGUAGE_CODE | public | | 'hr' |
FINNISH_LANGUAGE_CODE | public | | 'fi' |
GEORGIAN_LANGUAGE_CODE | public | | 'ka' |
RUSSIAN_LANGUAGE_CODE | public | | 'ru' |
RUSSIAN_PASSPORT_2013_LANGUAGE_CODE | public | | 'ru__passport_2013' |
RUSSIAN_GOST_2000_B_LANGUAGE_CODE | public | | 'ru__gost_2000_b' |
UKRAINIAN_LANGUAGE_CODE | public | | 'uk' |
KAZAKH_LANGUAGE_CODE | public | | 'kk' |
CZECH_LANGUAGE_CODE | public | | 'cs' |
DANISH_LANGUAGE_CODE | public | | 'da' |
POLISH_LANGUAGE_CODE | public | | 'pl' |
ROMANIAN_LANGUAGE_CODE | public | | 'ro' |
ESPERANTO_LANGUAGE_CODE | public | | 'eo' |
ESTONIAN_LANGUAGE_CODE | public | | 'et' |
LATVIAN_LANGUAGE_CODE | public | | 'lv' |
LITHUANIAN_LANGUAGE_CODE | public | | 'lt' |
NORWEGIAN_LANGUAGE_CODE | public | | 'no' |
VIETNAMESE_LANGUAGE_CODE | public | | 'vi' |
ARABIC_LANGUAGE_CODE | public | | 'ar' |
PERSIAN_LANGUAGE_CODE | public | | 'fa' |
SERBIAN_LANGUAGE_CODE | public | | 'sr' |
SERBIAN_CYRILLIC_LANGUAGE_CODE | public | | 'sr__cyr' |
SERBIAN_LATIN_LANGUAGE_CODE | public | | 'sr__lat' |
AZERBAIJANI_LANGUAGE_CODE | public | | 'az' |
SLOVAK_LANGUAGE_CODE | public | | 'sk' |
FRENCH_LANGUAGE_CODE | public | | 'fr' |
FRENCH_AUSTRIAN_LANGUAGE_CODE | public | | 'fr_at' |
FRENCH_SWITZERLAND_LANGUAGE_CODE | public | | 'fr_ch' |
GERMAN_LANGUAGE_CODE | public | | 'de' |
GERMAN_AUSTRIAN_LANGUAGE_CODE | public | | 'de_at' |
GERMAN_SWITZERLAND_LANGUAGE_CODE | public | | 'de_ch' |
ENGLISH_LANGUAGE_CODE | public | | 'en' |
EXTRA_LATIN_CHARS_LANGUAGE_CODE | public | | 'latin' |
EXTRA_WHITESPACE_CHARS_LANGUAGE_CODE | public | | ' ' |
EXTRA_MSWORD_CHARS_LANGUAGE_CODE | public | | 'msword' |
Properties
ASCII_MAPS
- This property is static.
ASCII_MAPS_AND_EXTRAS
- This property is static.
ASCII_EXTRAS
- This property is static.
ORD
- This property is static.
LANGUAGE_MAX_KEY
- This property is static.
REGEX_ASCII
url: https://en.wikipedia.org/wiki/Wikipedia:ASCII#ASCII_printable_characters
- This property is static.
BIDI_UNI_CODE_CONTROLS_TABLE
bidirectional text chars
url: https://www.w3.org/International/questions/qa-bidi-unicode-controls
- This property is static.
Methods
getAllLanguages
Get all languages from the constants "ASCII::.*LANGUAGE_CODE".
- This method is static.
charsArray
Returns a replacement array for ASCII methods.
EXAMPLE:
$array = ASCII::charsArray();
var_dump($array['ru']['б']); // 'b'
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$replace_extra_symbols | bool | [optional] Add some more replacements, e.g. "£" with " pound ". |
charsArrayWithMultiLanguageValues
Returns a replacement array for ASCII methods with a mix of multiple languages.
EXAMPLE:
$array = ASCII::charsArrayWithMultiLanguageValues();
var_dump($array['b']); // ['β', 'б', 'ဗ', 'ბ', 'ب']
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$replace_extra_symbols | bool | [optional] Add some more replacements, e.g. "£" with " pound ". |
Return Value:
An array of replacements.
charsArrayWithOneLanguage
Returns a replacement array for ASCII methods with one language.
public static charsArrayWithOneLanguage(string $language = self::ENGLISH_LANGUAGE_CODE, bool $replace_extra_symbols = false, bool $asOrigReplaceArray = true): array
For example, German will map 'ä' to 'ae', while other languages will simply return e.g. 'a'.
EXAMPLE:
$array = ASCII::charsArrayWithOneLanguage('ru');
$tmpKey = \array_search('yo', $array['replace']);
echo $array['orig'][$tmpKey]; // 'ё'
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$language | string | [optional] Language of the source string, e.g. en, de_at, or de-ch (default is 'en'). Accepts the ASCII::*_LANGUAGE_CODE constants. |
$replace_extra_symbols | bool | [optional] Add some more replacements, e.g. "£" with " pound ". |
$asOrigReplaceArray | bool | [optional] TRUE === return {orig: string[], replace: string[]} array |
Return Value:
An array of replacements.
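A minimal sketch of how an array in the {orig, replace} shape could be applied with PHP's built-in `str_replace()`. The two-entry array and the helper name `apply_replacements_sketch` are hypothetical stand-ins for illustration, not data or functions from the library.

```php
<?php
// Hypothetical helper: index N of 'orig' is replaced by index N of 'replace'.
function apply_replacements_sketch(string $str, array $array): string
{
    return str_replace($array['orig'], $array['replace'], $str);
}

// Hypothetical stand-in for an array returned with $asOrigReplaceArray = true.
$array = [
    'orig'    => ['ё', 'б'],
    'replace' => ['yo', 'b'],
];

echo apply_replacements_sketch('ёб', $array); // yob
```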
charsArrayWithSingleLanguageValues
Returns a replacement array for ASCII methods with multiple languages.
public static charsArrayWithSingleLanguageValues(bool $replace_extra_symbols = false, bool $asOrigReplaceArray = true): array
EXAMPLE:
$array = ASCII::charsArrayWithSingleLanguageValues();
$tmpKey = \array_search('hnaik', $array['replace']);
echo $array['orig'][$tmpKey]; // '၌'
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$replace_extra_symbols | bool | [optional] Add some more replacements, e.g. "£" with " pound ". |
$asOrigReplaceArray | bool | [optional] TRUE === return {orig: string[], replace: string[]} array |
Return Value:
An array of replacements.
clean
Accepts a string and removes all non-UTF-8 characters from it, plus extras if needed.
public static clean(string $str, bool $normalize_whitespace = true, bool $keep_non_breaking_space = false, bool $normalize_msword = true, bool $remove_invisible_characters = true): string
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$str | string | The string to be sanitized. |
$normalize_whitespace | bool | [optional] Set to true if you need to normalize the whitespace. |
$keep_non_breaking_space | bool | [optional] Set to true to keep non-breaking spaces, in combination with $normalize_whitespace. |
$normalize_msword | bool | [optional] Set to true if you need to normalize MS Word chars, e.g. "…" => "..." |
$remove_invisible_characters | bool | [optional] Set to false if you do not want to remove invisible characters, e.g. "\0" |
Return Value:
A clean UTF-8 string.
is_ascii
Checks if a string is 7 bit ASCII.
EXAMPLE:
ASCII::is_ascii('白'); // false
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$str | string | The string to check. |
Return Value:
true if it is ASCII
false otherwise
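One way such a 7-bit check can be expressed is a single byte-level regular expression. This is a sketch under that assumption, not the library's implementation; the function name `is_7bit_ascii_sketch` is hypothetical.

```php
<?php
// Hypothetical sketch: a string is 7-bit ASCII when no byte lies outside 0x00-0x7F.
function is_7bit_ascii_sketch(string $str): bool
{
    // Without the /u modifier the pattern is matched byte by byte.
    return preg_match('/[^\x00-\x7F]/', $str) === 0;
}

var_dump(is_7bit_ascii_sketch('abc')); // bool(true)
var_dump(is_7bit_ascii_sketch('白'));  // bool(false)
```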
normalize_msword
Returns a string with smart quotes, ellipsis characters, and dashes from Windows-1252 (commonly used in Word documents) replaced by their ASCII equivalents.
EXAMPLE:
ASCII::normalize_msword('„Abcdef…”'); // '"Abcdef..."'
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$str | string | The string to be normalized. |
Return Value:
A string with normalized characters for commonly used chars in Word documents.
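The replacement described above can be sketched with `strtr()` and a small lookup table. The table below is a hand-picked, hypothetical subset chosen for illustration; the library's own map is not reproduced here.

```php
<?php
// Hypothetical sketch: map a few typographic characters to ASCII equivalents.
function normalize_msword_sketch(string $str): string
{
    $map = [
        '„' => '"',
        '“' => '"',
        '”' => '"',
        '‘' => "'",
        '’' => "'",
        '…' => '...',
        "\u{2013}" => '-', // en dash
        "\u{2014}" => '-', // em dash
    ];

    // strtr() with an array replaces every key in a single pass.
    return strtr($str, $map);
}

echo normalize_msword_sketch('„Abcdef…”'); // "Abcdef..."
```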
normalize_whitespace
Normalize the whitespace.
public static normalize_whitespace(string $str, bool $keepNonBreakingSpace = false, bool $keepBidiUnicodeControls = false, bool $normalize_control_characters = false): string
EXAMPLE:
ASCII::normalize_whitespace("abc-\xc2\xa0-öäü-\xe2\x80\xaf-\xE2\x80\xAC", true); // "abc-\xc2\xa0-öäü- -"
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$str | string | The string to be normalized. |
$keepNonBreakingSpace | bool | [optional] Set to true to keep non-breaking spaces. |
$keepBidiUnicodeControls | bool | [optional] Set to true to keep non-printable (for the web) bidirectional text chars. |
$normalize_control_characters | bool | [optional] Set to true to convert e.g. LINE-, PARAGRAPH-SEPARATOR with "\n" and LINE TABULATION with "\t". |
Return Value:
A string with normalized whitespace.
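The behaviour in the example above (exotic spaces become a plain space, bidirectional controls are dropped, the non-breaking space is optionally kept) can be sketched as follows. The character lists are a small assumed subset and the function name is hypothetical; this is not the library's code.

```php
<?php
// Hypothetical sketch covering only a handful of code points.
function normalize_whitespace_sketch(
    string $str,
    bool $keepNonBreakingSpace = false,
    bool $keepBidiUnicodeControls = false
): string {
    // THIN SPACE, NARROW NO-BREAK SPACE, IDEOGRAPHIC SPACE
    $whitespace = ["\u{2009}", "\u{202F}", "\u{3000}"];
    if (!$keepNonBreakingSpace) {
        $whitespace[] = "\u{00A0}"; // NO-BREAK SPACE
    }
    $str = str_replace($whitespace, ' ', $str);

    if (!$keepBidiUnicodeControls) {
        // LRE, RLE, PDF, LRO, RLO
        $bidi = ["\u{202A}", "\u{202B}", "\u{202C}", "\u{202D}", "\u{202E}"];
        $str = str_replace($bidi, '', $str);
    }

    return $str;
}

$in = "abc-\xc2\xa0-öäü-\xe2\x80\xaf-\xE2\x80\xAC";
var_dump(normalize_whitespace_sketch($in, true) === "abc-\xc2\xa0-öäü- -"); // bool(true)
```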
remove_invisible_characters
Remove invisible characters from a string.
public static remove_invisible_characters(string $str, bool $url_encoded = false, string $replacement = '', bool $keep_basic_control_characters = true): string
e.g.: This prevents sandwiching null characters between ascii characters, like Java\0script.
Copied from https://github.com/bcit-ci/CodeIgniter/blob/develop/system/core/Common.php
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$str | string | |
$url_encoded | bool | |
$replacement | string | |
$keep_basic_control_characters | bool | |
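To illustrate the null-byte case mentioned above, here is a simplified sketch that strips non-printable control bytes while keeping tab, line feed, and carriage return. It ignores the URL-encoded variant and the other parameters, and the function name is hypothetical.

```php
<?php
// Hypothetical sketch: remove control bytes 0-8, 11, 12, 14-31 and 127.
// Tab (9), line feed (10) and carriage return (13) are left untouched.
function remove_invisible_sketch(string $str): string
{
    return (string) preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/', '', $str);
}

echo remove_invisible_sketch("Java\0script"); // Javascript
```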
to_ascii_remap
WARNING: This method will return broken characters and is only for special cases.
Convert two UTF-8 encoded strings to single-byte strings suitable for functions that need the same string length after the conversion.
The function simply uses (and updates) a tailored dynamic encoding (in/out map parameter) where non-ascii characters are remapped to the range [128-255] in order of appearance.
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$str1 | string | |
$str2 | string | |
to_ascii
Returns an ASCII version of the string. A set of non-ASCII characters are replaced with their closest ASCII counterparts, and the rest are removed by default. The language or locale of the source string can be supplied for language-specific transliteration in any of the following formats: en, en_GB, or en-GB. For example, passing "de" results in "äöü" mapping to "aeoeue" rather than "aou" as in other languages.
public static to_ascii(string $str, string $language = self::ENGLISH_LANGUAGE_CODE, bool $remove_unsupported_chars = true, bool $replace_extra_symbols = false, bool $use_transliterate = false, bool|null $replace_single_chars_only = null): string
EXAMPLE:
ASCII::to_ascii('�Düsseldorf�', 'en'); // Dusseldorf
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$str | string | The input string. |
$language | string | [optional] Language of the source string (default is 'en'). Accepts the ASCII::*_LANGUAGE_CODE constants. |
$remove_unsupported_chars | bool | [optional] Whether or not to remove the unsupported characters. |
$replace_extra_symbols | bool | [optional] Add some more replacements, e.g. "£" with " pound ". |
$use_transliterate | bool | [optional] Use ASCII::to_transliterate() for unknown chars. |
$replace_single_chars_only | bool\|null | [optional] Single char replacement is better for performance, but some languages need to replace more than one char at the same time. NULL === auto-setting, depending on the language. |
Return Value:
A string that contains only ASCII characters.
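The language-dependent behaviour described above ("äöü" becoming "aeoeue" for German but "aou" otherwise) can be sketched with one lookup table per language. The tables here are tiny hypothetical samples and the function name is an assumption; the real method uses the library's own data files.

```php
<?php
// Hypothetical sketch: pick a per-language map, apply it, then drop what is left.
function to_ascii_sketch(string $str, string $language = 'en'): string
{
    $maps = [
        'en' => ['ä' => 'a',  'ö' => 'o',  'ü' => 'u',  'ß' => 'ss'],
        'de' => ['ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss'],
    ];

    // Unknown languages fall back to the 'en' sample map in this sketch.
    $str = strtr($str, $maps[$language] ?? $maps['en']);

    // Remove any remaining non-ASCII bytes.
    return (string) preg_replace('/[^\x00-\x7F]/', '', $str);
}

echo to_ascii_sketch('äöü', 'de'), "\n"; // aeoeue
echo to_ascii_sketch('äöü', 'en'), "\n"; // aou
```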
to_filename
Convert given string to safe filename (and keep string case).
public static to_filename(string $str, bool $use_transliterate = true, string $fallback_char = '-'): string
EXAMPLE:
ASCII::to_filename('שדגשדג.png', true); // 'shdgshdg.png'
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$str | string | |
$use_transliterate | bool | ASCII::to_transliterate() is used by default; otherwise unsafe characters are simply replaced with a hyphen. |
$fallback_char | string | |
Return Value:
A string that contains only safe characters for a filename.
to_slugify
Converts the string into a URL slug. This includes replacing non-ASCII characters with their closest ASCII equivalents, removing remaining non-ASCII and non-alphanumeric characters, and replacing whitespace with $separator. The separator defaults to a single dash, and the string is also converted to lowercase. The language of the source string can also be supplied for language-specific transliteration.
public static to_slugify(string $str, string $separator = '-', string $language = self::ENGLISH_LANGUAGE_CODE, array<string,string> $replacements = [], bool $replace_extra_symbols = false, bool $use_str_to_lower = true, bool $use_transliterate = false): string
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$str | string | |
$separator | string | [optional] The string used to replace whitespace. |
$language | string | [optional] Language of the source string (default is 'en'). Accepts the ASCII::*_LANGUAGE_CODE constants. |
$replacements | array<string,string> | [optional] A map of replaceable strings. |
$replace_extra_symbols | bool | [optional] Add some more replacements, e.g. "£" with " pound ". |
$use_str_to_lower | bool | [optional] Use "string to lower" for the input. |
$use_transliterate | bool | [optional] Use ASCII::to_transliterate() for unknown chars. |
Return Value:
A string that has been converted to a URL slug.
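The steps listed in the description (transliterate, lowercase, drop remaining non-alphanumerics, turn whitespace into the separator) can be sketched as below. The character map is a tiny hypothetical sample and `slugify_sketch` is not part of the library.

```php
<?php
// Hypothetical sketch of the slug pipeline; not the library's implementation.
function slugify_sketch(string $str, string $separator = '-'): string
{
    // 1. Replace a few non-ASCII characters with ASCII look-alikes (sample map).
    $str = strtr($str, ['é' => 'e', 'à' => 'a', 'ü' => 'u', 'ö' => 'o', 'ä' => 'a']);

    // 2. Lowercase the ASCII letters.
    $str = strtolower($str);

    // 3. Drop everything that is not a-z, 0-9, whitespace or a separator character.
    $pattern = '/[^a-z0-9\s' . preg_quote($separator, '/') . ']+/';
    $str = (string) preg_replace($pattern, '', $str);

    // 4. Collapse each run of whitespace into a single separator.
    return (string) preg_replace('/\s+/', $separator, trim($str));
}

echo slugify_sketch('Déjà Vu!'); // deja-vu
```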
to_transliterate
Returns an ASCII version of the string. A set of non-ASCII characters are replaced with their closest ASCII counterparts, and the rest are removed unless instructed otherwise.
public static to_transliterate(string $str, string|null $unknown = '?', bool $strict = false): string
EXAMPLE:
ASCII::to_transliterate('déjà σσς iıii'); // 'deja sss iiii'
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$str | string | The input string. |
$unknown | string\|null | [optional] Character to use if a character is unknown (default is '?'). You can also use NULL to keep the unknown chars. |
$strict | bool | [optional] Use "transliterator_transliterate()" from PHP-Intl. |
Return Value:
A string that contains only ASCII characters.
to_ascii_remap_intern
WARNING: This method will return broken characters and is only for special cases.
Convert a UTF-8 encoded string to a single-byte string suitable for functions that need the same string length after the conversion.
The function simply uses (and updates) a tailored dynamic encoding (in/out map parameter) where non-ascii characters are remapped to the range [128-255] in order of appearance.
Thus, it supports up to 128 different multibyte code points max over the whole set of strings sharing this encoding.
Source: https://github.com/KEINOS/mb_levenshtein
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$str | string | UTF-8 string to be converted to extended ASCII. |
$map | array | Internal map of code points to ASCII characters. |
Return Value:
Mapped broken string.
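The remapping idea can be sketched as follows: each distinct multibyte sequence gets the next free byte from 128 upward, and the same map is shared across strings so that byte-oriented functions such as `levenshtein()` see one byte per character. This sketch follows the approach described above under the assumption that the input is valid UTF-8; the function name is hypothetical.

```php
<?php
// Hypothetical sketch: remap multibyte UTF-8 sequences to single bytes 128..255.
function remap_sketch(string $str, array &$map): string
{
    // Collect multibyte sequences (lead byte followed by continuation bytes).
    if (!preg_match_all('/[\xC0-\xF7][\x80-\xBF]+/', $str, $matches)) {
        return $str; // plain ASCII, nothing to remap
    }

    foreach ($matches[0] as $sequence) {
        if (!isset($map[$sequence])) {
            // Next free slot in order of appearance; only 128 slots exist.
            $map[$sequence] = chr(128 + count($map));
        }
    }

    return strtr($str, $map);
}

$map = [];
$a = remap_sketch('nôtre', $map);
$b = remap_sketch('notre', $map);

echo strlen($a), "\n";          // 5
echo levenshtein($a, $b), "\n"; // 1
```

The returned strings are not valid UTF-8 and are only meaningful together with the shared `$map`.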
get_language
Get the language from a string.
e.g.: de_at -> de_at, de_DE -> de, DE_DE -> de, de-de -> de
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$language | string | |
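A sketch that reproduces the four example mappings above: lowercase the input, unify the delimiter, and collapse a code whose region merely repeats the language. This is an assumption about one way to get those results, not the library's actual code, and the function name is hypothetical.

```php
<?php
// Hypothetical sketch consistent with the examples: de_at, de_DE, DE_DE, de-de.
function get_language_sketch(string $language): string
{
    // Lowercase and treat "-" like "_".
    $language = str_replace('-', '_', strtolower($language));

    // Collapse "xx_xx" to "xx"; real regional variants such as "de_at" stay as-is.
    return (string) preg_replace('/^([a-z]+)_\1$/', '$1', $language);
}

echo get_language_sketch('de_DE'), "\n"; // de
echo get_language_sketch('de_at'), "\n"; // de_at
```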
getData
Get data from "/data/*.php".
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$file | string | |
getDataIfExists
Get data from "/data/*.php".
- This method is static.
Parameters:
Parameter | Type | Description |
---|---|---|
$file | string | |
prepareAsciiAndExtrasMaps
- This method is static.
prepareAsciiMaps
- This method is static.
prepareAsciiExtras
- This method is static.
Automatically generated on 2025-03-18