ASCII

  • Full name: \voku\helper\ASCII

  • This class is marked as final and can't be subclassed

Constants

| Constant | Visibility | Value |
|---|---|---|
| UZBEK_LANGUAGE_CODE | public | 'uz' |
| TURKMEN_LANGUAGE_CODE | public | 'tk' |
| THAI_LANGUAGE_CODE | public | 'th' |
| PASHTO_LANGUAGE_CODE | public | 'ps' |
| ORIYA_LANGUAGE_CODE | public | 'or' |
| MONGOLIAN_LANGUAGE_CODE | public | 'mn' |
| KOREAN_LANGUAGE_CODE | public | 'ko' |
| KIRGHIZ_LANGUAGE_CODE | public | 'ky' |
| ARMENIAN_LANGUAGE_CODE | public | 'hy' |
| BENGALI_LANGUAGE_CODE | public | 'bn' |
| BELARUSIAN_LANGUAGE_CODE | public | 'be' |
| AMHARIC_LANGUAGE_CODE | public | 'am' |
| JAPANESE_LANGUAGE_CODE | public | 'ja' |
| CHINESE_LANGUAGE_CODE | public | 'zh' |
| DUTCH_LANGUAGE_CODE | public | 'nl' |
| ITALIAN_LANGUAGE_CODE | public | 'it' |
| MACEDONIAN_LANGUAGE_CODE | public | 'mk' |
| PORTUGUESE_LANGUAGE_CODE | public | 'pt' |
| GREEKLISH_LANGUAGE_CODE | public | 'el__greeklish' |
| GREEK_LANGUAGE_CODE | public | 'el' |
| HINDI_LANGUAGE_CODE | public | 'hi' |
| SWEDISH_LANGUAGE_CODE | public | 'sv' |
| TURKISH_LANGUAGE_CODE | public | 'tr' |
| BULGARIAN_LANGUAGE_CODE | public | 'bg' |
| HUNGARIAN_LANGUAGE_CODE | public | 'hu' |
| MYANMAR_LANGUAGE_CODE | public | 'my' |
| CROATIAN_LANGUAGE_CODE | public | 'hr' |
| FINNISH_LANGUAGE_CODE | public | 'fi' |
| GEORGIAN_LANGUAGE_CODE | public | 'ka' |
| RUSSIAN_LANGUAGE_CODE | public | 'ru' |
| RUSSIAN_PASSPORT_2013_LANGUAGE_CODE | public | 'ru__passport_2013' |
| RUSSIAN_GOST_2000_B_LANGUAGE_CODE | public | 'ru__gost_2000_b' |
| UKRAINIAN_LANGUAGE_CODE | public | 'uk' |
| KAZAKH_LANGUAGE_CODE | public | 'kk' |
| CZECH_LANGUAGE_CODE | public | 'cs' |
| DANISH_LANGUAGE_CODE | public | 'da' |
| POLISH_LANGUAGE_CODE | public | 'pl' |
| ROMANIAN_LANGUAGE_CODE | public | 'ro' |
| ESPERANTO_LANGUAGE_CODE | public | 'eo' |
| ESTONIAN_LANGUAGE_CODE | public | 'et' |
| LATVIAN_LANGUAGE_CODE | public | 'lv' |
| LITHUANIAN_LANGUAGE_CODE | public | 'lt' |
| NORWEGIAN_LANGUAGE_CODE | public | 'no' |
| VIETNAMESE_LANGUAGE_CODE | public | 'vi' |
| ARABIC_LANGUAGE_CODE | public | 'ar' |
| PERSIAN_LANGUAGE_CODE | public | 'fa' |
| SERBIAN_LANGUAGE_CODE | public | 'sr' |
| SERBIAN_CYRILLIC_LANGUAGE_CODE | public | 'sr__cyr' |
| SERBIAN_LATIN_LANGUAGE_CODE | public | 'sr__lat' |
| AZERBAIJANI_LANGUAGE_CODE | public | 'az' |
| SLOVAK_LANGUAGE_CODE | public | 'sk' |
| FRENCH_LANGUAGE_CODE | public | 'fr' |
| FRENCH_AUSTRIAN_LANGUAGE_CODE | public | 'fr_at' |
| FRENCH_SWITZERLAND_LANGUAGE_CODE | public | 'fr_ch' |
| GERMAN_LANGUAGE_CODE | public | 'de' |
| GERMAN_AUSTRIAN_LANGUAGE_CODE | public | 'de_at' |
| GERMAN_SWITZERLAND_LANGUAGE_CODE | public | 'de_ch' |
| ENGLISH_LANGUAGE_CODE | public | 'en' |
| EXTRA_LATIN_CHARS_LANGUAGE_CODE | public | 'latin' |
| EXTRA_WHITESPACE_CHARS_LANGUAGE_CODE | public | ' ' |
| EXTRA_MSWORD_CHARS_LANGUAGE_CODE | public | 'msword' |

Properties

ASCII_MAPS

private static array<string,array<string,string>>|null $ASCII_MAPS
  • This property is static.

ASCII_MAPS_AND_EXTRAS

private static array<string,array<string,string>>|null $ASCII_MAPS_AND_EXTRAS
  • This property is static.

ASCII_EXTRAS

private static array<string,array<string,string>>|null $ASCII_EXTRAS
  • This property is static.

ORD

private static array<string,int>|null $ORD
  • This property is static.

LANGUAGE_MAX_KEY

private static array<string,int>|null $LANGUAGE_MAX_KEY
  • This property is static.

REGEX_ASCII

url: https://en.wikipedia.org/wiki/Wikipedia:ASCII#ASCII_printable_characters

private static string $REGEX_ASCII
  • This property is static.

BIDI_UNI_CODE_CONTROLS_TABLE

bidirectional text chars

private static array<int,string> $BIDI_UNI_CODE_CONTROLS_TABLE

url: https://www.w3.org/International/questions/qa-bidi-unicode-controls

  • This property is static.

Methods

getAllLanguages

Get all languages from the constants "ASCII::.*LANGUAGE_CODE".

public static getAllLanguages(): string[]
  • This method is static.

charsArray

Returns a replacement array for ASCII methods.

public static charsArray(bool $replace_extra_symbols = false): array

EXAMPLE: $array = ASCII::charsArray(); var_dump($array['ru']['б']); // 'b'

  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $replace_extra_symbols | bool | [optional] Add some more replacements, e.g. "£" with " pound ". |

charsArrayWithMultiLanguageValues

Returns a replacement array for ASCII methods with a mix of multiple languages.

public static charsArrayWithMultiLanguageValues(bool $replace_extra_symbols = false): array

EXAMPLE: $array = ASCII::charsArrayWithMultiLanguageValues(); var_dump($array['b']); // ['β', 'б', 'ဗ', 'ბ', 'ب']

  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $replace_extra_symbols | bool | [optional] Add some more replacements, e.g. "£" with " pound ". |

Return Value:

An array of replacements.


charsArrayWithOneLanguage

Returns a replacement array for ASCII methods with one language.

public static charsArrayWithOneLanguage(string $language = self::ENGLISH_LANGUAGE_CODE, bool $replace_extra_symbols = false, bool $asOrigReplaceArray = true): array

For example, German will map 'ä' to 'ae', while other languages will simply return e.g. 'a'.

EXAMPLE: $array = ASCII::charsArrayWithOneLanguage('ru'); $tmpKey = \array_search('yo', $array['replace']); echo $array['orig'][$tmpKey]; // 'ё'

  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $language | string | [optional] Language of the source string, e.g. en, de_at, or de-ch (default is 'en'). See `ASCII::*_LANGUAGE_CODE`. |
| $replace_extra_symbols | bool | [optional] Add some more replacements, e.g. "£" with " pound ". |
| $asOrigReplaceArray | bool | [optional] TRUE === return {orig: string[], replace: string[]} array |

Return Value:

An array of replacements.
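
As a rough illustration of why the {orig, replace} shape is convenient, the sketch below feeds two parallel lists into `str_replace()` and the same pairs, as a flat map, into `strtr()`. The `$chars` array is a tiny hypothetical stand-in, not the data the library actually returns, and the flat map is only an illustration; the shape returned when `$asOrigReplaceArray` is false is not described above.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a {orig, replace} result; the real array comes
// from ASCII::charsArrayWithOneLanguage() and is far larger.
$chars = [
    'orig'    => ['ä', 'ö', 'ü', 'ß'],
    'replace' => ['ae', 'oe', 'ue', 'ss'],
];

// The two parallel lists plug directly into str_replace().
$viaStrReplace = str_replace($chars['orig'], $chars['replace'], 'Düsseldorf');

// The same pairs as one flat map (original => replacement) work with strtr().
$flatMap = array_combine($chars['orig'], $chars['replace']);
$viaStrtr = strtr('Düsseldorf', $flatMap);

echo $viaStrReplace, "\n"; // Duesseldorf
echo $viaStrtr, "\n";      // Duesseldorf
```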


charsArrayWithSingleLanguageValues

Returns a replacement array for ASCII methods with multiple languages.

public static charsArrayWithSingleLanguageValues(bool $replace_extra_symbols = false, bool $asOrigReplaceArray = true): array

EXAMPLE: $array = ASCII::charsArrayWithSingleLanguageValues(); $tmpKey = \array_search('hnaik', $array['replace']); echo $array['orig'][$tmpKey]; // '၌'

  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $replace_extra_symbols | bool | [optional] Add some more replacements, e.g. "£" with " pound ". |
| $asOrigReplaceArray | bool | [optional] TRUE === return {orig: string[], replace: string[]} array |

Return Value:

An array of replacements.


clean

Accepts a string and removes all non-UTF-8 characters from it, plus optional extra normalization (whitespace, MS Word characters, invisible characters).

public static clean(string $str, bool $normalize_whitespace = true, bool $keep_non_breaking_space = false, bool $normalize_msword = true, bool $remove_invisible_characters = true): string
  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $str | string | The string to be sanitized. |
| $normalize_whitespace | bool | [optional] Set to true if you need to normalize the whitespace. |
| $keep_non_breaking_space | bool | [optional] Set to true to keep non-breaking spaces, in combination with $normalize_whitespace. |
| $normalize_msword | bool | [optional] Set to true if you need to normalize MS Word chars, e.g. "…" => "..." |
| $remove_invisible_characters | bool | [optional] Set to false if you do not want to remove invisible characters, e.g. "\0". |

Return Value:

A clean UTF-8 string.


is_ascii

Checks if a string is 7 bit ASCII.

public static is_ascii(string $str): bool

EXAMPLE: ASCII::is_ascii('白'); // false

  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $str | string | The string to check. |

Return Value:

true if it is ASCII
false otherwise
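
A 7-bit ASCII check can be sketched in plain PHP as a byte-level regular expression. The helper name `is_seven_bit_ascii` is hypothetical, and this is only an illustration of the idea, not necessarily how the library implements `is_ascii()`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the library's code): true when every byte of
 * $str is in the 7-bit range 0x00-0x7F.
 */
function is_seven_bit_ascii(string $str): bool
{
    // An empty string contains no non-ASCII bytes.
    if ($str === '') {
        return true;
    }

    // No /u flag, so the pattern is matched against raw bytes.
    return preg_match('/[^\x00-\x7F]/', $str) !== 1;
}

var_dump(is_seven_bit_ascii('abc')); // bool(true)
var_dump(is_seven_bit_ascii('白'));  // bool(false)
```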


normalize_msword

Returns a string with smart quotes, ellipsis characters, and dashes from Windows-1252 (commonly used in Word documents) replaced by their ASCII equivalents.

public static normalize_msword(string $str): string

EXAMPLE: ASCII::normalize_msword('„Abcdef…”'); // '"Abcdef..."'

  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $str | string | The string to be normalized. |

Return Value:

A string with normalized characters for commonly used chars in Word documents.
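
The replacement described above can be approximated with a small lookup table and `strtr()`. The function name and the eight-entry map below are assumptions made for illustration; the library's own table is likely larger and may map some characters differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: replace a few typographic characters with ASCII.
 * The map is illustrative only, not the library's data.
 */
function normalize_word_punctuation(string $str): string
{
    $map = [
        "\u{201C}" => '"',   // left double quotation mark
        "\u{201D}" => '"',   // right double quotation mark
        "\u{201E}" => '"',   // double low-9 quotation mark
        "\u{2018}" => "'",   // left single quotation mark
        "\u{2019}" => "'",   // right single quotation mark
        "\u{2026}" => '...', // horizontal ellipsis
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
    ];

    return strtr($str, $map);
}

echo normalize_word_punctuation("\u{201E}Abcdef\u{2026}\u{201D}"), "\n"; // "Abcdef..."
```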


normalize_whitespace

Normalize the whitespace.

public static normalize_whitespace(string $str, bool $keepNonBreakingSpace = false, bool $keepBidiUnicodeControls = false, bool $normalize_control_characters = false): string

EXAMPLE: ASCII::normalize_whitespace("abc-\xc2\xa0-öäü-\xe2\x80\xaf-\xE2\x80\xAC", true); // "abc-\xc2\xa0-öäü- -"

  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $str | string | The string to be normalized. |
| $keepNonBreakingSpace | bool | [optional] Set to true to keep non-breaking spaces. |
| $keepBidiUnicodeControls | bool | [optional] Set to true to keep non-printable (for the web) bidirectional text chars. |
| $normalize_control_characters | bool | [optional] Set to true to convert e.g. LINE SEPARATOR and PARAGRAPH SEPARATOR to "\n" and LINE TABULATION to "\t". |

Return Value:

A string with normalized whitespace.


remove_invisible_characters

Remove invisible characters from a string.

public static remove_invisible_characters(string $str, bool $url_encoded = false, string $replacement = '', bool $keep_basic_control_characters = true): string

For example, this prevents sandwiching null characters between ASCII characters, as in Java\0script.

Copied from https://github.com/bcit-ci/CodeIgniter/blob/develop/system/core/Common.php

  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $str | string | |
| $url_encoded | bool | |
| $replacement | string | |
| $keep_basic_control_characters | bool | |
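
The general approach can be sketched as a loop over control-character patterns, modeled on the CodeIgniter helper linked above. The function name `strip_invisible_sketch` is hypothetical, and the sketch deliberately leaves out the `$replacement` and `$keep_basic_control_characters` parameters, whose exact behavior is not documented here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch, not the library's code: strips control characters
 * (and optionally their URL-encoded forms) from a string.
 */
function strip_invisible_sketch(string $str, bool $url_encoded = false): string
{
    $patterns = [];
    if ($url_encoded) {
        $patterns[] = '/%0[0-8bcef]/i'; // URL-encoded 00-08, 11, 12, 14, 15
        $patterns[] = '/%1[0-9a-f]/i';  // URL-encoded 16-31
        $patterns[] = '/%7f/i';         // URL-encoded 127
    }
    // Raw control bytes 00-08, 11, 12, 14-31 and 127; tab, LF and CR are kept.
    $patterns[] = '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/S';

    // Repeat until nothing changes, so a removal cannot splice a new match together.
    do {
        $str = (string) preg_replace($patterns, '', $str, -1, $count);
    } while ($count > 0);

    return $str;
}

echo strip_invisible_sketch("Java\0script"), "\n";        // Javascript
echo strip_invisible_sketch('Java%00script', true), "\n"; // Javascript
```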

to_ascii_remap

WARNING: This method will return broken characters and is only for special cases.

public static to_ascii_remap(string $str1, string $str2): string[]

Converts two UTF-8 encoded strings to single-byte strings, suitable for functions that need the same string length after the conversion.

The function simply uses (and updates) a tailored dynamic encoding (in/out map parameter) where non-ascii characters are remapped to the range [128-255] in order of appearance.

  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $str1 | string | |
| $str2 | string | |
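
The remapping idea can be sketched as follows: each distinct multibyte character is assigned one byte from the range 128-255 in order of appearance, and both strings share the same map. The function names are hypothetical, and the code follows the description above rather than the library's actual source. The result is not valid UTF-8; it is only useful as input to byte-oriented functions such as `levenshtein()`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: remap multibyte characters to single bytes >= 128,
 * updating the shared $map (character => byte) as new characters appear.
 */
function remap_to_single_bytes(string $str, array &$map): string
{
    // Find all multibyte sequences; a plain ASCII string is returned as-is.
    if (!preg_match_all('/[\xC0-\xF7][\x80-\xBF]+/', $str, $matches)) {
        return $str;
    }

    foreach ($matches[0] as $char) {
        if (!isset($map[$char])) {
            // Only 128 slots exist (bytes 128-255); more would wrap around.
            $map[$char] = chr(128 + count($map));
        }
    }

    return strtr($str, $map);
}

/** Hypothetical pair wrapper: both strings share one map. */
function remap_pair_sketch(string $str1, string $str2): array
{
    $map = [];

    return [remap_to_single_bytes($str1, $map), remap_to_single_bytes($str2, $map)];
}

[$a, $b] = remap_pair_sketch('müsli', 'musli');
echo strlen($a), "\n";          // 5
echo levenshtein($a, $b), "\n"; // 1 (byte-wise levenshtein() on the raw UTF-8 strings gives 2)
```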

to_ascii

Returns an ASCII version of the string. A set of non-ASCII characters are replaced with their closest ASCII counterparts, and the rest are removed by default. The language or locale of the source string can be supplied for language-specific transliteration in any of the following formats: en, en_GB, or en-GB. For example, passing "de" results in "äöü" mapping to "aeoeue" rather than "aou" as in other languages.

public static to_ascii(string $str, string $language = self::ENGLISH_LANGUAGE_CODE, bool $remove_unsupported_chars = true, bool $replace_extra_symbols = false, bool $use_transliterate = false, bool|null $replace_single_chars_only = null): string

EXAMPLE: ASCII::to_ascii('�Düsseldorf�', 'en'); // Dusseldorf

  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $str | string | The input string. |
| $language | string | [optional] Language of the source string (default is 'en'). See `ASCII::*_LANGUAGE_CODE`. |
| $remove_unsupported_chars | bool | [optional] Whether or not to remove the unsupported characters. |
| $replace_extra_symbols | bool | [optional] Add some more replacements, e.g. "£" with " pound ". |
| $use_transliterate | bool | [optional] Use ASCII::to_transliterate() for unknown chars. |
| $replace_single_chars_only | bool\|null | [optional] Single-char replacement is better for performance, but some languages need to replace more than one char at the same time. NULL === auto-setting, depending on the language. |

Return Value:

A string that contains only ASCII characters.


to_filename

Convert given string to safe filename (and keep string case).

public static to_filename(string $str, bool $use_transliterate = true, string $fallback_char = '-'): string

EXAMPLE: ASCII::to_filename('שדגשדג.png', true); // 'shdgshdg.png'

  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $str | string | |
| $use_transliterate | bool | ASCII::to_transliterate() is used by default; otherwise unsafe characters are simply replaced with a hyphen. |
| $fallback_char | string | |

Return Value:

A string that contains only safe characters for a filename.


to_slugify

Converts the string into a URL slug. This includes replacing non-ASCII characters with their closest ASCII equivalents, removing remaining non-ASCII and non-alphanumeric characters, and replacing whitespace with $separator. The separator defaults to a single dash, and the string is also converted to lowercase. The language of the source string can also be supplied for language-specific transliteration.

public static to_slugify(string $str, string $separator = '-', string $language = self::ENGLISH_LANGUAGE_CODE, array<string,string> $replacements = [], bool $replace_extra_symbols = false, bool $use_str_to_lower = true, bool $use_transliterate = false): string
  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $str | string | |
| $separator | string | [optional] The string used to replace whitespace. |
| $language | string | [optional] Language of the source string (default is 'en'). See `ASCII::*_LANGUAGE_CODE`. |
| $replacements | array | [optional] A map of replaceable strings. |
| $replace_extra_symbols | bool | [optional] Add some more replacements, e.g. "£" with " pound ". |
| $use_str_to_lower | bool | [optional] Use "string to lower" for the input. |
| $use_transliterate | bool | [optional] Use ASCII::to_transliterate() for unknown chars. |

Return Value:

A string that has been converted to a URL slug.
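
The steps in the description (replace, lowercase, collapse everything else into the separator) can be sketched in plain PHP. The function name and the five-entry German-style map are assumptions for illustration only; the library draws on its per-language data instead, so real output may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical slug sketch, not the library's code.
 *
 * @param array<string,string> $replacements caller-supplied replacements
 */
function slugify_sketch(string $str, string $separator = '-', array $replacements = []): string
{
    // Tiny illustrative map; caller-supplied replacements take precedence.
    $map = ['ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss', 'é' => 'e'];
    $str = strtr($str, $replacements + $map);

    $str = strtolower($str);

    // Collapse every run of characters outside a-z and 0-9 into one separator;
    // this also drops any bytes that are still non-ASCII.
    $str = (string) preg_replace('/[^a-z0-9]+/', $separator, $str);

    // Remove separators left at either end.
    return trim($str, $separator);
}

echo slugify_sketch('Grüße aus Köln!'), "\n";      // gruesse-aus-koeln
echo slugify_sketch('Grüße aus Köln!', '_'), "\n"; // gruesse_aus_koeln
```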


to_transliterate

Returns an ASCII version of the string. A set of non-ASCII characters are replaced with their closest ASCII counterparts, and the rest are removed unless instructed otherwise.

public static to_transliterate(string $str, string|null $unknown = '?', bool $strict = false): string

EXAMPLE: ASCII::to_transliterate('déjà σσς iıii'); // 'deja sss iiii'

  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $str | string | The input string. |
| $unknown | string\|null | [optional] Character to use if a character is unknown (default is '?'). Use NULL to keep the unknown chars. |
| $strict | bool | [optional] Use "transliterator_transliterate()" from PHP-Intl |

Return Value:

A string that contains only ASCII characters.


to_ascii_remap_intern

WARNING: This method will return broken characters and is only for special cases.

private static to_ascii_remap_intern(string $str, array& $map): string

Convert a UTF-8 encoded string to a single-byte string suitable for functions that need the same string length after the conversion.

The function simply uses (and updates) a tailored dynamic encoding (in/out map parameter) where non-ascii characters are remapped to the range [128-255] in order of appearance.

Thus, it supports up to 128 different multibyte code points max over the whole set of strings sharing this encoding.

Source: https://github.com/KEINOS/mb_levenshtein

  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $str | string | UTF-8 string to be converted to extended ASCII. |
| $map | array | Internal map of code points to ASCII characters. |

Return Value:

Mapped broken string.


get_language

Get the language from a string.

private static get_language(string $language): string

Examples:

  • de_at -> de_at
  • de_DE -> de
  • DE_DE -> de
  • de-de -> de

  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $language | string | |
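
One rule set that reproduces the four mappings listed above is: lowercase the code, turn "-" into "_", and collapse a code whose two halves are identical. The sketch below is inferred from those examples only; the function name is hypothetical and the private method may handle other inputs differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch inferred from the documented examples,
 * not the library's implementation.
 */
function normalize_language_code(string $language): string
{
    // Lowercase and use "_" as the only separator.
    $code = str_replace('-', '_', strtolower($language));

    // Collapse codes such as "de_de" to "de"; keep real variants like "de_at".
    $parts = explode('_', $code, 2);
    if (count($parts) === 2 && $parts[0] === $parts[1]) {
        return $parts[0];
    }

    return $code;
}

foreach (['de_at', 'de_DE', 'DE_DE', 'de-de'] as $input) {
    echo $input, ' -> ', normalize_language_code($input), "\n";
}
// de_at -> de_at
// de_DE -> de
// DE_DE -> de
// de-de -> de
```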

getData

Get data from "/data/*.php".

private static getData(string $file): array
  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $file | string | |

getDataIfExists

Get data from "/data/*.php".

private static getDataIfExists(string $file): array
  • This method is static.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| $file | string | |

prepareAsciiAndExtrasMaps

private static prepareAsciiAndExtrasMaps(): void
  • This method is static.

prepareAsciiMaps

private static prepareAsciiMaps(): void
  • This method is static.

prepareAsciiExtras

private static prepareAsciiExtras(): void
  • This method is static.


Automatically generated on 2025-03-18