API Documentation
confusable_homoglyphs package
Submodules
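confusable_homoglyphs.categories module
The examples in this module assume it has been imported:
>>> from confusable_homoglyphs import categories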
- confusable_homoglyphs.categories.alias(chr)
Retrieves the script block alias for a Unicode character.
>>> categories.alias('A')
'LATIN'
>>> categories.alias('τ')
'GREEK'
>>> categories.alias('-')
'COMMON'
- Parameters:
chr (str) – A Unicode character
- Returns:
The script block alias.
- Return type:
str
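If the package's data files are not at hand, a rough script label can be approximated with the standard library by taking the first word of unicodedata.name(). This is a heuristic of our own, not how categories.alias works: KNOWN_SCRIPTS and approx_alias are hypothetical names, the script list is deliberately short, and anything unrecognised falls back to 'COMMON'.

```python
import unicodedata

# Hypothetical, deliberately small set of script names the heuristic
# recognises; any other first word maps to 'COMMON'.
KNOWN_SCRIPTS = {"LATIN", "GREEK", "CYRILLIC", "ARMENIAN", "HEBREW", "ARABIC"}


def approx_alias(ch: str) -> str:
    """Guess a script alias from the first word of the character's Unicode name.

    Approximation only; this is not the package's data-file lookup.
    """
    # Characters without a name (e.g. control characters) give ''.
    words = unicodedata.name(ch, "").split()
    first = words[0] if words else ""
    return first if first in KNOWN_SCRIPTS else "COMMON"


for ch in ("A", "τ", "-"):
    print(repr(ch), approx_alias(ch))
```

Unicode character names do not always begin with the script name, so treat this only as a quick check and rely on categories.alias for real decisions.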
- confusable_homoglyphs.categories.aliases_categories(chr)
Retrieves the script block alias and Unicode category for a Unicode character.
>>> categories.aliases_categories('A')
('LATIN', 'L')
>>> categories.aliases_categories('τ')
('GREEK', 'L')
>>> categories.aliases_categories('-')
('COMMON', 'Pd')
- Parameters:
chr (str) – A Unicode character
- Returns:
The script block alias and Unicode category for a Unicode character.
- Return type:
(str, str)
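Returning an alias and a category for a character amounts to finding the code-point range it falls in. The sketch below does that with a binary search (bisect) over a sorted table of (start, end, alias, category) rows. RANGES and lookup_alias_category are hypothetical names, and the four rows only mirror the examples above; they are not the package's data.

```python
import bisect
from typing import Optional, Tuple

# Hypothetical excerpt of a sorted, non-overlapping range table:
# (first code point, last code point, script alias, category).
RANGES = [
    (0x002D, 0x002D, "COMMON", "Pd"),  # '-' HYPHEN-MINUS
    (0x0041, 0x005A, "LATIN", "L"),    # 'A'..'Z'
    (0x0061, 0x007A, "LATIN", "L"),    # 'a'..'z'
    (0x03B1, 0x03C9, "GREEK", "L"),    # 'α'..'ω'
]
STARTS = [start for start, _, _, _ in RANGES]


def lookup_alias_category(ch: str) -> Optional[Tuple[str, str]]:
    """Return (alias, category) for ch, or None if no row covers it."""
    cp = ord(ch)
    # Index of the last row whose start is <= cp.
    i = bisect.bisect_right(STARTS, cp) - 1
    if i >= 0 and cp <= RANGES[i][1]:
        _, _, alias, category = RANGES[i]
        return alias, category
    return None


print(lookup_alias_category("τ"))  # ('GREEK', 'L')
```

Keeping the rows sorted and non-overlapping is what makes the logarithmic lookup valid; a code point that falls between rows (such as '[') returns None in this sketch.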
- confusable_homoglyphs.categories.category(chr)
Retrieves the Unicode category for a Unicode character.
>>> categories.category('A')
'L'
>>> categories.category('τ')
'L'
>>> categories.category('-')
'Pd'
- Parameters:
chr (str) – A Unicode character
- Returns:
The Unicode category for a Unicode character.
- Return type:
str
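For comparison, Python's standard unicodedata.category() returns the full two-letter General_Category value, so letters come back as 'Lu' or 'Ll' where the examples above show 'L'. This snippet uses only the standard library and does not show how the package derives its own value.

```python
import unicodedata

# Full two-letter General_Category values from the standard library.
stdlib_categories = {ch: unicodedata.category(ch) for ch in ("A", "τ", "-")}
print(stdlib_categories)  # {'A': 'Lu', 'τ': 'Ll', '-': 'Pd'}
```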
- confusable_homoglyphs.categories.unique_aliases(string)
Retrieves all unique script block aliases used in a Unicode string.
>>> categories.unique_aliases('ABC')
{'LATIN'}
>>> sorted(categories.unique_aliases('ρAτ-'))
['COMMON', 'GREEK', 'LATIN']
- Parameters:
string (str) – A Unicode string
- Returns:
A set of the script block aliases used in a Unicode string.
- Return type:
set(str)
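unique_aliases boils down to mapping every character to its alias and collecting the results in a set. The sketch assumes a hypothetical ALIAS dictionary in place of categories.alias and covers only the example characters; unique_aliases_sketch is an illustrative name. Because sets are unordered, sorting the result gives a stable value for display or comparison.

```python
from typing import Set

# Hypothetical alias table standing in for categories.alias().
ALIAS = {"A": "LATIN", "B": "LATIN", "C": "LATIN", "ρ": "GREEK", "τ": "GREEK", "-": "COMMON"}


def unique_aliases_sketch(string: str) -> Set[str]:
    # The set comprehension keeps each alias once, however often it occurs.
    return {ALIAS[ch] for ch in string}


print(sorted(unique_aliases_sketch("ρAτ-")))  # ['COMMON', 'GREEK', 'LATIN']
```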
confusable_homoglyphs.cli module
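confusable_homoglyphs.confusables module
The examples in this module assume it has been imported:
>>> from confusable_homoglyphs import confusables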
- confusable_homoglyphs.confusables.is_confusable(string, greedy=False, preferred_aliases=[])
Checks if string contains characters which might be confusable with characters from preferred_aliases.
If greedy=False, it returns only the first confusable character found, without looking at the rest of the string; greedy=True returns all of them.
preferred_aliases=[] can take a list of Unicode block aliases to be considered as your ‘base’ Unicode blocks. Considering paρa:
  - with preferred_aliases=['latin'], the 3rd character ρ would be returned because this Greek letter can be confused with Latin p.
  - with preferred_aliases=['greek'], the 1st character p would be returned because this Latin letter can be confused with Greek ρ.
  - with preferred_aliases=[] and greedy=True, you’ll discover the 29 characters that can be confused with p, the 23 characters that look like a, and the one that looks like ρ (which is, of course, p aka LATIN SMALL LETTER P).
>>> confusables.is_confusable('paρa', preferred_aliases=['latin'])[0]['character']
'ρ'
>>> confusables.is_confusable('paρa', preferred_aliases=['greek'])[0]['character']
'p'
>>> confusables.is_confusable('Abç', preferred_aliases=['latin'])
False
>>> confusables.is_confusable('AlloΓ', preferred_aliases=['latin'])
False
>>> confusables.is_confusable('ρττ', preferred_aliases=['greek'])
False
>>> confusables.is_confusable('ρτ.τ', preferred_aliases=['greek', 'common'])
False
>>> confusables.is_confusable('ρττp')
[{'homoglyphs': [{'c': 'p', 'n': 'LATIN SMALL LETTER P'}], 'alias': 'GREEK', 'character': 'ρ'}]
- Parameters:
string (str) – A Unicode string
greedy (bool) – Don’t stop on finding one confusable character; find all of them.
preferred_aliases (list(str)) – Script block aliases that string’s characters should not be confusable with.
- Returns:
False if not confusable; otherwise, a list of the confusable characters and what each one can be confused with.
- Return type:
bool or list
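The decision rules described above can be sketched in plain Python. This is a simplified re-implementation for illustration, not the package's code: ALIAS and CONFUSABLES are hypothetical stand-ins for the package's data files that cover only the characters used in the examples, characters missing from ALIAS are treated as COMMON, and is_confusable_sketch is an illustrative name.

```python
from typing import Dict, Iterable, List, Union

# Hypothetical mini-tables standing in for the package's data files.
ALIAS: Dict[str, str] = {"p": "LATIN", "a": "LATIN", "ρ": "GREEK", "τ": "GREEK"}
CONFUSABLES: Dict[str, List[Dict[str, str]]] = {
    "p": [{"c": "ρ", "n": "GREEK SMALL LETTER RHO"}],
    "ρ": [{"c": "p", "n": "LATIN SMALL LETTER P"}],
}


def alias_of(ch: str) -> str:
    # Characters missing from the mini-table are treated as COMMON.
    return ALIAS.get(ch, "COMMON")


def is_confusable_sketch(
    string: str, greedy: bool = False, preferred_aliases: Iterable[str] = ()
) -> Union[bool, List[dict]]:
    # A tuple default avoids the shared-mutable-default pitfall of [].
    preferred = {a.upper() for a in preferred_aliases}
    results: List[dict] = []
    for ch in string:
        ch_alias = alias_of(ch)
        if ch_alias in preferred:
            continue  # characters from a 'base' script are trusted
        homoglyphs = CONFUSABLES.get(ch, [])
        # With preferred scripts, only flag characters that look like
        # something from one of those scripts.
        if preferred and not any(alias_of(h["c"]) in preferred for h in homoglyphs):
            continue
        if not homoglyphs:
            continue
        results.append({"character": ch, "alias": ch_alias, "homoglyphs": homoglyphs})
        if not greedy:
            break  # non-greedy: stop at the first hit
    return results or False


print(is_confusable_sketch("paρa", preferred_aliases=["latin"])[0]["character"])  # ρ
```

Returning False rather than an empty list mirrors the documented "False if not confusable" contract, so callers can test the result directly for truthiness.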
- confusable_homoglyphs.confusables.is_dangerous(string, preferred_aliases=[])
Checks if string can be dangerous, i.e. whether it is not only mixed-script but also contains characters from scripts other than those in preferred_aliases that might be confusable with characters from scripts in preferred_aliases.
For preferred_aliases examples, see the is_confusable docstring.
>>> bool(confusables.is_dangerous('Allo'))
False
>>> bool(confusables.is_dangerous('AlloΓ', preferred_aliases=['latin']))
False
>>> bool(confusables.is_dangerous('Alloρ'))
True
>>> bool(confusables.is_dangerous('AlaskaJazz'))
False
>>> bool(confusables.is_dangerous('ΑlaskaJazz'))  # first letter is GREEK CAPITAL LETTER ALPHA
True
- Parameters:
string (str) – A Unicode string
preferred_aliases (list(str)) – Script block aliases that string’s characters should not be confusable with.
- Returns:
Whether string is considered dangerous.
- Return type:
bool
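Since is_dangerous combines two conditions (mixed scripts and at least one confusable character), it can be sketched as the conjunction of two helpers. Everything here is hypothetical and simplified: the alias and homoglyph tables cover only the example strings (ASCII letters count as LATIN, unknown characters as COMMON), and mixed_script, has_confusable and is_dangerous_sketch are illustrative names, not the package's code.

```python
from typing import Iterable

# Hypothetical alias table for the non-ASCII characters used here.
NON_LATIN = {
    "ρ": "GREEK",
    "\u0391": "GREEK",     # GREEK CAPITAL LETTER ALPHA
    "\u0393": "GREEK",     # GREEK CAPITAL LETTER GAMMA
    "\u0413": "CYRILLIC",  # CYRILLIC CAPITAL LETTER GHE
}
# Hypothetical homoglyph table: character -> look-alike characters.
CONFUSABLES = {"ρ": ["p"], "\u0391": ["A"], "\u0393": ["\u0413"]}


def alias_of(ch: str) -> str:
    if ch.isascii() and ch.isalpha():
        return "LATIN"
    return NON_LATIN.get(ch, "COMMON")


def mixed_script(s: str, allowed_aliases: Iterable[str] = ("COMMON",)) -> bool:
    # More than one script left after discarding the allowed ones.
    allowed = {a.upper() for a in allowed_aliases}
    return len({alias_of(ch) for ch in s} - allowed) > 1


def has_confusable(s: str, preferred_aliases: Iterable[str] = ()) -> bool:
    preferred = {a.upper() for a in preferred_aliases}
    for ch in s:
        if alias_of(ch) in preferred:
            continue  # trusted 'base' script
        look_alikes = CONFUSABLES.get(ch, [])
        if preferred:
            # Only look-alikes from a preferred script count.
            look_alikes = [g for g in look_alikes if alias_of(g) in preferred]
        if look_alikes:
            return True
    return False


def is_dangerous_sketch(s: str, preferred_aliases: Iterable[str] = ()) -> bool:
    # Dangerous = mixed scripts AND at least one confusable character.
    return mixed_script(s) and has_confusable(s, preferred_aliases)


print(is_dangerous_sketch("\u0391laskaJazz"))  # True
```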
- confusable_homoglyphs.confusables.is_mixed_script(string, allowed_aliases=['COMMON'])
Checks if string contains mixed-script content, excluding script block aliases in allowed_aliases.
E.g. 'B. C' is not considered mixed-script by default: it contains characters from Latin and Common, but Common is excluded by default.
>>> confusables.is_mixed_script('Abç')
False
>>> confusables.is_mixed_script('ρτ.τ')
False
>>> confusables.is_mixed_script('ρτ.τ', allowed_aliases=[])
True
>>> confusables.is_mixed_script('Alloτ')
True
- Parameters:
string (str) – A unicode string
allowed_aliases (list(str)) – Script block aliases to ignore.
- Returns:
Whether string is considered mixed-script or not.
- Return type:
bool
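The rule described above (collect the scripts used, drop the allowed ones, and call the string mixed-script if more than one remains) can be sketched as follows. The alias lookup is a hypothetical stand-in covering only the example characters, and is_mixed_script_sketch is an illustrative name, not the package's function.

```python
from typing import Iterable

# Hypothetical alias lookup: ASCII letters and 'ç' are LATIN, 'ρ'/'τ' are
# GREEK, everything else (spaces, punctuation, digits) is COMMON.
EXTRA = {"ç": "LATIN", "ρ": "GREEK", "τ": "GREEK"}


def alias_of(ch: str) -> str:
    if ch.isascii() and ch.isalpha():
        return "LATIN"
    return EXTRA.get(ch, "COMMON")


def is_mixed_script_sketch(string: str, allowed_aliases: Iterable[str] = ("COMMON",)) -> bool:
    # Distinct scripts minus the allowed ones; mixed if more than one remains.
    allowed = {a.upper() for a in allowed_aliases}
    scripts = {alias_of(ch) for ch in string} - allowed
    return len(scripts) > 1


print(is_mixed_script_sketch("B. C"))                      # False: COMMON is allowed
print(is_mixed_script_sketch("ρτ.τ", allowed_aliases=[]))  # True: GREEK + COMMON
```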
confusable_homoglyphs.utils module
- confusable_homoglyphs.utils.load(filename)
Loads a JSON data file.
- Returns:
A dict.
- Return type:
dict
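A minimal sketch of a JSON loader with the documented contract, assuming filename is a path that can be opened directly; the package's own load may resolve the name differently (for example relative to its bundled data), which this page does not show. load_json is a hypothetical name, and the type check simply enforces the documented dict return type.

```python
import json
from typing import Any, Dict


def load_json(path: str) -> Dict[str, Any]:
    # Read the file as UTF-8 text and decode it with the standard json module.
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    # The documented return type is dict, so reject other top-level JSON values.
    if not isinstance(data, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return data
```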