API Documentation
confusable_homoglyphs package
Submodules
confusable_homoglyphs.categories module
confusable_homoglyphs.categories.alias(chr)
Retrieves the script block alias for a unicode character.
>>> categories.alias('A')
'LATIN'
>>> categories.alias('τ')
'GREEK'
>>> categories.alias('-')
'COMMON'
Parameters: chr (str) – A unicode character
Returns: The script block alias.
Return type: str
confusable_homoglyphs.categories.aliases_categories(chr)
Retrieves the script block alias and unicode category for a unicode character.
>>> categories.aliases_categories('A')
('LATIN', 'L')
>>> categories.aliases_categories('τ')
('GREEK', 'L')
>>> categories.aliases_categories('-')
('COMMON', 'Pd')
Parameters: chr (str) – A unicode character
Returns: The script block alias and unicode category for a unicode character.
Return type: (str, str)
confusable_homoglyphs.categories.category(chr)
Retrieves the unicode category for a unicode character.
>>> categories.category('A')
'L'
>>> categories.category('τ')
'L'
>>> categories.category('-')
'Pd'
Parameters: chr (str) – A unicode character
Returns: The unicode category for a unicode character.
Return type: str
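For comparison, Python's standard unicodedata module also reports a Unicode general category. The sketch below is not part of confusable_homoglyphs and the helper names are hypothetical. It only illustrates that the standard library returns two-letter codes (such as 'Lu' for an uppercase letter), whereas the examples above show the coarser 'L' for letters.

```python
import unicodedata


def stdlib_category(char):
    """Return the two-letter Unicode general category from the standard library."""
    return unicodedata.category(char)


def major_class(char):
    """Return only the first letter of the general category ('L' for any letter)."""
    return unicodedata.category(char)[0]


print(stdlib_category('A'))  # Lu
print(stdlib_category('τ'))  # Ll
print(stdlib_category('-'))  # Pd
print(major_class('A'))      # L
```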
confusable_homoglyphs.categories.unique_aliases(string)
Retrieves all unique script block aliases used in a unicode string.
>>> categories.unique_aliases('ABC')
{'LATIN'}
>>> categories.unique_aliases('ρAτ-')
{'GREEK', 'LATIN', 'COMMON'}
Parameters: string (str) – A unicode string
Returns: A set of the script block aliases used in a unicode string.
Return type: set(str)
confusable_homoglyphs.cli module
confusable_homoglyphs.confusables module
confusable_homoglyphs.confusables.is_confusable(string, greedy=False, preferred_aliases=[])
Checks if string contains characters which might be confusable with characters from preferred_aliases.
If greedy=False, it only returns the first confusable character found, without looking at the rest of the string; greedy=True returns all of them.
preferred_aliases=[] can take a list of script block aliases to be considered as your ‘base’ script blocks:
- considering paρa,
  - with preferred_aliases=['latin'], the 3rd character ρ would be returned because this Greek letter can be confused with Latin p.
  - with preferred_aliases=['greek'], the 1st character p would be returned because this Latin letter can be confused with Greek ρ.
  - with preferred_aliases=[] and greedy=True, you’ll discover the 29 characters that can be confused with p, the 23 characters that look like a, and the one that looks like ρ (which is, of course, p aka LATIN SMALL LETTER P).
>>> confusables.is_confusable('paρa', preferred_aliases=['latin'])[0]['character']
'ρ'
>>> confusables.is_confusable('paρa', preferred_aliases=['greek'])[0]['character']
'p'
>>> confusables.is_confusable('Abç', preferred_aliases=['latin'])
False
>>> confusables.is_confusable('AlloΓ', preferred_aliases=['latin'])
False
>>> confusables.is_confusable('ρττ', preferred_aliases=['greek'])
False
>>> confusables.is_confusable('ρτ.τ', preferred_aliases=['greek', 'common'])
False
>>> confusables.is_confusable('ρττp')
[{'homoglyphs': [{'c': 'p', 'n': 'LATIN SMALL LETTER P'}], 'alias': 'GREEK', 'character': 'ρ'}]
Parameters:
- string (str) – A unicode string
- greedy (bool) – Don’t stop on finding one confusable character - find all of them.
- preferred_aliases (list(str)) – Script block aliases which we don’t want string’s characters to be confused with.
Returns: False if not confusable; otherwise, all confusable characters and what they are confusable with.
Return type: bool or list
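The difference between greedy=False and greedy=True, and the effect of preferred_aliases, can be sketched with a toy lookup. This is not the library's implementation: TOY_CONFUSABLES, TOY_ALIASES and toy_is_confusable are hypothetical names, and the two-entry table stands in for the much larger data set the package ships.

```python
# Hypothetical hand-written tables, for illustration only.
TOY_CONFUSABLES = {
    'p': [('ρ', 'GREEK')],   # Latin p looks like Greek rho
    'ρ': [('p', 'LATIN')],   # Greek rho looks like Latin p
}
TOY_ALIASES = {'p': 'LATIN', 'a': 'LATIN', 'ρ': 'GREEK'}


def toy_is_confusable(string, greedy=False, preferred_aliases=()):
    """Return False, or a list of dicts describing confusable characters."""
    preferred = {alias.upper() for alias in preferred_aliases}
    found = []
    for char in string:
        alias = TOY_ALIASES.get(char, 'COMMON')
        if alias in preferred:
            # Characters already in a preferred script are never reported.
            continue
        # Keep only look-alikes from a preferred script (or all, if none given).
        lookalikes = [
            other for other, other_alias in TOY_CONFUSABLES.get(char, [])
            if not preferred or other_alias in preferred
        ]
        if lookalikes:
            found.append({'character': char, 'alias': alias,
                          'homoglyphs': lookalikes})
            if not greedy:
                break  # non-greedy: stop at the first hit
    return found or False


print(toy_is_confusable('paρa', preferred_aliases=['latin']))
print(toy_is_confusable('paρa', greedy=True))
```

With the toy tables, the first call reports only ρ, while the greedy call reports both p and ρ.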
confusable_homoglyphs.confusables.is_dangerous(string, preferred_aliases=[])
Checks if string can be dangerous, i.e. whether it is not only mixed-scripts but also contains characters from scripts other than the ones in preferred_aliases that might be confusable with characters from scripts in preferred_aliases.
For preferred_aliases examples, see the is_confusable docstring.
>>> bool(confusables.is_dangerous('Allo'))
False
>>> bool(confusables.is_dangerous('AlloΓ', preferred_aliases=['latin']))
False
>>> bool(confusables.is_dangerous('Alloρ'))
True
>>> bool(confusables.is_dangerous('AlaskaJazz'))
False
>>> bool(confusables.is_dangerous('ΑlaskaJazz'))
True
Parameters:
- string (str) – A unicode string
- preferred_aliases (list(str)) – Script block aliases which we don’t want string’s characters to be confused with.
Returns: Whether string is dangerous.
Return type: bool
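One way such a check might be wired into input validation is sketched below. validate_username is a hypothetical helper, not part of the package. It takes the checker as an argument, so the snippet runs without the package installed, and it relies only on the truthiness of the result, as the bool(...) calls in the examples above do.

```python
def validate_username(name, is_dangerous):
    """Return `name` unchanged, or raise ValueError if the checker flags it."""
    if is_dangerous(name):
        raise ValueError(
            'username mixes scripts with confusable characters: %r' % name)
    return name


# Intended use, assuming confusable_homoglyphs is installed:
#   from confusable_homoglyphs import confusables
#   validate_username('AlaskaJazz', confusables.is_dangerous)

# Stand-in checker so the sketch runs on its own: flags any non-ASCII input.
print(validate_username('AlaskaJazz', lambda s: not s.isascii()))
```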
confusable_homoglyphs.confusables.is_mixed_script(string, allowed_aliases=['COMMON'])
Checks if string contains mixed-scripts content, excluding script block aliases in allowed_aliases.
E.g. B. C is not considered mixed-scripts by default: it contains characters from Latin and Common, but Common is excluded by default.
>>> confusables.is_mixed_script('Abç')
False
>>> confusables.is_mixed_script('ρτ.τ')
False
>>> confusables.is_mixed_script('ρτ.τ', allowed_aliases=[])
True
>>> confusables.is_mixed_script('Alloτ')
True
Parameters:
- string (str) – A unicode string
- allowed_aliases (list(str)) – Script block aliases not to consider.
Returns: Whether string is considered mixed-scripts or not.
Return type: bool
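The idea behind the mixed-scripts check can be sketched with the standard library alone. This is a rough approximation, not the package's code: rough_alias guesses the script from the first word of the Unicode character name, KNOWN_SCRIPTS is an illustrative subset, and every other character is treated as COMMON.

```python
import unicodedata

# Illustrative subset of script names; an assumption made for this sketch.
KNOWN_SCRIPTS = {'LATIN', 'GREEK', 'CYRILLIC'}


def rough_alias(char):
    """Guess a script alias from the first word of the character's name."""
    first_word = unicodedata.name(char, '').split(' ')[0]
    return first_word if first_word in KNOWN_SCRIPTS else 'COMMON'


def rough_is_mixed_script(string, allowed_aliases=('COMMON',)):
    """True if more than one script remains after dropping allowed aliases."""
    allowed = {alias.upper() for alias in allowed_aliases}
    scripts = {rough_alias(char) for char in string} - allowed
    return len(scripts) > 1


print(rough_is_mixed_script('Abç'))                        # False
print(rough_is_mixed_script('ρτ.τ'))                       # False
print(rough_is_mixed_script('ρτ.τ', allowed_aliases=[]))   # True
print(rough_is_mixed_script('Alloτ'))                      # True
```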
confusable_homoglyphs.utils module
confusable_homoglyphs.utils.load(filename)
Loads a JSON data file.
Returns: A dict.
Return type: dict
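A JSON loader of this kind might look like the sketch below. load_json is a hypothetical stand-in, and how the real function resolves filename (for instance, relative to the package directory) is not stated here, so the sketch simply takes a full path.

```python
import json


def load_json(path):
    """Read a UTF-8 encoded JSON file and return the parsed object."""
    with open(path, encoding='utf-8') as handle:
        return json.load(handle)
```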