API Documentation

confusable_homoglyphs package

Submodules

confusable_homoglyphs.categories module

confusable_homoglyphs.categories.alias(chr)

Retrieves the script block alias for a unicode character.

>>> categories.alias('A')
'LATIN'
>>> categories.alias('τ')
'GREEK'
>>> categories.alias('-')
'COMMON'
Parameters: chr (str) – A unicode character
Returns: The script block alias.
Return type: str

confusable_homoglyphs.categories.aliases_categories(chr)

Retrieves the script block alias and unicode category for a unicode character.

>>> categories.aliases_categories('A')
('LATIN', 'L')
>>> categories.aliases_categories('τ')
('GREEK', 'L')
>>> categories.aliases_categories('-')
('COMMON', 'Pd')
Parameters: chr (str) – A unicode character
Returns: The script block alias and unicode category for a unicode character.
Return type: (str, str)

confusable_homoglyphs.categories.category(chr)

Retrieves the unicode category for a unicode character.

>>> categories.category('A')
'L'
>>> categories.category('τ')
'L'
>>> categories.category('-')
'Pd'
Parameters: chr (str) – A unicode character
Returns: The unicode category for a unicode character.
Return type: str
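
For comparison, Python's standard-library unicodedata module reports the full two-letter general category of a character. The sketch below is not this package's implementation, and stdlib_category is a hypothetical helper name. It only shows where values such as 'Pd' come from; as the doctests above show, this package reports letters at a coarser level ('L').

```python
import unicodedata


def stdlib_category(char):
    """Return the two-letter Unicode general category from the standard library."""
    return unicodedata.category(char)


# The same three characters used in the doctests above.
for char in ('A', 'τ', '-'):
    print(repr(char), stdlib_category(char))
```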

confusable_homoglyphs.categories.generate()

Generates the categories JSON data file from the unicode specification.

Returns: True on success; raises an exception otherwise.
Return type: bool

confusable_homoglyphs.categories.unique_aliases(string)

Retrieves all unique script block aliases used in a unicode string.

>>> categories.unique_aliases('ABC')
{'LATIN'}
>>> categories.unique_aliases('ρAτ-')
{'GREEK', 'LATIN', 'COMMON'}
Parameters: string (str) – A unicode string
Returns: A set of the script block aliases used in a unicode string.
Return type: set
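
To make the idea of collecting one alias per character concrete, here is a rough approximation built only on the standard library. It is not how this package works: KNOWN_SCRIPTS, approx_alias and approx_unique_aliases are hypothetical names, and guessing the script from the first word of the character's Unicode name is a heuristic that merely happens to agree with the doctests above.

```python
import unicodedata

# Hypothetical shortlist for this sketch; anything else is treated as COMMON.
KNOWN_SCRIPTS = {'LATIN', 'GREEK', 'CYRILLIC'}


def approx_alias(char):
    """Guess a script alias from the first word of the character's Unicode name."""
    first_word = unicodedata.name(char, '').split(' ')[0]
    return first_word if first_word in KNOWN_SCRIPTS else 'COMMON'


def approx_unique_aliases(string):
    """Collect the distinct guessed aliases of every character in the string."""
    return {approx_alias(char) for char in string}


print(sorted(approx_unique_aliases('ρAτ-')))
```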

confusable_homoglyphs.confusables module

exception confusable_homoglyphs.confusables.Found

Bases: exceptions.Exception

confusable_homoglyphs.confusables.generate()

Generates the confusables JSON data file from the unicode specification.

Returns: True on success; raises an exception otherwise.
Return type: bool

confusable_homoglyphs.confusables.is_confusable(string, greedy=False, preferred_aliases=[])

Checks if string contains characters which might be confusable with characters from preferred_aliases.

If greedy=False, only the first confusable character found is returned, without looking at the rest of the string; greedy=True returns all of them.

preferred_aliases=[] can take a list of unicode block aliases to be considered as your ‘base’ unicode blocks:

  • considering paρa,
    • with preferred_aliases=['latin'], the 3rd character ρ would be returned because this greek letter can be confused with latin p.
    • with preferred_aliases=['greek'], the 1st character p would be returned because this latin letter can be confused with greek ρ.
    • with preferred_aliases=[] and greedy=True, you’ll discover the 29 characters that can be confused with p, the 23 characters that look like a, and the one that looks like ρ (which is, of course, p aka LATIN SMALL LETTER P).
>>> confusables.is_confusable('paρa', preferred_aliases=['latin'])[0]['character']
'ρ'
>>> confusables.is_confusable('paρa', preferred_aliases=['greek'])[0]['character']
'p'
>>> confusables.is_confusable('Abç', preferred_aliases=['latin'])
False
>>> confusables.is_confusable('AlloΓ', preferred_aliases=['latin'])
False
>>> confusables.is_confusable('ρττ', preferred_aliases=['greek'])
False
>>> confusables.is_confusable('ρτ.τ', preferred_aliases=['greek', 'common'])
False
>>> confusables.is_confusable('ρττp')
[{'homoglyphs': [{'c': 'p', 'n': 'LATIN SMALL LETTER P'}], 'alias': 'GREEK', 'character': 'ρ'}]
Parameters:
  • string (str) – A unicode string
  • greedy (bool) – Don’t stop on finding one confusable character - find all of them.
  • preferred_aliases (list(str)) – Script block aliases which we don’t want string’s characters to be confused with.
Returns:

False if nothing is confusable; otherwise a list of the confusable characters, each with the characters it can be confused with.

Return type:

bool or list
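
Because the return value is either False or a list of dictionaries, callers usually need to branch on it. The sketch below shows one way to turn such a result into readable lines. summarize is a hypothetical helper, and sample is copied from the doctest output above rather than produced by a live call; the keys 'character', 'alias', 'homoglyphs', 'c' and 'n' are the ones visible in that output.

```python
def summarize(result):
    """Turn an is_confusable()-style result into readable lines.

    `result` is either False or a list of dicts shaped like the doctest
    output above: {'character', 'alias', 'homoglyphs': [{'c', 'n'}, ...]}.
    """
    if not result:
        return []
    lines = []
    for entry in result:
        lookalikes = ', '.join(
            '{} ({})'.format(h['c'], h['n']) for h in entry['homoglyphs']
        )
        lines.append('{} [{}] resembles: {}'.format(
            entry['character'], entry['alias'], lookalikes))
    return lines


# Sample value copied from the doctest output above, not a live call.
sample = [{'homoglyphs': [{'c': 'p', 'n': 'LATIN SMALL LETTER P'}],
           'alias': 'GREEK', 'character': 'ρ'}]

for line in summarize(sample):
    print(line)
```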

confusable_homoglyphs.confusables.is_dangerous(string, preferred_aliases=[])

Checks if string can be dangerous, i.e. whether it is not only mixed-script but also contains characters from scripts other than those in preferred_aliases that might be confusable with characters from the scripts in preferred_aliases.

For preferred_aliases examples, see is_confusable docstring.

>>> bool(confusables.is_dangerous('Allo'))
False
>>> bool(confusables.is_dangerous('AlloΓ', preferred_aliases=['latin']))
False
>>> bool(confusables.is_dangerous('Alloρ'))
True
>>> bool(confusables.is_dangerous('AlaskaJazz'))
False
>>> bool(confusables.is_dangerous('ΑlaskaJazz'))
True
Parameters:
  • string (str) – A unicode string
  • preferred_aliases (list(str)) – Script block aliases which we don’t want string’s characters to be confused with.
Returns:

Whether the string is considered dangerous.

Return type:

bool

confusable_homoglyphs.confusables.is_mixed_script(string, allowed_aliases=['COMMON'])

Checks if string contains mixed-scripts content, excluding script blocks aliases in allowed_aliases.

E.g. B. C is not considered mixed-scripts by default: it contains characters from Latin and Common, but Common is excluded by default.

>>> confusables.is_mixed_script('Abç')
False
>>> confusables.is_mixed_script('ρτ.τ')
False
>>> confusables.is_mixed_script('ρτ.τ', allowed_aliases=[])
True
>>> confusables.is_mixed_script('Alloτ')
True
Parameters:
  • string (str) – A unicode string
  • allowed_aliases (list(str)) – Script block aliases to ignore.
Returns:

Whether string is considered mixed-scripts or not.

Return type:

bool
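
The rule described above can be read as: remove the allowed aliases from the set of aliases found in the string, and call the string mixed-script if more than one alias remains. The sketch below expresses that reading on plain alias sets. It is an interpretation, not the package's code: mixed_from_aliases is a hypothetical name, the upper-casing of allowed aliases is an assumption, and the alias sets in the comments are the ones one would expect for the doctest strings.

```python
def mixed_from_aliases(aliases, allowed_aliases=('COMMON',)):
    """Return True when more than one alias remains after removing allowed ones."""
    # Assumption: allowed aliases are compared case-insensitively.
    allowed = {alias.upper() for alias in allowed_aliases}
    return len(set(aliases) - allowed) > 1


# Assumed alias sets for the strings used in the doctests above.
print(mixed_from_aliases({'LATIN'}))                # 'Abç'
print(mixed_from_aliases({'GREEK', 'COMMON'}))      # 'ρτ.τ'
print(mixed_from_aliases({'GREEK', 'COMMON'}, ()))  # 'ρτ.τ', nothing excluded
print(mixed_from_aliases({'LATIN', 'GREEK'}))       # 'Alloτ'
```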

confusable_homoglyphs.utils module

confusable_homoglyphs.utils.delete(filename)

Deletes a JSON data file if it exists.

confusable_homoglyphs.utils.get(url)

confusable_homoglyphs.utils.load(filename)

Loads a JSON data file.

Returns: A dict.
Return type: dict

confusable_homoglyphs.utils.u(x)

Module contents