UnicodeWarning

UnicodeWarning: Unicode comparison issues

Traceback

Traceback (most recent call last):
  File "main.py", line 4, in <module>
    print(a == b)  # False
UnicodeWarning: Unicode comparison issues

What causes this error

A Unicode operation encountered an ambiguous or potentially incorrect condition, typically related to normalization differences in string comparison.

How to fix it

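Normalize both strings to the same form before comparing them, for example with `unicodedata.normalize('NFC', text)`. For case-insensitive comparisons, also apply case-folding with `str.casefold()`. Keep in mind that normalization does not merge homoglyphs: visually similar characters from different scripts, such as Latin 'a' (U+0061) and Cyrillic 'а' (U+0430), remain distinct in every normalization form.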
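
As a minimal sketch of combining normalization with case-folding, the helper below compares two strings while ignoring both case and normalization form. The name `caseless_equal` is hypothetical and chosen only for illustration; the second `normalize` call is there because case-folding can produce text that is no longer normalized.

```python
import unicodedata


def caseless_equal(left: str, right: str) -> bool:
    """Compare two strings, ignoring case and normalization form.

    Hypothetical helper for illustration; not part of the standard library.
    """
    def canonical(text: str) -> str:
        # Normalize, case-fold, then normalize again, since case-folding
        # may leave the string in a non-normalized state.
        folded = unicodedata.normalize("NFC", text).casefold()
        return unicodedata.normalize("NFC", folded)

    return canonical(left) == canonical(right)


print(caseless_equal("CAF\u00c9", "cafe\u0301"))  # True
print(caseless_equal("Stra\u00dfe", "STRASSE"))   # True
print(caseless_equal("caf\u00e9", "cafe"))        # False
```

`str.casefold()` is used instead of `str.lower()` because it handles cases such as 'ß' folding to 'ss', which `lower()` leaves unchanged.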

Code that causes this error

# These look the same but compare as different strings
a = "caf\u00e9"   # é as a single code point (U+00E9)
b = "cafe\u0301"  # e + combining acute accent (U+0065 U+0301)
print(a == b)  # False

Fixed code

import unicodedata
a = unicodedata.normalize("NFC", "café")
b = unicodedata.normalize("NFC", "cafe\u0301")
print(a == b)  # True

About UnicodeWarning

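A UnicodeWarning is issued for Unicode-related conditions that are suspicious but not errors. It is the base category for warnings related to Unicode and is very rare in practice. The condition described here is a comparison between Unicode strings whose characters have more than one valid representation: for example, the character 'é' can be stored as a single code point (U+00E9) or as 'e' followed by a combining acute accent (U+0065 U+0301). Note that in current Python versions an ordinary `str` comparison of such values returns False without emitting any message, so the underlying problem is easy to miss.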
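
To make the two representations visible, the following sketch lists the code points behind each string. The variable names and the `describe` helper are illustrative assumptions; only `ord` and `unicodedata.name` from the standard library are used.

```python
import unicodedata

composed = "caf\u00e9"     # é as one code point
decomposed = "cafe\u0301"  # e followed by a combining acute accent


def describe(text):
    """Return a 'U+XXXX NAME' label for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


print(len(composed), len(decomposed))  # 4 5
print(describe(composed)[-1])          # U+00E9 LATIN SMALL LETTER E WITH ACUTE
print(describe(decomposed)[-1])        # U+0301 COMBINING ACUTE ACCENT
print(composed == decomposed)          # False
```

The differing lengths are often the quickest hint that two visually identical strings are stored in different forms.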

These two forms look identical but compare as unequal unless normalized. The `unicodedata.normalize()` function converts between normalization forms (NFC, NFD, NFKC, NFKD). This warning is most relevant when processing text from diverse sources (web scraping, user input, file imports) where normalization differences can cause subtle bugs in searching, sorting, and deduplication.
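
The sketch below illustrates how the four normalization forms differ and how normalization can be applied before deduplication. The sample string and the `dedupe` helper are hypothetical and exist only for this example.

```python
import unicodedata

# "fi" ligature (U+FB01) + "ance" + combining acute accent (U+0301)
sample = "\ufb01ance\u0301"

# NFC/NFD only compose or decompose; NFKC/NFKD also replace
# compatibility characters such as the ligature with plain letters.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, sample)
    print(form, len(normalized))
# NFC 5
# NFD 6
# NFKC 6
# NFKD 7


def dedupe(items, form="NFC"):
    """Drop entries that are equal after normalization, keeping first-seen order."""
    seen = set()
    unique = []
    for item in items:
        key = unicodedata.normalize(form, item)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


names = ["caf\u00e9", "cafe\u0301", "cafe"]
print(len(dedupe(names)))  # 2
```

NFC is a reasonable default for storage and comparison; the compatibility forms (NFKC, NFKD) discard distinctions such as ligatures, so they suit search keys better than stored text.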

Common scenarios

Comparing user input against stored text that uses a different normalization form

Deduplicating records scraped or imported from multiple sources

Searching or sorting strings that mix composed and decomposed characters

Performing case-insensitive comparisons without case-folding