UnicodeWarning
UnicodeWarning: Unicode comparison issues
Traceback
Traceback (most recent call last):
  File "main.py", line 4, in <module>
    print(a == b) # may be False
UnicodeWarning: Unicode comparison issues
What causes this error
A Unicode operation encountered an ambiguous or potentially incorrect condition, typically related to normalization differences in string comparison.
How to fix it
Normalize Unicode strings before comparison using `unicodedata.normalize('NFC', text)`. Use case-folding for case-insensitive comparisons. Be aware of homoglyphs and normalization forms.
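The advice above can be combined into one comparison helper. The following is a minimal sketch, not a definitive implementation: the function name `caseless_equal` is hypothetical, and the use of NFD (rather than NFC) around `str.casefold()` follows the pattern suggested in Python's Unicode HOWTO, where normalization is applied a second time because case-folding can produce text that is no longer normalized.

```python
import unicodedata


def caseless_equal(left: str, right: str) -> bool:
    """Compare two strings ignoring case and normalization differences.

    Hypothetical helper for illustration only.
    """
    def key(text: str) -> str:
        # Normalize, case-fold, then normalize again: case-folding
        # may yield a string that is not in normalized form.
        folded = unicodedata.normalize("NFD", text).casefold()
        return unicodedata.normalize("NFD", folded)

    return key(left) == key(right)


# Precomposed 'é' (U+00E9) versus 'e' + combining accent (U+0301)
print(caseless_equal("Caf\u00e9", "cafe\u0301"))
# Case-folding maps 'ß' to 'ss', which lower() does not do
print(caseless_equal("Stra\u00dfe", "STRASSE"))
```

Both calls should print True. Note that this handles normalization and case only; homoglyphs such as Latin 'a' and Cyrillic 'а' are distinct characters in every normalization form and need separate handling.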
Code that causes this error
# These look the same but may compare differently
a = "café" # é as single char
b = "cafe\u0301" # e + combining accent
print(a == b) # may be False
Fixed code
import unicodedata
a = unicodedata.normalize("NFC", "café")
b = unicodedata.normalize("NFC", "cafe\u0301")
print(a == b) # True
About UnicodeWarning
UnicodeWarning is the warning category for Unicode-related conditions that are suspicious but not errors. It is very rare in practice. The situation described here arises when comparing strings whose characters have more than one valid representation: for example, 'é' can be a single code point (U+00E9) or 'e' followed by a combining acute accent (U+0065 U+0301). In Python 3, `==` on two `str` values compares code points and returns False for these two forms without raising anything, so the mismatch is easy to miss.
These two forms look identical but compare as unequal unless normalized. The `unicodedata.normalize()` function converts between normalization forms (NFC, NFD, NFKC, NFKD). This warning is most relevant when processing text from diverse sources (web scraping, user input, file imports) where normalization differences can cause subtle bugs in searching, sorting, and deduplication.
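To make the four normalization forms and the deduplication problem concrete, here is a small sketch. The helper `dedupe_normalized` is a hypothetical name introduced for illustration, and choosing NFC as the default form is an assumption carried over from the fix recommended earlier on this page.

```python
import unicodedata

composed = "caf\u00e9"      # 'é' as one code point (U+00E9)
decomposed = "cafe\u0301"   # 'e' + combining acute accent (U+0301)

# After normalizing both inputs to the same form, they compare equal.
# The composed forms (NFC, NFKC) are 4 code points long here,
# the decomposed forms (NFD, NFKD) are 5.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    a = unicodedata.normalize(form, composed)
    b = unicodedata.normalize(form, decomposed)
    print(form, len(a), len(b), a == b)


def dedupe_normalized(items, form="NFC"):
    """Drop items that are duplicates once normalized (hypothetical helper).

    Keeps the first occurrence of each item in its original spelling.
    """
    seen = set()
    unique = []
    for item in items:
        key = unicodedata.normalize(form, item)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


names = [composed, decomposed, "cafe"]
print(len(set(names)))                # 3: a plain set treats all three as distinct
print(len(dedupe_normalized(names)))  # 2: the two spellings of "café" collapse
```

Using the normalized string only as a lookup key, while keeping the original text, avoids silently rewriting the input data.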
Common scenarios
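Comparing text from different sources (web scraping, user input, file imports) that use different normalization forms
Searching, sorting, or deduplicating strings that mix precomposed and decomposed characters
Doing case-insensitive comparisons without case-folding
Treating homoglyphs (visually identical characters with different code points) as the same character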