UnicodeDecodeError
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff
Traceback
Traceback (most recent call last):
  File "main.py", line 2, in <module>
    text = f.read() # may fail if binary data
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff
What causes this error
Bytes were decoded with a codec that does not match their actual encoding. A byte sequence was encountered that is invalid for the specified encoding.
How to fix it
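Identify the actual encoding of the data and pass it explicitly, for example `open(file, encoding='latin-1')` or whichever codec matches. Note that `latin-1` maps every possible byte to a character, so it never raises, but it produces wrong characters if the data was written in another encoding. If the encoding is unknown, the `chardet` library can guess it. If the data is malformed or no codec fits, pass `errors='replace'` so that invalid bytes are substituted instead of raising.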
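When the encoding is not known in advance, one option is to try a short list of candidate codecs in order. The sketch below uses only the standard library; the function name `decode_with_fallback` and the candidate list are illustrative assumptions, not a recommended order for every data source.

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding_used), trying each candidate codec in order.

    Hypothetical helper: the candidate list is an assumption and should be
    adapted to where the data actually comes from.
    """
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue  # this codec rejected the bytes; try the next one
    # Reached only if every candidate failed: fall back to the first codec
    # and mark undecodable bytes with U+FFFD instead of raising.
    return data.decode(encodings[0], errors="replace"), encodings[0]


raw = "café".encode("latin-1")  # these bytes are not valid UTF-8
text, used = decode_with_fallback(raw)
print(text, used)
```

Strict UTF-8 goes first because it rejects most non-UTF-8 input, whereas `latin-1` accepts any byte sequence and therefore only makes sense as the last candidate.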
Code that causes this error
with open("data.bin", "r") as f:
    text = f.read() # may fail if binary data
Fixed code
with open("data.bin", "r", encoding="utf-8", errors="replace") as f:
    text = f.read()
About UnicodeDecodeError
A UnicodeDecodeError is raised when converting bytes to a string and the byte sequence is not valid in the specified encoding. This is one of the most common encoding errors, especially when reading files or network data. The error means that the bytes you have do not represent valid characters in the codec you chose.
Typical causes are reading a Latin-1 encoded file as UTF-8, or processing binary data (images, PDFs) as if it were text. The error message tells you which byte at which position caused the failure and which codec was used. To fix this, you need to identify the actual encoding of the data. Tools like `chardet` or `charset_normalizer` can guess the encoding from the bytes, although the result is a statistical estimate rather than a guarantee.
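The details reported in the message are also available as attributes on the exception object. The following is a minimal sketch; the helper name `describe_decode_error` and the sample bytes are made up for illustration.

```python
def describe_decode_error(data, encoding="utf-8"):
    """Return details about the first decoding failure, or None if decoding succeeds.

    Hypothetical helper built on the standard UnicodeDecodeError attributes.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return {
            "codec": exc.encoding,                       # codec that rejected the bytes
            "start": exc.start,                          # index of the first bad byte
            "end": exc.end,                              # index just past the bad bytes
            "bad_bytes": exc.object[exc.start:exc.end],  # the offending bytes
            "reason": exc.reason,                        # short explanation from the codec
        }
    return None


sample = b"abc\xffdef"  # 0xff never occurs in valid UTF-8
print(describe_decode_error(sample))
```

Logging these fields, together with a few bytes on either side of `start`, usually gives enough context to tell a wrongly declared encoding from binary data.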
If you cannot determine the encoding, using `errors='replace'` substitutes invalid bytes with the Unicode replacement character (U+FFFD), and `errors='ignore'` silently skips them.
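The difference between the two error handlers is easiest to see side by side. This is a small sketch with made-up sample bytes:

```python
raw = b"caf\xe9 au lait"  # Latin-1 bytes; 0xe9 is invalid UTF-8 in this position

# 'replace' keeps a visible marker (U+FFFD) where the bad byte was.
replaced = raw.decode("utf-8", errors="replace")

# 'ignore' drops the bad byte without leaving any trace.
ignored = raw.decode("utf-8", errors="ignore")

print(repr(replaced))
print(repr(ignored))
```

Both handlers lose information. With `replace` the damage stays visible in the output, which tends to make it the safer default when the text will be inspected later.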
Common scenarios
Reading a file saved in Latin-1 or another legacy encoding as if it were UTF-8
Opening binary files (images, PDFs) in text mode
Decoding bytes received over the network with the wrong codec
Processing data files whose records mix encodings or contain malformed bytes