Understanding Unicode and Character Encoding
Unicode is a universal character set that assigns a unique code point to every character in every writing system. Unlike legacy encodings such as ASCII (128 characters) or Latin-1 (256 characters), Unicode's code space spans 1,114,112 code points — enough for emoji, CJK ideographs, Arabic, Cyrillic, and historical scripts.
Code points are written as U+ followed by hexadecimal digits. For example, U+0041 is the Latin letter A, U+4E2D is 中 (Chinese), and U+1F600 is 😀 (grinning face). The Unicode standard also defines character names, properties, and normalization rules.
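The mapping between characters and code points can be seen directly in most languages; a minimal sketch in Python, using the same three characters as above:

```python
# A code point is just an integer; Python exposes it via ord() and chr().
assert ord('A') == 0x0041         # U+0041 LATIN CAPITAL LETTER A
assert chr(0x4E2D) == '中'        # U+4E2D
assert chr(0x1F600) == '😀'       # U+1F600 GRINNING FACE

# The U+ notation is the hex value of the code point, zero-padded to at least 4 digits.
print(f"U+{ord('😀'):04X}")       # → U+1F600
```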
UTF-8: The Dominant Web Encoding
UTF-8 is the most common encoding on the web because it's backward-compatible with ASCII — the first 128 code points use single bytes identical to ASCII. Code points U+0080–U+07FF (128–2047) use 2 bytes, U+0800–U+FFFF (2048–65535) use 3 bytes, and code points above U+FFFF use 4 bytes.
This variable-length design means English text stays compact while supporting the full Unicode repertoire. UTF-8 is the default for HTML, JSON, and most modern APIs. BOM (Byte Order Mark) is optional and rarely used for UTF-8.
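The byte-length tiers described above are easy to verify by encoding one sample character from each range (the specific sample characters here are illustrative choices):

```python
# UTF-8 byte length grows with the code point value.
samples = {
    'A':  1,   # U+0041, ASCII range (< 128)     → 1 byte
    'é':  2,   # U+00E9, range 128–2047          → 2 bytes
    '中': 3,   # U+4E2D, range 2048–65535        → 3 bytes
    '😀': 4,   # U+1F600, above 65535            → 4 bytes
}
for ch, expected in samples.items():
    assert len(ch.encode('utf-8')) == expected
```

Because ASCII characters stay at one byte each, mostly-English documents pay no size penalty for UTF-8's full Unicode coverage.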
UTF-16 and UTF-32 Encoding
UTF-16 uses 16-bit code units. Characters in the Basic Multilingual Plane (U+0000–U+FFFF) use one unit; characters above use surrogate pairs (two units). JavaScript strings are sequences of UTF-16 code units internally. UTF-32 uses exactly 4 bytes per character, providing fixed-width encoding at the cost of space — rarely used except in specialized contexts.
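A surrogate pair can be computed by hand from the standard formula (subtract 0x10000, then split into two 10-bit halves); a sketch in Python:

```python
# BMP characters are one 16-bit unit in UTF-16; code points above U+FFFF
# become a surrogate pair (two units).
assert len('中'.encode('utf-16-be')) == 2    # one code unit
assert len('😀'.encode('utf-16-be')) == 4    # surrogate pair: two code units

# Computing the pair for U+1F600 manually:
cp = 0x1F600 - 0x10000
high = 0xD800 + (cp >> 10)      # high (lead) surrogate
low  = 0xDC00 + (cp & 0x3FF)    # low (trail) surrogate
assert (high, low) == (0xD83D, 0xDE00)

# UTF-32 is fixed-width: always 4 bytes per code point.
assert len('A'.encode('utf-32-be')) == 4
```

This is also why `'😀'.length` is 2 in JavaScript: the length counts UTF-16 code units, not characters.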
Character Names and Properties
The Unicode standard assigns a unique name to each character. For example, U+0041 is 'LATIN CAPITAL LETTER A' and U+00A9 is 'COPYRIGHT SIGN'. These names help identify characters when the glyph isn't displayed or when debugging encoding issues. Unicode also defines properties like script, category (letter, digit, punctuation), and case mapping.
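These names and properties are queryable from the Unicode Character Database; Python's standard `unicodedata` module is one convenient interface:

```python
import unicodedata

# Official character names, as assigned by the Unicode standard.
assert unicodedata.name('A') == 'LATIN CAPITAL LETTER A'
assert unicodedata.name('©') == 'COPYRIGHT SIGN'

# General category: Lu = Letter, uppercase; Nd = Number, decimal digit.
assert unicodedata.category('A') == 'Lu'
assert unicodedata.category('5') == 'Nd'

# Lookup works in the other direction too, from name to character.
assert unicodedata.lookup('GRINNING FACE') == '😀'
```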