Convert text to UTF-8 byte sequences and decode UTF-8 bytes back to text. Essential for internationalization.
UTF-8 (8-bit Unicode Transformation Format) is a variable-width character encoding capable of encoding all 1,112,064 valid Unicode code points using one to four one-byte (8-bit) code units.
| Code Point Range | Bytes | Byte 1 | Byte 2 | Byte 3 | Byte 4 | Example |
|---|---|---|---|---|---|---|
| U+0000 to U+007F | 1 | 0xxxxxxx | | | | A (U+0041) → 41 |
| U+0080 to U+07FF | 2 | 110xxxxx | 10xxxxxx | | | é (U+00E9) → C3 A9 |
| U+0800 to U+FFFF | 3 | 1110xxxx | 10xxxxxx | 10xxxxxx | | 中 (U+4E2D) → E4 B8 AD |
| U+10000 to U+10FFFF | 4 | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx | 😊 (U+1F60A) → F0 9F 98 8A |
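The bit layouts in the table can be turned into code fairly directly. The following is a minimal Python sketch, not the converter's actual implementation; the function name `encode_code_point` is made up for illustration. It rejects the surrogate range U+D800 to U+DFFF, which is why only 1,112,064 of the 1,114,112 possible values are valid.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point to UTF-8 using the table's bit layouts.

    Hypothetical helper for illustration; Python's built-in str.encode("utf-8")
    does the same job in real code.
    """
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Reproduce the Example column of the table.
for ch in "A\u00e9\u4e2d\U0001F60A":
    hex_bytes = " ".join(f"{b:02X}" for b in encode_code_point(ord(ch)))
    print(f"U+{ord(ch):04X} -> {hex_bytes}")
```

Each branch copies the payload bits of the code point into the `x` positions of the matching row, six bits per continuation byte.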
Note: UTF-8 is the dominant character encoding for the World Wide Web, accounting for 98% of all web pages as of 2024. Its design allows for backward compatibility with ASCII and avoids byte-order issues.
UTF-8 was developed in 1992 by Ken Thompson and Rob Pike. It has become the standard encoding for the web and modern software because it stores ASCII-heavy text compactly and remains compatible with software written for ASCII.
Technical Note: UTF-8 preserves all ASCII characters in their single-byte form, making it backward compatible with ASCII. Non-ASCII characters are represented using multi-byte sequences.
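A quick way to see this compatibility is to compare encodings in Python. The snippet below is only an illustrative sketch; the sample strings are arbitrary choices.

```python
# Pure ASCII text produces identical bytes under ASCII and UTF-8.
ascii_text = "Hello"
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Mixed text: ASCII characters stay single bytes below 0x80, while every byte
# of a multi-byte sequence has its high bit set (value >= 0x80).
mixed = "na\u00efve"                      # "naïve", with ï = U+00EF
data = mixed.encode("utf-8")
ascii_bytes = [b for b in data if b < 0x80]
multibyte_bytes = [b for b in data if b >= 0x80]

print(len(mixed), len(data))              # 5 characters, 6 bytes
print(bytes(ascii_bytes))                 # b'nave'
print([f"{b:02X}" for b in multibyte_bytes])   # ['C3', 'AF']
```

Because bytes below 0x80 never appear inside a multi-byte sequence, ASCII-oriented tools that search for characters such as `/` or a newline keep working on UTF-8 data.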
1. Use the tabs to choose between encoding (text to UTF-8 bytes) and decoding (UTF-8 bytes to text).
2. For encoding: enter your text and adjust the formatting options.
3. For decoding: enter the UTF-8 byte sequence in hexadecimal and adjust the decoding options.
4. Click the convert button, or let the real-time conversion do the work.
5. Copy your result using the copy button.
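The same two modes can be reproduced in a script. This is a rough Python sketch under assumptions, not the tool's own code: the function names `text_to_utf8_hex` and `utf8_hex_to_text` are hypothetical, and the output format (space-separated uppercase hex) is just one of the formatting options such a tool might offer.

```python
def text_to_utf8_hex(text: str) -> str:
    """Encoding mode: text -> space-separated uppercase hex bytes."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


def utf8_hex_to_text(hex_bytes: str) -> str:
    """Decoding mode: hex byte sequence -> text.

    bytes.fromhex ignores ASCII whitespace between byte pairs. A ValueError is
    raised for non-hex input, and UnicodeDecodeError (a ValueError subclass)
    for bytes that are not valid UTF-8.
    """
    return bytes.fromhex(hex_bytes).decode("utf-8")


print(text_to_utf8_hex("\u00e9\u4e2d"))   # C3 A9 E4 B8 AD
print(utf8_hex_to_text("E4 B8 AD"))       # 中
```

Decoding is strict here, so a truncated sequence such as `E4 B8` raises an error instead of silently producing a replacement character.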
Technical Example: The character "中" (Chinese for "middle") has the Unicode code point U+4E2D. In UTF-8, it is encoded as three bytes: E4 B8 AD.
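To see where those three bytes come from, the 16 bits of U+4E2D can be split into groups of 4, 6, and 6 bits and prefixed with the marker bits `1110`, `10`, and `10`. The Python sketch below walks through that split; the variable names are chosen only for illustration.

```python
cp = ord("\u4e2d")                  # 0x4E2D, the code point of 中
bits = format(cp, "016b")           # 16 payload bits used by the 3-byte form

# Split the payload as 4 + 6 + 6 bits and prepend the marker bits.
groups = [
    "1110" + bits[:4],              # lead byte:          1110xxxx
    "10" + bits[4:10],              # continuation byte:  10xxxxxx
    "10" + bits[10:],               # continuation byte:  10xxxxxx
]
encoded = bytes(int(g, 2) for g in groups)

print(bits)                                    # 0100111000101101
print(groups)                                  # ['11100100', '10111000', '10101101']
print(" ".join(f"{b:02X}" for b in encoded))   # E4 B8 AD
```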