Convert any text to Unicode escape sequences (\uXXXX) and back. Essential for JavaScript strings, JSON encoding, debugging internationalized text, or embedding non‑ASCII characters in ASCII‑safe environments.
A Unicode escape represents a character as the sequence \uXXXX, where XXXX is the character’s code point in hexadecimal (exactly 4 digits, which covers the Basic Multilingual Plane). For characters outside the BMP (code points > U+FFFF), some conventions use \u{XXXXX}, while others use a UTF‑16 surrogate pair. This tool follows the JavaScript/JSON convention: BMP characters are escaped as \uXXXX, and astral symbols (emoji, rare characters) are escaped as a UTF‑16 surrogate pair written as two \uXXXX sequences. JSON parsers and JavaScript string literals both accept this form. Note that JSON.stringify() does not produce it on its own, because it leaves well‑formed non‑ASCII text unescaped.
| Character | Code point (hex) | Escaped form | Notes |
|---|---|---|---|
| A | U+0041 | \u0041 | Latin capital A |
| é | U+00E9 | \u00E9 | Latin small e with acute |
| 世 | U+4E16 | \u4E16 | CJK ideograph |
| 🌍 (globe) | U+1F30D | \uD83C\uDF0D | Surrogate pair (UTF‑16) |
A backend API returns user‑submitted content in JSON. Some clients (e.g., legacy embedded systems) cannot handle raw UTF‑8. By applying Unicode escaping, the response becomes ASCII‑safe. For instance, "name": "José" becomes "name": "Jos\u00E9". The client unescapes safely. This tool helps test both directions.
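The scenario above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's own code: `toAsciiJson` is a hypothetical helper name, and it assumes the server serializes with JSON.stringify() and then escapes every remaining non‑ASCII UTF‑16 code unit.

```javascript
// Hypothetical helper: serialize to JSON, then replace every non-ASCII
// UTF-16 code unit with \uXXXX. Without the "u" flag the regex matches
// code units, so an astral character becomes two \uXXXX sequences.
function toAsciiJson(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, (unit) =>
    "\\u" + unit.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const body = toAsciiJson({ name: "José" });
console.log(body); // {"name":"Jos\u00E9"}

// Any standard JSON parser restores the original text.
console.log(JSON.parse(body).name === "José"); // true
```

Because the escaping happens after serialization, the output is still valid JSON and the client needs no special decoding step beyond its normal JSON parser.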
Escape: Iterate over the input string one code point at a time (using codePointAt()). ASCII characters (code point < 0x80) pass through unchanged, except for a few special characters such as the backslash itself, which are escaped. Every non‑ASCII character (code point > 0x7F) is escaped. For characters in the Basic Multilingual Plane (code point ≤ 0xFFFF), output \uXXXX with 4 hex digits. For code points > 0xFFFF (astral planes), subtract 0x10000, split the remaining 20 bits into a high surrogate (0xD800 plus the top 10 bits) and a low surrogate (0xDC00 plus the bottom 10 bits), and output both in \uXXXX\uXXXX form.
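The escape steps can be sketched as follows. This is an illustrative version under stated assumptions, not the tool's actual source: the names `hex4` and `escapeUnicode` are made up for the example, and the backslash is assumed to be the only ASCII character that gets escaped (as \u005C).

```javascript
// Format one 16-bit value as \uXXXX with 4 uppercase hex digits.
const hex4 = (unit) => "\\u" + unit.toString(16).toUpperCase().padStart(4, "0");

function escapeUnicode(text) {
  let out = "";
  for (const ch of text) {          // for...of walks the string by code point
    const cp = ch.codePointAt(0);
    if (cp === 0x5c) {
      out += hex4(cp);              // assumption: backslash is the only escaped ASCII char
    } else if (cp < 0x80) {
      out += ch;                    // other ASCII passes through unchanged
    } else if (cp <= 0xffff) {
      out += hex4(cp);              // BMP: a single \uXXXX
    } else {
      const offset = cp - 0x10000;  // astral: split 20 bits into a surrogate pair
      out += hex4(0xd800 + (offset >> 10)) + hex4(0xdc00 + (offset & 0x3ff));
    }
  }
  return out;
}

console.log(escapeUnicode("Hello 世界 🌍"));
// Hello \u4E16\u754C \uD83C\uDF0D
```

Escaping the backslash itself keeps the output unambiguous: a literal `\u0041` typed by the user cannot be confused with an escape produced by the function.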
Unescape: Scan the input string for \uXXXX patterns (4 hex digits). For each match, convert the hex digits to a character. If a high surrogate (\uD800 to \uDBFF) is immediately followed by a low surrogate (\uDC00 to \uDFFF), combine the two into a single astral character using String.fromCodePoint().
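A minimal sketch of the unescape direction, again as an illustration rather than the tool's real implementation. The names `ESCAPE` and `unescapeUnicode` are hypothetical. The regular expression tries a high‑plus‑low surrogate pair first and falls back to a single \uXXXX.

```javascript
// First alternative: high surrogate (D800-DBFF) followed by low surrogate (DC00-DFFF).
// Second alternative: any single \uXXXX. The "i" flag accepts lowercase hex too.
const ESCAPE = /\\u(d[89ab][0-9a-f]{2})\\u(d[c-f][0-9a-f]{2})|\\u([0-9a-f]{4})/gi;

function unescapeUnicode(text) {
  return text.replace(ESCAPE, (match, high, low, single) => {
    if (high !== undefined) {
      // Recombine the pair into one code point above U+FFFF.
      const hi = parseInt(high, 16) - 0xd800;
      const lo = parseInt(low, 16) - 0xdc00;
      return String.fromCodePoint(0x10000 + (hi << 10) + lo);
    }
    return String.fromCharCode(parseInt(single, 16));
  });
}

console.log(unescapeUnicode("Hello \\u4E16\\u754C \\uD83C\\uDF0D"));
// Hello 世界 🌍
```

Text that does not match the pattern, including the brace form \u{1F30D}, is returned unchanged.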
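This implementation decodes \uXXXX escapes the same way a JavaScript string literal does, with full surrogate‑awareness. (The legacy global unescape() function is a different mechanism: it decodes %XX and %uXXXX sequences, not \uXXXX.) All operations are local and efficient for strings up to hundreds of kilobytes.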
Performance note: For a typical 10,000‑character string with mixed scripts, conversion takes under 50 ms on modern devices. Very large texts (1 MB+) may take 1‑2 seconds.
| Scenario | Action | Reason |
|---|---|---|
| Generating JavaScript string literals from user input | Escape | Prevent syntax errors and ensure safe embedding. |
| Reading escaped data from JSON files | Unescape | Restore human‑readable text. |
| Storing text in ASCII‑only databases | Escape | Preserve Unicode information without binary blobs. |
| Debugging Unicode glyphs in logs | Unescape | See actual characters instead of codes. |
Example transformation:
Input: Hello 世界 🌍
Escaped: Hello \u4E16\u754C \uD83C\uDF0D
Unescaped back: Hello 世界 🌍
Characters outside the BMP, such as emoji, are escaped as two \uXXXX sequences (e.g., 🌍 → \uD83C\uDF0D). The unescape function recognizes consecutive surrogates and recombines them into a single character. This matches the standard JavaScript escape representation.
This tool outputs the \uXXXX surrogate pair format, which is compatible with JSON and older JavaScript versions. The newer \u{codePoint} syntax is not produced by escape, and the unescape function does not interpret it: such sequences are left as literal text. For full ES6 support, you may need a different tool.
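If the input does contain ES6 brace escapes, one possible workaround is to rewrite them into the \uXXXX form before unescaping. The sketch below is an assumption about how that could be done, not a feature of this tool, and `expandBraceEscapes` is a hypothetical name.

```javascript
// Rewrite \u{...} escapes as one or two \uXXXX sequences.
function expandBraceEscapes(text) {
  return text.replace(/\\u\{([0-9a-fA-F]{1,6})\}/g, (match, hex) => {
    const cp = parseInt(hex, 16);
    if (cp > 0x10ffff) return match;       // not a valid code point: leave as-is
    const ch = String.fromCodePoint(cp);   // 1 or 2 UTF-16 code units
    let out = "";
    for (let i = 0; i < ch.length; i++) {
      out += "\\u" + ch.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0");
    }
    return out;
  });
}

console.log(expandBraceEscapes("\\u{1F30D}")); // \uD83C\uDF0D
console.log(expandBraceEscapes("\\u{41}"));    // \u0041
```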