Unicode Escape / Unescape

Convert any text to Unicode escape sequences (\uXXXX) and back. Essential for JavaScript strings, JSON encoding, debugging internationalized text, or embedding non‑ASCII characters in ASCII‑safe environments.

Examples:
  • Basic ASCII + Emoji
  • CJK (Chinese/Japanese)
  • Already escaped
  • Mathematical symbols

Privacy first: All conversions occur locally in your browser. No data is uploaded. The tool works offline after the page loads.

What is Unicode escaping?

A Unicode escape represents a character with the sequence \uXXXX, where XXXX is a 4‑digit hexadecimal UTF‑16 code unit. For characters in the Basic Multilingual Plane (BMP, U+0000 to U+FFFF), that value equals the character's code point. For characters outside the BMP (code points > U+FFFF), some conventions use \u{XXXXX}, while others use a pair of UTF‑16 surrogates. This tool uses the \uXXXX form shared by JavaScript string literals and JSON: BMP characters become a single \uXXXX, and astral symbols (emoji, rare characters) become a surrogate pair written as two \uXXXX sequences. Note that JSON.stringify() leaves non‑ASCII characters unescaped by default, so its output is not ASCII‑only; the escaped form produced here is still valid input for JSON.parse() and for JavaScript string literals.

Escape format examples
| Character | Code point (hex) | Escaped form | Notes |
|---|---|---|---|
| A | U+0041 | \u0041 | Latin capital A |
| é | U+00E9 | \u00E9 | Latin small e with acute |
| 世 | U+4E16 | \u4E16 | CJK ideograph |
| 🌍 (globe) | U+1F30D | \uD83C\uDF0D | Surrogate pair (UTF‑16) |
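
The surrogate pair in the last row follows from the UTF‑16 encoding rules. The sketch below works through that arithmetic; the helper names toSurrogatePair and toEscape are illustrative and not taken from the tool's source.

```javascript
// Split a supplementary code point (> 0xFFFF) into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;     // 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

// Format one 16-bit code unit as \uXXXX (uppercase, zero-padded).
function toEscape(unit) {
  return "\\u" + unit.toString(16).toUpperCase().padStart(4, "0");
}

const [high, low] = toSurrogatePair(0x1F30D);
console.log(toEscape(high) + toEscape(low)); // \uD83C\uDF0D
```

For U+1F30D the offset is 0xF30D, whose top 10 bits are 0x3C and bottom 10 bits are 0x30D, which gives the pair D83C and DF0D.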

Why use this tool?

  • JavaScript/JSON strings: Safely embed Unicode characters in ASCII source code or JSON without literal non‑ASCII bytes.
  • Debugging: Inspect invisible or hard‑to‑type Unicode characters by visualizing their escapes.
  • Data portability: Convert text to a purely ASCII representation for storage in legacy systems.
  • Learning Unicode: Understand how characters map to code points and surrogates.

Real‑world use case: JSON API with non‑ASCII

A backend API returns user‑submitted content in JSON. Some clients (e.g., legacy embedded systems) cannot handle raw UTF‑8. Applying Unicode escaping makes the response ASCII‑safe: for instance, "name": "José" becomes "name": "Jos\u00E9". Any compliant JSON parser on the client decodes the escape back to é. This tool helps test both directions.
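
One way a backend could produce such an ASCII‑safe response is sketched below. The function name toAsciiJson is hypothetical. The sketch relies on the fact that JSON.stringify() leaves non‑ASCII characters unescaped, so they are replaced in a second pass.

```javascript
// Serialize to JSON, then replace every non-ASCII UTF-16 code unit with \uXXXX.
// Without the "u" flag the regex works on code units, so each half of a
// surrogate pair is escaped separately, as JSON expects.
function toAsciiJson(value) {
  return JSON.stringify(value).replace(/[\u0080-\uFFFF]/g, (unit) =>
    "\\u" + unit.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const body = toAsciiJson({ name: "José" });
console.log(body);                  // {"name":"Jos\u00E9"}
console.log(JSON.parse(body).name); // José
```

Because JSON.stringify() has already escaped any backslashes in the data, the second pass cannot create ambiguous sequences, and JSON.parse() restores the original text.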

Algorithm & implementation notes

Escape: Iterate over the input string one code point at a time, reading each with codePointAt(). ASCII characters (code points below 0x80) pass through unchanged, apart from a few special characters such as the backslash itself, which are escaped so that the output can be decoded unambiguously. Every character with a code point above 0x7F (non‑ASCII) is escaped. For characters in the Basic Multilingual Plane (code point ≤ 0xFFFF), the tool outputs \uXXXX with 4 hex digits. For code points above 0xFFFF (astral planes), it splits the code point into a high and a low surrogate and outputs both as two \uXXXX sequences, for example \uD83C\uDF0D.
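
A minimal sketch of the escape step as described above, not the tool's actual source. The names escapeUnicode and hex4 are illustrative, and writing the backslash as \u005C is an assumption, since the page does not say which form the tool emits.

```javascript
// Format one UTF-16 code unit as \uXXXX (uppercase, zero-padded to 4 digits).
function hex4(unit) {
  return "\\u" + unit.toString(16).toUpperCase().padStart(4, "0");
}

function escapeUnicode(text) {
  let out = "";
  for (const ch of text) {            // for...of iterates by code point
    const cp = ch.codePointAt(0);
    if (cp > 0xFFFF) {
      // Astral code point: emit its two surrogate code units.
      out += hex4(ch.charCodeAt(0)) + hex4(ch.charCodeAt(1));
    } else if (cp > 0x7F || ch === "\\") {
      // Non-ASCII BMP character, or the backslash (assumed form: \u005C).
      out += hex4(cp);
    } else {
      out += ch;                      // all other ASCII passes through
    }
  }
  return out;
}

console.log(escapeUnicode("Hello 世界 🌍")); // Hello \u4E16\u754C \uD83C\uDF0D
```

Escaping the backslash means that text which already contains a literal \u0041 survives a round trip instead of being decoded to A.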

Unescape: Scan the input for \uXXXX patterns (4 hex digits) and convert each match to the corresponding character. When a high surrogate (\uD800 to \uDBFF) is immediately followed by a low surrogate (\uDC00 to \uDFFF), combine the two into a single astral character using String.fromCodePoint().
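
The unescape step can be sketched along the same lines. This is an illustration under stated assumptions rather than the tool's code: the name unescapeUnicode and the single‑regex approach are choices made here for brevity.

```javascript
function unescapeUnicode(text) {
  // Try a high+low surrogate pair first, then fall back to a single \uXXXX.
  const pattern =
    /\\u(D[89AB][0-9A-F]{2})\\u(D[C-F][0-9A-F]{2})|\\u([0-9A-F]{4})/gi;
  return text.replace(pattern, (match, high, low, single) => {
    if (high !== undefined) {
      const h = parseInt(high, 16);
      const l = parseInt(low, 16);
      // Recombine the pair into one supplementary code point.
      return String.fromCodePoint(0x10000 + ((h - 0xD800) << 10) + (l - 0xDC00));
    }
    // BMP escape (or an unpaired surrogate): decode as one code unit.
    return String.fromCharCode(parseInt(single, 16));
  });
}

console.log(unescapeUnicode("Hello \\u4E16\\u754C \\uD83C\\uDF0D")); // Hello 世界 🌍
```

The pattern accepts both uppercase and lowercase hex digits. Replacement text is not rescanned, so an escaped backslash followed by u0041 decodes to the literal text \u0041 rather than to A.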

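This implementation decodes \uXXXX sequences the same way a JavaScript engine does when it reads a string literal, including surrogate pairs. It is unrelated to JavaScript's deprecated global unescape() function, which decodes percent‑style %XX and %uXXXX sequences rather than \uXXXX. All operations are local and efficient for strings up to hundreds of kilobytes.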

Performance note: For a typical 10,000‑character string with mixed scripts, conversion takes under 50 ms on modern devices. Very large texts (1 MB+) may take 1‑2 seconds.

How to use

  1. Enter plain text into the Input text field.
  2. Click Escape → \uXXXX to convert to Unicode escape sequences.
  3. Click Unescape ← \uXXXX to convert escaped sequences back to readable text.
  4. Use example buttons to test with common cases (emoji, CJK, pre‑escaped strings).
  5. Copy the result with one click.

When to escape vs unescape

| Scenario | Action | Reason |
|---|---|---|
| Generating JavaScript string literals from user input | Escape | Prevent syntax errors and ensure safe embedding. |
| Reading escaped data from JSON files | Unescape | Restore human‑readable text. |
| Storing text in ASCII‑only databases | Escape | Preserve Unicode information without binary blobs. |
| Debugging Unicode glyphs in logs | Unescape | See actual characters instead of codes. |

Example transformation:
Input: Hello 世界 🌍
Escaped: Hello \u4E16\u754C \uD83C\uDF0D
Unescaped back: Hello 世界 🌍

Frequently Asked Questions

Does the tool escape every character?

No. By default it escapes only characters with code points > 0x7F (non‑ASCII), plus the backslash character itself (to avoid ambiguity with existing escapes). ASCII letters, digits, and common punctuation remain as‑is. This is the usual approach for producing ASCII‑safe JavaScript or JSON text. If you need to escape all characters (including ASCII), you can modify the logic, but this tool focuses on practical use cases.

How are emoji and other characters outside the BMP handled?

They are handled correctly. The escape function detects code points > 0xFFFF and outputs the UTF‑16 surrogate pair as two \uXXXX sequences (e.g., 🌍 → \uD83C\uDF0D). The unescape function recognizes a high surrogate followed by a low surrogate and recombines them into a single character. This matches the classic JavaScript escape representation.

Is my text processed locally?

Yes. All processing happens in your browser. No network requests are made. You can verify this by opening Developer Tools (Network tab).

Does the tool support the \u{XXXXX} code point syntax?

This tool uses the classic \uXXXX surrogate pair format, which is compatible with JSON and older JavaScript versions. The newer \u{codePoint} syntax introduced in ES2015 (ES6) is not produced by escape. The unescape function does not misinterpret it; such sequences are left as literal text. For full \u{codePoint} support, you may need a different tool.
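
If \u{codePoint} input needs decoding, a separate pass along the following lines could handle it. This is a hypothetical helper, not a feature of the tool, and it simplifies the syntax by accepting only 1 to 6 hex digits.

```javascript
// Decode ES2015-style \u{XXXXX} escapes; anything else is left untouched.
function unescapeCodePointSyntax(text) {
  return text.replace(/\\u\{([0-9A-Fa-f]{1,6})\}/g, (match, hex) => {
    const cp = parseInt(hex, 16);
    // Code points above U+10FFFF do not exist; keep such input as literal text.
    return cp <= 0x10FFFF ? String.fromCodePoint(cp) : match;
  });
}

console.log(unescapeCodePointSyntax("\\u{1F30D}")); // 🌍
```

The range check matters because String.fromCodePoint() throws a RangeError for values above 0x10FFFF.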

What happens with unpaired surrogates?

The unescape function expects valid surrogate pairs. If an unmatched high or low surrogate is present, it is decoded individually (as a 4‑digit escape), which results in a lone surrogate character. A lone surrogate is ill‑formed UTF‑16, but the tool does not raise an error; it outputs the lone surrogate as a character. For best results, ensure your input contains complete pairs.
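
To check decoded text for lone surrogates before passing it on, a test like the one below can be used. The helper hasLoneSurrogate is an illustrative addition and not part of the tool.

```javascript
// Matches a high surrogate not followed by a low one,
// or a low surrogate not preceded by a high one.
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/;

function hasLoneSurrogate(text) {
  return LONE_SURROGATE.test(text);
}

console.log(hasLoneSurrogate("Hello 🌍")); // false
console.log(hasLoneSurrogate("\uD83C"));   // true
```

The regex is written without the "u" flag so that it inspects individual UTF‑16 code units rather than whole code points.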

Standards‑aligned Unicode utility – This tool follows the Unicode Standard (v15.0) and ECMAScript specification (ECMA‑262) for string escaping. The surrogate pair handling aligns with UTF‑16 encoding rules. Source code is transparent and can be inspected in your browser. No external libraries, no tracking. References: Unicode Character Encoding Model, ECMAScript Unicode Escape Sequences.