Convert special characters to their corresponding HTML entities and back. Essential for preventing cross-site scripting (XSS), displaying code snippets, and ensuring valid HTML output. Real-time conversion with advanced options.
HTML entities are sequences of characters that represent reserved characters or special symbols in HTML markup. They begin with an ampersand (&) and end with a semicolon (;). For instance, &lt; represents < and &copy; represents ©. The HTML Entity Encoder/Decoder is a vital tool for web developers, security engineers, and content editors.
Unsanitized user input can lead to Cross-Site Scripting (XSS) attacks, a vulnerability class that has long appeared in the OWASP Top 10. By converting characters like < and > into &lt; and &gt;, you make the browser render injected markup as text instead of executing it. According to OWASP guidelines, contextual output encoding is a primary defense against XSS.
Encoding maps & → &amp;, < → &lt;, > → &gt;, " → &quot;, ' → &#39;. Extended mode additionally maps non-ASCII characters (code points above 127) to numeric entities like &#169;.
Decoding converts named (&copy;), numeric decimal (&#169;), and hexadecimal entities (&#xA9;) back to raw characters using a sandboxed DOM approach.
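The basic and extended modes described above can be sketched in a few lines of Python. This is an illustrative approximation, not the tool's own code: the function names `encode_basic`, `encode_extended`, and `decode` are hypothetical, and Python's `html.escape` writes the apostrophe as &#x27; rather than &#39; (both decode to the same character).

```python
import html


def encode_basic(text: str) -> str:
    """Escape the five reserved characters: & < > " '."""
    # html.escape handles &, < and >; quote=True adds " and '.
    return html.escape(text, quote=True)


def encode_extended(text: str) -> str:
    """Basic escaping, then every non-ASCII character as a decimal entity."""
    escaped = encode_basic(text)
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in escaped)


def decode(text: str) -> str:
    """Resolve named, decimal, and hexadecimal entities to raw characters."""
    return html.unescape(text)


print(encode_basic('<a href="x">'))
print(encode_extended("© 2025 <b>"))
print(decode("&copy; &#169; &#xA9;"))
```

In a browser, decoding would more likely go through the DOM as the page describes; `html.unescape` plays that role here.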
Understanding how this tool handles complex scenarios will help you avoid common pitfalls:
"A & B &amp; C" becomes "A &amp; B &amp; C", ensuring the existing &amp; is preserved as literal text.
The &nbsp; entity is preserved as-is during encoding, as it's already a valid HTML entity. Decoding will convert it back to a non-breaking space character.
An already-encoded &lt; (which represents <) won't become &amp;lt; unless you specifically re-encode.
| Character | Entity Name | Numeric Entity | Description |
|---|---|---|---|
| & | &amp; | &#38; | Ampersand |
| < | &lt; | &#60; | Less than |
| > | &gt; | &#62; | Greater than |
| " | &quot; | &#34; | Double quote |
| ' | &apos; | &#39; | Apostrophe |
| © | &copy; | &#169; | Copyright |
| ® | &reg; | &#174; | Registered trademark |
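The rows above can be derived programmatically. The sketch below uses the `html5` mapping from Python's standard `html.entities` module to look up an entity name and compute its decimal and hexadecimal forms. The helper name `entity_forms` is hypothetical, introduced only for illustration.

```python
from html.entities import html5


def entity_forms(name: str) -> dict:
    """Return the character and the named, decimal, and hex forms of an entity."""
    # html5 maps names such as "copy;" to the character(s) they stand for.
    char = html5[name + ";"]
    return {
        "char": char,
        "named": f"&{name};",
        # A few HTML5 entities expand to two code points, hence the join.
        "decimal": "".join(f"&#{ord(c)};" for c in char),
        "hex": "".join(f"&#x{ord(c):X};" for c in char),
    }


for name in ("amp", "lt", "gt", "quot", "apos", "copy", "reg"):
    print(entity_forms(name))
```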
If you need to implement HTML entity encoding/decoding in your own applications, here are reference implementations in common languages:
    // Basic HTML entity encoding (like our tool)
    function encodeHTML(text) {
      // Lookup table for the five reserved characters
      const entities = {
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;',
        '"': '&quot;',
        "'": '&#39;'
      };
      return text.replace(/[&<>"']/g, (char) => entities[char]);
    }

    // Built-in browser decoding (safer than regex)
    function decodeHTML(text) {
      // A detached textarea parses entities but never executes markup
      const textarea = document.createElement('textarea');
      textarea.innerHTML = text;
      return textarea.value;
    }
    import html

    # Python's standard library provides robust functions
    text = '<script>alert("XSS")</script>'

    # Encoding (escapes <, >, &, ", ')
    encoded = html.escape(text)
    # Result: &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;

    # Decoding (handles all named and numeric entities)
    decoded = html.unescape('&copy; 2025 Company &amp; Co.')
    # Result: © 2025 Company & Co.
    <?php
    // PHP's built-in functions for HTML entities

    // Encoding (ENT_QUOTES escapes both quote types; ENT_HTML5 selects the HTML5 entity table)
    $encoded = htmlspecialchars('<div id="test">& more</div>', ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Decoding
    $decoded = htmlspecialchars_decode('&lt;div&gt;Hello&lt;/div&gt;', ENT_QUOTES | ENT_HTML5);

    // For every character that has a named entity, such as © and ®
    $all_encoded = htmlentities('© 2025 Price > $10', ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $all_decoded = html_entity_decode($all_encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
Just as the orthocenter, centroid and circumcenter align on the Euler line, good web hygiene aligns three pillars: input validation, output encoding, and CSP headers. HTML entity encoding is the second pillar — it breaks the malicious alignment of untrusted data and executable context.
HTML Entity Encoding is for rendering context safety: it prevents HTML/XML injection by converting special characters to entities (< → &lt;). Use it when outputting text into HTML/XML documents.
URL Encoding (Percent Encoding) is for URL/query parameter safety — it replaces unsafe URL characters with % followed by hexadecimal codes (space → %20). Use it for query strings, path segments, and URL components.
Base64 Encoding is for binary-to-text conversion — it represents binary data as ASCII text using 64 characters. Use it for embedding images in HTML/CSS, basic authentication, or transmitting binary via text-only protocols.
Key difference: HTML encoding is about preventing interpretation (XSS defense), while URL encoding is about structural correctness of URLs, and Base64 is about data representation.
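To make the contrast concrete, the following sketch runs one sample string through all three encodings using only the Python standard library. The sample string and variable names are arbitrary choices for illustration.

```python
import base64
import html
from urllib.parse import quote, unquote

sample = 'a<b & "c"'

# HTML entity encoding: safe to place inside an HTML document.
html_encoded = html.escape(sample)

# Percent encoding: safe to place inside a URL component.
# safe="" also encodes "/", which quote() leaves alone by default.
url_encoded = quote(sample, safe="")

# Base64: a text representation of the underlying bytes.
b64_encoded = base64.b64encode(sample.encode("utf-8")).decode("ascii")

print(html_encoded)
print(url_encoded)
print(b64_encoded)

# Each encoding has its own decoder; they are not interchangeable.
assert html.unescape(html_encoded) == sample
assert unquote(url_encoded) == sample
assert base64.b64decode(b64_encoded).decode("utf-8") == sample
```

Note that Base64 provides no injection protection on its own: once decoded, the data still needs the encoding appropriate to the context where it is output.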
Our encoder is designed to be idempotent for ampersands. It encodes raw & characters to &amp;, but preserves already-encoded entities like &lt;, &copy;, etc.
Example behavior:
Input: A & B &amp; C
Output: A &amp; B &amp; C
The first & becomes &amp;. The existing &amp; (which represents a literal &) is left unchanged, preventing double encoding. Decoding the output yields A & B & C, the same text the input decodes to.
Why this matters: This prevents the common issue where encoding an already-encoded string creates unreadable output like &amp;lt; for <.
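One way to get this entity-preserving behavior is to split the input on well-formed entity references and escape only the text between them. The sketch below is an assumption about how such an encoder could work, not the tool's actual implementation. The name `encode_preserving` and the `_ENTITY` pattern are hypothetical, and the pattern accepts any syntactically valid entity name without checking it against the real entity list.

```python
import html
import re

# A well-formed reference: named (&copy;), decimal (&#169;), or hex (&#xA9;).
_ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def encode_preserving(text: str) -> str:
    """Escape reserved characters but leave existing entities untouched."""
    parts = []
    pos = 0
    for match in _ENTITY.finditer(text):
        # Escape the raw text that precedes this entity.
        parts.append(html.escape(text[pos:match.start()], quote=True))
        # Copy the entity through unchanged.
        parts.append(match.group(0))
        pos = match.end()
    # Escape whatever follows the last entity.
    parts.append(html.escape(text[pos:], quote=True))
    return "".join(parts)


once = encode_preserving("A & B &amp; C")
print(once)
print(encode_preserving(once))
```

The trade-off is that text which is meant to show an entity literally (for example, a tutorial that displays the six characters of &amp;lt;) cannot be distinguished from already-encoded input. For untrusted input, a plain escape of every & is the more conservative choice.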