Output Encoding
Introduction
Output encoding transforms data before rendering it in a specific context to prevent injection attacks. While input validation filters dangerous content, output encoding ensures that even if dangerous characters slip through, they are rendered as data rather than code. Context-sensitive encoding is critical — the same string must be encoded differently depending on whether it appears in HTML, JavaScript, CSS, or a URL.
Context-Sensitive Encoding
Different output contexts require different encoding strategies. Applying the wrong encoding is equivalent to no encoding at all.
| Context | Example | Encoding Method | Risk if Wrong | |---------|---------|----------------|---------------| | HTML Body | `USER` | HTML entity encode | HTML injection | | HTML Attribute | `` | Attribute encode | Attribute break-out | | JavaScript | `` | JavaScript string encode | XSS | | URL | `[` | URL encode | Open redirect, XSS | | CSS | `` | CSS hex encode | CSS injection |
HTML Body Context
import html
def encode_html_body(user_input):
"""Encode for HTML element content context."""
encoded = html.escape(user_input, quote=True)
# becomes
# <script>alert(1)</script>
return encoded
HTML Attribute Context
def encode_html_attribute(user_input):
"""Encode for HTML attribute values."""
# More aggressive than body encoding
replacements = {
'&': '&',
'"': '"',
"'": ''',
'<': '<',
'>': '>',
'/': '/', # Prevents attribute closing
'`': '`', # Backtick can close attributes in some browsers
}
for char, replacement in replacements.items():
user_input = user_input.replace(char, replacement)
return user_input
JavaScript Context
import json
import re
def encode_javascript_string(user_input):
"""Encode for JavaScript string context."""
# JSON encoding is safe for JS string literals
encoded = json.dumps(user_input, ensure_ascii=False)
# Additional hardening for in script blocks
encoded = encoded.replace('', '<\\/')
return encoded
def encode_javascript_variable(user_input):
"""Hex encode each character for JS variable context."""
encoded = ''
for char in user_input:
encoded += f'\\x{ord(char):02x}'
return encoded
URL Context
from urllib.parse import quote, urlencode
def encode_url_param(user_input):
"""Encode for URL parameter values."""
return quote(user_input, safe='')
def encode_url_path(component):
"""Encode for URL path segments."""
return quote(component, safe='')
# Defense against javascript: protocol
def sanitize_url_for_href(user_input):
"""Block dangerous URL schemes."""
allowed_schemes = {'http', 'https', 'mailto', 'tel'}
user_input = user_input.strip()
# Lowercase for scheme check
for scheme in allowed_schemes:
if user_input.lower().startswith(f'{scheme}:'):
return quote(user_input, safe=':/?#[]@!$&\'()*+,;=-._~')
# Dangerous schemes
dangerous = ['javascript:', 'data:', 'vbscript:', 'file:']
for scheme in dangerous:
if scheme in user_input.lower():
return '' # Remove dangerous URLs
return quote(user_input, safe=':/?#[]@!$&\'()*+,;=-._~')
Template Engine Auto-Escaping
Modern template engines provide auto-escaping, which encodes output based on context.
Jinja2 (Python)
{# Jinja2 auto-escape enabled by default #}
{{ user.bio }}
{#
Hello <script>alert(1)</script>
#}{# Safe filter marks as trusted HTML #}
{{ user.bio|safe }}
{# WARNING: only use safe for trusted content #}
{# Escaping for specific attributes #}
{# Manual escaping functions #}
{% set encoded = user.name|e %}
React JSX
// React automatically escapes by default
function UserProfile({ user }) {
return (
{/* Automatic escaping — XSS safe */}
{user.bio}
{/* DangerouslySetInnerHTML bypasses escaping */}
{/* Manual URL encoding for href */}
);
}
Common Pitfalls
Double Encoding
Input:
After double URL decode: %253Cscript%253E → %3Cscript%3E →
Solution: Decode input only once at boundary, then encode for output context.
Encoding for Wrong Context
# WRONG: URL encoding for HTML body
user_input = "https://evil.com?q=