Input Validation Deep Dive
Introduction
Input validation is the first line of defense against injection attacks. Every piece of data entering an application — form fields, HTTP headers, URL parameters, file uploads, API payloads — must be validated before processing. The principle is simple: never trust user input.
Whitelist vs Blacklist
Whitelist (Allowlist) Validation
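Whitelist validation defines what is allowed and rejects everything else. Because anything not explicitly permitted fails by default, it also stops attack patterns nobody anticipated, which makes it far more secure than blacklisting.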
import re
# Whitelist: only allow specific characters
def validate_username_whitelist(username):
    """Allow only alphanumeric, underscore, and hyphen (3 to 32 characters)."""
    pattern = r'[a-zA-Z0-9_-]{3,32}'
    # fullmatch requires the whole string to match; re.match with '$'
    # would also accept a trailing newline such as "alice\n".
    if not re.fullmatch(pattern, username):
        raise ValueError(
            f"Username {username!r} is invalid. Use 3 to 32 characters: "
            "letters, numbers, underscores, and hyphens only."
        )
    return username
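# Whitelist for country codes
ALLOWED_COUNTRIES = {'US', 'CA', 'GB', 'DE', 'FR', 'JP'}
def validate_country_code(code):
    """Return the uppercase code if it is in the allowed set."""
    normalized = code.upper()  # normalize once, then compare and return
    if normalized not in ALLOWED_COUNTRIES:
        raise ValueError(f"Country '{code}' is not in the allowed list")
    return normalized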
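One subtlety worth illustrating for pattern-based whitelists in Python: in a pattern anchored with `^` and `$`, the `$` also matches just before a trailing newline, so `re.match` can accept a value such as `"alice\n"`. The sketch below contrasts that with `re.fullmatch`, using the same character set and length bounds as the username rule above. The names `USERNAME_PATTERN` and `is_valid_username` are hypothetical, introduced only for this illustration.

```python
import re

# Hypothetical names for illustration; the character set and length
# bounds mirror the username rule shown earlier.
USERNAME_PATTERN = re.compile(r'[a-zA-Z0-9_-]{3,32}')


def is_valid_username(candidate):
    """Return True only if the entire string matches the allowlist pattern."""
    # Non-string input (None, int, bytes) is rejected rather than coerced.
    if not isinstance(candidate, str):
        return False
    return USERNAME_PATTERN.fullmatch(candidate) is not None


# re.match with ^...$ accepts a trailing newline; fullmatch does not.
loose = re.match(r'^[a-zA-Z0-9_-]{3,32}$', 'alice\n')
print(loose is not None)              # True
print(is_valid_username('alice\n'))   # False
print(is_valid_username('alice'))     # True
print(is_valid_username('al'))        # False (shorter than 3 characters)
```

Returning a boolean here, rather than raising, is only a design choice for the sketch; the raising style used by the validators above works equally well with `fullmatch`.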
Blacklist (Blocklist) Validation
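Blacklist validation attempts to block known malicious patterns. It is inherently fragile because it can only reject what its author anticipated: alternate encodings, case variations, and new syntax slip through, and attackers constantly discover new bypass techniques.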
# WEAK: Blacklist approach (easily bypassed)
def validate_input_blacklist(input_string):
    # Easily bypassed: attacker uses alternative syntax
    blocklist = ['