Input Validation Deep Dive


Introduction

Input validation is the first line of defense against injection attacks. Every piece of data entering an application — form fields, HTTP headers, URL parameters, file uploads, API payloads — must be validated before processing. The principle is simple: never trust user input.
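
Here is a minimal sketch of validating an API payload before processing it. The hypothetical function validate_payload checks the overall type, rejects unexpected or missing fields, and verifies each field's type and range. The schema (username, age) and the 0 to 150 age bound are illustrative assumptions, not a prescribed format.

```python
from typing import Any

# Hypothetical schema for illustration: allowed field name -> expected type.
EXPECTED_FIELDS = {"username": str, "age": int}


def validate_payload(payload: Any) -> dict:
    """Reject anything that does not have exactly the expected shape."""
    if not isinstance(payload, dict):
        raise ValueError("Payload must be a JSON object")

    # Allowlist the keys: unknown fields are rejected, not silently ignored.
    unexpected = set(payload) - set(EXPECTED_FIELDS)
    if unexpected:
        raise ValueError(f"Unexpected fields: {sorted(unexpected, key=str)}")

    missing = set(EXPECTED_FIELDS) - set(payload)
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")

    for name, expected_type in EXPECTED_FIELDS.items():
        value = payload[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, expected_type) or isinstance(value, bool):
            raise ValueError(f"Field '{name}' must be of type {expected_type.__name__}")

    # Range check on the (assumed) age field.
    if not 0 <= payload["age"] <= 150:
        raise ValueError("Field 'age' is out of range")

    return payload


print(validate_payload({"username": "alice", "age": 30}))
# {'username': 'alice', 'age': 30}

try:
    validate_payload({"username": "alice", "age": 30, "is_admin": True})
except ValueError as exc:
    print(exc)  # Unexpected fields: ['is_admin']
```

Rejecting unknown keys, instead of dropping them, means an extra field such as is_admin is never silently passed along to code further down that might act on it.
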

Whitelist vs Blacklist

Whitelist (Allowlist) Validation

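Whitelist validation defines what is allowed and rejects everything else. Any input the developer did not explicitly permit is rejected by default, which makes it far more secure than blacklisting.

import re


# Whitelist: only allow specific characters
def validate_username_whitelist(username):
    """Allow only alphanumeric, underscore, and hyphen (3 to 32 characters)."""
    pattern = r'[a-zA-Z0-9_-]{3,32}'
    # re.fullmatch must consume the entire string. An anchored re.match with
    # '^...$' would also accept a trailing newline, e.g. 'alice\n'.
    if not isinstance(username, str) or not re.fullmatch(pattern, username):
        raise ValueError(
            f"Username {username!r} is invalid. Usernames must be 3 to 32 "
            "characters long and contain only letters, numbers, underscores, "
            "and hyphens."
        )
    return username


# Whitelist for country codes
ALLOWED_COUNTRIES = {'US', 'CA', 'GB', 'DE', 'FR', 'JP'}


def validate_country_code(code):
    """Accept only country codes in ALLOWED_COUNTRIES (case-insensitive)."""
    if not isinstance(code, str):
        raise ValueError("Country code must be a string")
    normalized = code.upper()
    if normalized not in ALLOWED_COUNTRIES:
        raise ValueError(f"Country {code!r} is not in the allowed list")
    return normalized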

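Regex allowlists in Python have one subtlety. In the default mode, $ matches at the end of the string or just before a final newline, so an anchored re.match can accept input that carries an extra trailing newline. The short check below compares it with re.fullmatch. The sample value 'alice\n' is only an illustration.

```python
import re

USERNAME = r'[a-zA-Z0-9_-]{3,32}'

# '$' matches at the very end of the string OR just before a final '\n',
# so this anchored re.match accepts a username with a trailing newline.
loose = re.match(r'^' + USERNAME + r'$', 'alice\n')

# re.fullmatch succeeds only if the pattern consumes the entire string.
strict = re.fullmatch(USERNAME, 'alice\n')

print(loose is not None)   # True
print(strict is not None)  # False
```

Anchoring the pattern with \Z instead of $ also closes the gap. re.fullmatch just makes the intent explicit.
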
Blacklist (Blocklist) Validation

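Blacklist validation tries to block known malicious patterns. This approach is inherently fragile. A blocklist can only reject what its author anticipated, and attackers routinely find encodings, case variations, and alternative syntax that slip past it.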




# WEAK: Blacklist approach (easily bypassed)


def validate_input_blacklist(input_string):


# Easily bypassed — attacker uses alternative syntax


blocklist = ['