Database Normalization Explained


What is Normalization?

Normalization organizes data to reduce redundancy and improve integrity. Each piece of data is stored in exactly one place.

Normal Forms

First Normal Form (1NF)

Each column contains atomic values. Each row is unique.




-- Violates 1NF


CREATE TABLE orders (id SERIAL, products TEXT); -- "Laptop,Mouse"




-- 1NF compliant


CREATE TABLE order_items (


order_id INT REFERENCES orders(id),


product_name TEXT,


PRIMARY KEY (order_id, product_name)


);





Second Normal Form (2NF)

Must be in 1NF. All non-key columns depend on the entire primary key.




-- Violates 2NF: product_name depends only on product_id, not order_id


CREATE TABLE order_details (


order_id INT, product_id INT,


product_name TEXT, quantity INT,


PRIMARY KEY (order_id, product_id)


);





Third Normal Form (3NF)

Must be in 2NF. No transitive dependencies (non-key depends on non-key).




-- Violates 3NF: customer_name depends on customer_id


CREATE TABLE orders (id SERIAL, customer_id INT, customer_name TEXT);





BCNF through 5NF

BCNF: Every determinant is a candidate key. 4NF: No multi-valued dependencies. 5NF: Join dependencies only from candidate keys.

Practical Guidance

Most production databases aim for 3NF:


* Design in 3NF

2\. Profile performance 3\. Denormalize selectively for performance 4\. Document all denormalization decisions

Denormalization

Denormalize for: reporting, high-read/low-write workloads, distributed databases, cached aggregates.

Conclusion

Normalize to 3NF for transactional integrity. Denormalize selectively for performance. Document your decisions. The key is "the key, the whole key, and nothing but the key."