Schema Design Patterns: Normalization, Denormalization, Naming Conventions


Schema Design Patterns: Normalization, Denormalization, Naming Conventions

Schema design is the foundation of application performance and maintainability. Good schemas are intuitive, performant, and resilient to change. This article covers normalization trade-offs, denormalization strategies, and practical naming conventions.

Normalization

Normalization eliminates data redundancy through a series of normal forms. Most production schemas aim for Third Normal Form (3NF).

First Normal Form (1NF)

Each column contains atomic values. No repeating groups or arrays.




-- Violates 1NF: multi-valued column


CREATE TABLE orders (


id INTEGER PRIMARY KEY,


product_ids TEXT -- '1,2,3' as comma-separated values


);




-- 1NF compliant: separate rows per product


CREATE TABLE order_items (


order_id INTEGER REFERENCES orders(id),


product_id INTEGER REFERENCES products(id),


quantity INTEGER NOT NULL,


PRIMARY KEY (order_id, product_id)


);





Second Normal Form (2NF)

1NF + every non-key column depends on the whole primary key (relevant for composite keys):




-- Violates 2NF: product_name depends only on product_id, not on order_id


CREATE TABLE order_items (


order_id INTEGER,


product_id INTEGER,


product_name TEXT, -- depends on product_id only


quantity INTEGER,


PRIMARY KEY (order_id, product_id)


);




-- 2NF compliant: product_name moved to products table


CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);





Third Normal Form (3NF)

2NF + no transitive dependencies (non-key columns depend only on the primary key):




-- Violates 3NF: category_name depends on category_id, not on order_id


CREATE TABLE products (


id INTEGER PRIMARY KEY,


name TEXT,


category_id INTEGER,


category_name TEXT -- transitively depends on category_id


);




-- 3NF compliant


CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);


CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);





Denormalization

Denormalization intentionally adds redundancy for performance. Use it judiciously.

Read-Optimized Denormalization




-- Normalized form


SELECT COUNT(*) FROM orders WHERE user_id = 42;




-- Denormalized: add order_count to users table


ALTER TABLE users ADD COLUMN order_count INTEGER DEFAULT 0;




-- Keep it in sync (via trigger or application logic)


UPDATE users SET order_count = order_count + 1 WHERE id = NEW.user_id;





Pre-Joined Tables

For read-heavy reporting, pre-join frequently accessed data:




CREATE TABLE order_summaries (


order_id INTEGER PRIMARY KEY,


user_email TEXT,


total NUMERIC,


item_count INTEGER,


first_item_name TEXT,


created_at TIMESTAMPTZ


);




-- Refresh periodically


INSERT INTO order_summaries


SELECT o.id, u.email, o.total, COUNT(oi.id),


MIN(p.name), o.created_at


FROM orders o


JOIN users u ON u.id = o.user_id


JOIN order_items oi ON oi.order_id = o.id


JOIN products p ON p.id = oi.product_id


GROUP BY o.id, u.email, o.total, o.created_at


ON CONFLICT (order_id) DO NOTHING;





Naming Conventions

Consistent naming conventions reduce cognitive load and make schemas self-documenting.

Tables


* **Plural nouns**: `users`, `orders`, `products`, `order_items`

* **Join tables**: Combine table names: `user_roles`, `product_categories`

* **Avoid reserved words**: Never name a table `user`, `order`, or `group` without quoting


Columns


* **Primary keys**: `id` (singular, generic)

* **Foreign keys**: `{table}_id` (e.g., `user_id`, `product_id`)

* **Timestamps**: `created_at`, `updated_at`, `deleted_at`

* **Boolean flags**: `is_active`, `has_billing`, `email_verified`


Indexes




-- Naming convention: idx_{table}_{column(s)}


CREATE INDEX idx_users_email ON users (email);


CREATE INDEX idx_orders_user_date ON orders (user_id, created_at);




-- Unique indexes: uq_{table}_{column(s)}


CREATE UNIQUE INDEX uq_users_email ON users (email);





Timestamp Handling

Timestamps are a common source of bugs in schema design.

Always Use TIMESTAMPTZ




CREATE TABLE events (


id BIGSERIAL PRIMARY KEY,


occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),


created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),


updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()


);





`TIMESTAMPTZ` stores values in UTC internally and converts to the session timezone on display. Using `TIMESTAMP` without timezone invites timezone-related bugs.

Automatic Updated At




CREATE OR REPLACE FUNCTION trigger_set_updated_at()


RETURNS TRIGGER AS $$


BEGIN


NEW.updated_at = NOW();


RETURN NEW;


END;


$$ LANGUAGE plpgsql;




CREATE TRIGGER set_updated_at


BEFORE UPDATE ON users


FOR EACH ROW


EXECUTE FUNCTION trigger_set_updated_at();





Soft Deletes




ALTER TABLE users ADD COLUMN deleted_at TIMESTAMPTZ;




-- Queries must always filter:


SELECT * FROM users WHERE deleted_at IS NULL;




-- Create a partial index for active users


CREATE INDEX idx_users_active ON users (email) WHERE deleted_at IS NULL;





Schema Anti-Patterns

Entity-Value-Attribute (EAV)




-- Anti-pattern: EAV makes queries painful


CREATE TABLE product_attributes (


product_id INTEGER,


attribute_name TEXT,


attribute_value TEXT,


PRIMARY KEY (product_id, attribute_name)


);





EAV requires complex pivoting for every query. Use JSONB or add columns instead.

Polymorphic Associations




-- Anti-pattern: polymorphic foreign key


CREATE TABLE comments (


id BIGSERIAL PRIMARY KEY,


commentable_type TEXT, -- 'Post' or 'Video'


commentable_id INTEGER -- No foreign key constraint!


);





Use separate join tables or inheritance patterns instead.

Oversized VARCHAR




-- Bad: arbitrary limit


CREATE TABLE users (name VARCHAR(10));




-- Better: use TEXT with application-level validation


CREATE TABLE users (name TEXT);





PostgreSQL treats `VARCHAR(n)` and `TEXT` identically internally, but arbitrary limits cause unnecessary application bugs.

Practical Schema Checklist


* [ ] Primary key on every table (`BIGSERIAL` or `UUID`)

* [ ] Foreign keys with indexes on referencing columns

* [ ] `created_at` and `updated_at` timestamps

* [ ] `NOT NULL` on columns that should never be null

* [ ] Check constraints for domain validation

* [ ] Unique constraints for business-unique columns

* [ ] Consistent naming (plurals, `_id` suffix for FKs)

* [ ] `TIMESTAMPTZ` everywhere (never `TIMESTAMP`)

* [ ] Indexes match query patterns (verified with EXPLAIN)

* [ ] Appropriate normalization level (3NF typically, denormalize knowingly)


Schema design is a long-term investment. A well-designed schema reduces bugs, makes queries faster, and makes the codebase easier for new developers to understand. Invest the time upfront; the payoff compounds over years of maintenance.