Schema Design Patterns: Normalization, Denormalization, Naming Conventions

Schema design is the foundation of application performance and maintainability. Good schemas are intuitive, performant, and resilient to change. This article covers normalization trade-offs, denormalization strategies, and practical naming conventions.

Normalization

Normalization eliminates data redundancy through a series of normal forms. Most production schemas aim for Third Normal Form (3NF).

First Normal Form (1NF)

Each column contains atomic values. No repeating groups or arrays.

-- Violates 1NF: multi-valued column

CREATE TABLE orders (

id INTEGER PRIMARY KEY,

product_ids TEXT -- '1,2,3' as comma-separated values

);

-- 1NF compliant: separate rows per product

CREATE TABLE order_items (

order_id INTEGER REFERENCES orders(id),

product_id INTEGER REFERENCES products(id),

quantity INTEGER NOT NULL,

PRIMARY KEY (order_id, product_id)

);

Second Normal Form (2NF)

1NF + every non-key column depends on the whole primary key (relevant for composite keys):

-- Violates 2NF: product_name depends only on product_id, not on order_id

CREATE TABLE order_items (

order_id INTEGER,

product_id INTEGER,

product_name TEXT, -- depends on product_id only

quantity INTEGER,

PRIMARY KEY (order_id, product_id)

);

-- 2NF compliant: product_name moved to products table

CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);

Third Normal Form (3NF)

2NF + no transitive dependencies (non-key columns depend only on the primary key):

-- Violates 3NF: category_name depends on category_id, not on order_id

CREATE TABLE products (

id INTEGER PRIMARY KEY,

name TEXT,

category_id INTEGER,

category_name TEXT -- transitively depends on category_id

);

-- 3NF compliant

CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);

CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);

Denormalization

Denormalization intentionally adds redundancy for performance. Use it judiciously.

Read-Optimized Denormalization

-- Normalized form

SELECT COUNT(*) FROM orders WHERE user_id = 42;

-- Denormalized: add order_count to users table

ALTER TABLE users ADD COLUMN order_count INTEGER DEFAULT 0;

-- Keep it in sync (via trigger or application logic)

UPDATE users SET order_count = order_count + 1 WHERE id = NEW.user_id;

Pre-Joined Tables

For read-heavy reporting, pre-join frequently accessed data:

CREATE TABLE order_summaries (

order_id INTEGER PRIMARY KEY,

user_email TEXT,

total NUMERIC,

item_count INTEGER,

first_item_name TEXT,

created_at TIMESTAMPTZ

);

-- Refresh periodically

INSERT INTO order_summaries

SELECT o.id, u.email, o.total, COUNT(oi.id),

MIN(p.name), o.created_at

FROM orders o

JOIN users u ON u.id = o.user_id

JOIN order_items oi ON oi.order_id = o.id

JOIN products p ON p.id = oi.product_id

GROUP BY o.id, u.email, o.total, o.created_at

ON CONFLICT (order_id) DO NOTHING;

Naming Conventions

Consistent naming conventions reduce cognitive load and make schemas self-documenting.

Tables

* **Plural nouns**: `users`, `orders`, `products`, `order_items`

* **Join tables**: Combine table names: `user_roles`, `product_categories`

* **Avoid reserved words**: Never name a table `user`, `order`, or `group` without quoting

Columns

* **Primary keys**: `id` (singular, generic)

* **Foreign keys**: `{table}_id` (e.g., `user_id`, `product_id`)

* **Timestamps**: `created_at`, `updated_at`, `deleted_at`

* **Boolean flags**: `is_active`, `has_billing`, `email_verified`

Indexes

-- Naming convention: idx_{table}_{column(s)}

CREATE INDEX idx_users_email ON users (email);

CREATE INDEX idx_orders_user_date ON orders (user_id, created_at);

-- Unique indexes: uq_{table}_{column(s)}

CREATE UNIQUE INDEX uq_users_email ON users (email);

Timestamp Handling

Timestamps are a common source of bugs in schema design.

Always Use TIMESTAMPTZ

CREATE TABLE events (

id BIGSERIAL PRIMARY KEY,

occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()

);

`TIMESTAMPTZ` stores values in UTC internally and converts to the session timezone on display. Using `TIMESTAMP` without timezone invites timezone-related bugs.

Automatic Updated At

CREATE OR REPLACE FUNCTION trigger_set_updated_at()

RETURNS TRIGGER AS $$

BEGIN

NEW.updated_at = NOW();

RETURN NEW;

END;

$$ LANGUAGE plpgsql;

CREATE TRIGGER set_updated_at

BEFORE UPDATE ON users

FOR EACH ROW

EXECUTE FUNCTION trigger_set_updated_at();

Soft Deletes

ALTER TABLE users ADD COLUMN deleted_at TIMESTAMPTZ;

-- Queries must always filter:

SELECT * FROM users WHERE deleted_at IS NULL;

-- Create a partial index for active users

CREATE INDEX idx_users_active ON users (email) WHERE deleted_at IS NULL;

Schema Anti-Patterns

Entity-Value-Attribute (EAV)

-- Anti-pattern: EAV makes queries painful

CREATE TABLE product_attributes (

product_id INTEGER,

attribute_name TEXT,

attribute_value TEXT,

PRIMARY KEY (product_id, attribute_name)

);

EAV requires complex pivoting for every query. Use JSONB or add columns instead.

Polymorphic Associations

-- Anti-pattern: polymorphic foreign key

CREATE TABLE comments (

id BIGSERIAL PRIMARY KEY,

commentable_type TEXT, -- 'Post' or 'Video'

commentable_id INTEGER -- No foreign key constraint!

);

Use separate join tables or inheritance patterns instead.

Oversized VARCHAR

-- Bad: arbitrary limit

CREATE TABLE users (name VARCHAR(10));

-- Better: use TEXT with application-level validation

CREATE TABLE users (name TEXT);

PostgreSQL treats `VARCHAR(n)` and `TEXT` identically internally, but arbitrary limits cause unnecessary application bugs.

Practical Schema Checklist

* [ ] Primary key on every table (`BIGSERIAL` or `UUID`)

* [ ] Foreign keys with indexes on referencing columns

* [ ] `created_at` and `updated_at` timestamps

* [ ] `NOT NULL` on columns that should never be null

* [ ] Check constraints for domain validation

* [ ] Unique constraints for business-unique columns

* [ ] Consistent naming (plurals, `_id` suffix for FKs)

* [ ] `TIMESTAMPTZ` everywhere (never `TIMESTAMP`)

* [ ] Indexes match query patterns (verified with EXPLAIN)

* [ ] Appropriate normalization level (3NF typically, denormalize knowingly)

Schema design is a long-term investment. A well-designed schema reduces bugs, makes queries faster, and makes the codebase easier for new developers to understand. Invest the time upfront; the payoff compounds over years of maintenance.

Schema Design Patterns: Normalization, Denormalization, Naming Conventions

Schema Design Patterns: Normalization, Denormalization, Naming Conventions

Related Articles