Schema Design Patterns: Normalization, Denormalization, Naming Conventions
Schema Design Patterns: Normalization, Denormalization, Naming Conventions
Schema design is the foundation of application performance and maintainability. Good schemas are intuitive, performant, and resilient to change. This article covers normalization trade-offs, denormalization strategies, and practical naming conventions.
Normalization
Normalization eliminates data redundancy through a series of normal forms. Most production schemas aim for Third Normal Form (3NF).
First Normal Form (1NF)
Each column contains atomic values. No repeating groups or arrays.
-- Violates 1NF: multi-valued column
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
product_ids TEXT -- '1,2,3' as comma-separated values
);
-- 1NF compliant: separate rows per product
CREATE TABLE order_items (
order_id INTEGER REFERENCES orders(id),
product_id INTEGER REFERENCES products(id),
quantity INTEGER NOT NULL,
PRIMARY KEY (order_id, product_id)
);
Second Normal Form (2NF)
1NF + every non-key column depends on the whole primary key (relevant for composite keys):
-- Violates 2NF: product_name depends only on product_id, not on order_id
CREATE TABLE order_items (
order_id INTEGER,
product_id INTEGER,
product_name TEXT, -- depends on product_id only
quantity INTEGER,
PRIMARY KEY (order_id, product_id)
);
-- 2NF compliant: product_name moved to products table
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
Third Normal Form (3NF)
2NF + no transitive dependencies (non-key columns depend only on the primary key):
-- Violates 3NF: category_name depends on category_id, not on order_id
CREATE TABLE products (
id INTEGER PRIMARY KEY,
name TEXT,
category_id INTEGER,
category_name TEXT -- transitively depends on category_id
);
-- 3NF compliant
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
Denormalization
Denormalization intentionally adds redundancy for performance. Use it judiciously.
Read-Optimized Denormalization
-- Normalized form
SELECT COUNT(*) FROM orders WHERE user_id = 42;
-- Denormalized: add order_count to users table
ALTER TABLE users ADD COLUMN order_count INTEGER DEFAULT 0;
-- Keep it in sync (via trigger or application logic)
UPDATE users SET order_count = order_count + 1 WHERE id = NEW.user_id;
Pre-Joined Tables
For read-heavy reporting, pre-join frequently accessed data:
CREATE TABLE order_summaries (
order_id INTEGER PRIMARY KEY,
user_email TEXT,
total NUMERIC,
item_count INTEGER,
first_item_name TEXT,
created_at TIMESTAMPTZ
);
-- Refresh periodically
INSERT INTO order_summaries
SELECT o.id, u.email, o.total, COUNT(oi.id),
MIN(p.name), o.created_at
FROM orders o
JOIN users u ON u.id = o.user_id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
GROUP BY o.id, u.email, o.total, o.created_at
ON CONFLICT (order_id) DO NOTHING;
Naming Conventions
Consistent naming conventions reduce cognitive load and make schemas self-documenting.
Tables
* **Plural nouns**: `users`, `orders`, `products`, `order_items`
* **Join tables**: Combine table names: `user_roles`, `product_categories`
* **Avoid reserved words**: Never name a table `user`, `order`, or `group` without quoting
Columns
* **Primary keys**: `id` (singular, generic)
* **Foreign keys**: `{table}_id` (e.g., `user_id`, `product_id`)
* **Timestamps**: `created_at`, `updated_at`, `deleted_at`
* **Boolean flags**: `is_active`, `has_billing`, `email_verified`
Indexes
-- Naming convention: idx_{table}_{column(s)}
CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_orders_user_date ON orders (user_id, created_at);
-- Unique indexes: uq_{table}_{column(s)}
CREATE UNIQUE INDEX uq_users_email ON users (email);
Timestamp Handling
Timestamps are a common source of bugs in schema design.
Always Use TIMESTAMPTZ
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY,
occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
`TIMESTAMPTZ` stores values in UTC internally and converts to the session timezone on display. Using `TIMESTAMP` without timezone invites timezone-related bugs.
Automatic Updated At
CREATE OR REPLACE FUNCTION trigger_set_updated_at()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER set_updated_at
BEFORE UPDATE ON users
FOR EACH ROW
EXECUTE FUNCTION trigger_set_updated_at();
Soft Deletes
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMPTZ;
-- Queries must always filter:
SELECT * FROM users WHERE deleted_at IS NULL;
-- Create a partial index for active users
CREATE INDEX idx_users_active ON users (email) WHERE deleted_at IS NULL;
Schema Anti-Patterns
Entity-Value-Attribute (EAV)
-- Anti-pattern: EAV makes queries painful
CREATE TABLE product_attributes (
product_id INTEGER,
attribute_name TEXT,
attribute_value TEXT,
PRIMARY KEY (product_id, attribute_name)
);
EAV requires complex pivoting for every query. Use JSONB or add columns instead.
Polymorphic Associations
-- Anti-pattern: polymorphic foreign key
CREATE TABLE comments (
id BIGSERIAL PRIMARY KEY,
commentable_type TEXT, -- 'Post' or 'Video'
commentable_id INTEGER -- No foreign key constraint!
);
Use separate join tables or inheritance patterns instead.
Oversized VARCHAR
-- Bad: arbitrary limit
CREATE TABLE users (name VARCHAR(10));
-- Better: use TEXT with application-level validation
CREATE TABLE users (name TEXT);
PostgreSQL treats `VARCHAR(n)` and `TEXT` identically internally, but arbitrary limits cause unnecessary application bugs.
Practical Schema Checklist
* [ ] Primary key on every table (`BIGSERIAL` or `UUID`)
* [ ] Foreign keys with indexes on referencing columns
* [ ] `created_at` and `updated_at` timestamps
* [ ] `NOT NULL` on columns that should never be null
* [ ] Check constraints for domain validation
* [ ] Unique constraints for business-unique columns
* [ ] Consistent naming (plurals, `_id` suffix for FKs)
* [ ] `TIMESTAMPTZ` everywhere (never `TIMESTAMP`)
* [ ] Indexes match query patterns (verified with EXPLAIN)
* [ ] Appropriate normalization level (3NF typically, denormalize knowingly)
Schema design is a long-term investment. A well-designed schema reduces bugs, makes queries faster, and makes the codebase easier for new developers to understand. Invest the time upfront; the payoff compounds over years of maintenance.