OLTP vs OLAP: Workload Optimization
OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) represent two fundamentally different approaches to database usage. They differ in data structure, query patterns, performance requirements, and optimal storage formats. Understanding these differences is essential for choosing the right database architecture.
Workload Characteristics
OLTP Characteristics
OLTP systems handle high volumes of small, short-lived transactions. Each transaction typically reads or writes a small number of rows.
User adds item to cart: INSERT INTO cart_items (user_id, product_id, qty) VALUES (42, 100, 1);
User places order: UPDATE orders SET status='placed' WHERE order_id = 5000;
User checks balance: SELECT balance FROM accounts WHERE account_id = 1234;
* **Concurrency**: Hundreds or thousands of concurrent users.
* **Latency**: Millisecond response time expected.
* **Data access**: Point queries (by primary key) and small range scans.
* **Write patterns**: Frequent inserts and updates.
* **Data volume**: Gigabytes to low terabytes per table.
* **ACID**: Transactions must be atomic and isolated.
OLAP Characteristics
OLAP systems process complex queries over large datasets to support decision-making.
-- OLAP query: quarterly revenue by category and region
SELECT
p.product_category,
c.region,
EXTRACT(quarter FROM o.order_date) AS qtr,
SUM(o.quantity * o.unit_price) AS revenue,
COUNT(DISTINCT o.customer_id) AS unique_customers,
AVG(o.discount_applied) AS avg_discount
FROM orders o
JOIN products p ON o.product_id = p.product_id
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date BETWEEN '2026-01-01' AND '2026-12-31'
GROUP BY p.product_category, c.region, qtr
HAVING SUM(o.quantity * o.unit_price) > 10000
ORDER BY revenue DESC;
* **Concurrency**: Few concurrent users (analysts).
* **Latency**: Seconds to minutes acceptable.
* **Data access**: Many rows scanned, few columns returned.
* **Write patterns**: Periodic bulk loads.
* **Data volume**: Terabytes to petabytes.
* **ACID**: Relaxed requirements. Snapshot isolation is often sufficient.
Row vs Column Store
Row Store (OLTP)
Row-oriented databases store all columns of a row contiguously.
Table: orders
Row 1: [101, 42, 2345, 29.99, 1, '2026-01-15']
Row 2: [102, 73, 1234, 49.99, 2, '2026-01-15']
* Fast for: `SELECT * FROM orders WHERE order_id = 101` (one row, all columns).
* Fast for: `INSERT INTO orders VALUES (...)` (one row write).
* Slow for: `SELECT SUM(amount) FROM orders` (reads all rows but needs only one column).
Column Store (OLAP)
Column-oriented databases store each column contiguously.
order_id:    [101, 102, 103, 104, ...]
customer_id: [42, 73, 15, 88, ...]
amount:      [29.99, 49.99, 99.99, 5.99, ...]
* Fast for: `SELECT SUM(amount) FROM orders` (reads only the amount column).
* Fast for: `SELECT region, AVG(amount) FROM orders GROUP BY region` (reads only two columns).
* Slow for: `SELECT * FROM orders WHERE order_id = 101` (reads all column files to reconstruct the row).
* Slow for: Single-row inserts (must write to multiple column files).
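The trade-off above can be sketched with plain Python lists. This is a toy illustration of the two layouts, not any real engine's code; the column names follow the `orders` examples:

```python
# Row store: all columns of a row live together.
row_store = [
    (101, 42, 29.99),   # (order_id, customer_id, amount)
    (102, 73, 49.99),
    (103, 15, 99.99),
]

# Column store: each column lives together.
col_store = {
    "order_id":    [101, 102, 103],
    "customer_id": [42, 73, 15],
    "amount":      [29.99, 49.99, 99.99],
}

def row_lookup(order_id):
    # Point lookup: the row store returns one contiguous record.
    return next(r for r in row_store if r[0] == order_id)

def col_lookup(order_id):
    # The column store must probe every column list to rebuild the row.
    i = col_store["order_id"].index(order_id)
    return tuple(col_store[c][i] for c in col_store)

def total_amount_row():
    # SUM(amount): the row store walks every full record to sum one field.
    return sum(r[2] for r in row_store)

def total_amount_col():
    # SUM(amount): the column store reads only the amount list.
    return sum(col_store["amount"])
```

Both layouts return the same answers; what differs is how much data each one touches to get there.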
Indexing Strategies
OLTP Indexes
OLTP systems use indexes to make point lookups fast.
-- Primary key index: B-tree for fast lookups by ID
CREATE TABLE users (
user_id BIGINT PRIMARY KEY, -- B-tree index created automatically
email VARCHAR(255) UNIQUE, -- Another B-tree for uniqueness checks
name VARCHAR(255),
status VARCHAR(20),
created_at TIMESTAMP
);
-- Composite index for common query pattern
CREATE INDEX idx_users_status_created
ON users (status, created_at);
B-tree indexes are the dominant index type for OLTP. They provide O(log N) lookups and support range scans. Covering indexes (containing all columns needed by a query) eliminate table lookups entirely.
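The ordered-lookup behavior of a composite covering index can be mimicked with a sorted list and binary search. A sketch, not a real B-tree (no pages or balancing); the `(status, created_at, user_id)` entries and sample data are invented:

```python
from bisect import bisect_left, bisect_right

# Composite index on (status, created_at) that also carries user_id,
# so queries on these columns never touch the base table ("covering").
# A sorted list + bisect gives the same O(log N) ordered access a
# B-tree provides.
index = sorted([
    ("active", "2026-01-01", 7),
    ("active", "2026-02-01", 3),
    ("active", "2026-03-01", 9),
    ("inactive", "2026-01-15", 4),
])

def users_with_status(status):
    # All entries for one status form a contiguous run in the index,
    # already ordered by created_at.
    lo = bisect_left(index, (status,))
    hi = bisect_right(index, (status, "\uffff"))
    return [user_id for _, _, user_id in index[lo:hi]]
```

Because `user_id` is stored inside each index entry, the lookup returns results without a second read against the table, which is exactly what a covering index buys.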
OLAP Indexes
OLAP systems use different indexing strategies:
**Bitmap indexes**: For low-cardinality columns. Each value gets a bitmap where each bit indicates whether a row has that value.
-- In a columnar database, bitmap indexes are automatic for low-cardinality columns
-- Example: region column with 10 distinct values
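The mechanics are easy to see with Python integers as bitmaps. A toy sketch with made-up `region` and `status` data; real engines compress these bitmaps, but the AND is the same idea:

```python
# One bitmap per distinct value; bit i is set if row i has that value.
regions = ["us", "eu", "us", "apac", "eu", "us"]
region_bitmaps = {}
for i, r in enumerate(regions):
    region_bitmaps[r] = region_bitmaps.get(r, 0) | (1 << i)

statuses = ["new", "new", "done", "new", "done", "new"]
status_bitmaps = {}
for i, s in enumerate(statuses):
    status_bitmaps[s] = status_bitmaps.get(s, 0) | (1 << i)

# WHERE region = 'us' AND status = 'new' becomes one bitwise AND
# instead of a row-by-row comparison.
match = region_bitmaps["us"] & status_bitmaps["new"]
matching_rows = [i for i in range(len(regions)) if (match >> i) & 1]
```

This is why bitmap indexes suit low-cardinality columns: the number of bitmaps equals the number of distinct values.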
**Zone maps**: Store min/max values for data blocks. Skip blocks that cannot contain matching data.
-- In Oracle or Redshift, block-level min/max elimination is automatic
-- The planner skips blocks where no row can match the WHERE clause
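Zone-map pruning reduces to a min/max check per block. A minimal sketch with invented data and a fixed block size:

```python
# Per-block (min, max) lets the scanner skip blocks that cannot
# contain matching rows.
BLOCK_SIZE = 4
values = [3, 7, 5, 1, 20, 25, 22, 21, 40, 41, 45, 44]

blocks = [values[i:i + BLOCK_SIZE] for i in range(0, len(values), BLOCK_SIZE)]
zone_map = [(min(b), max(b)) for b in blocks]

def scan_greater_than(threshold):
    scanned = 0
    hits = []
    for (lo, hi), block in zip(zone_map, blocks):
        if hi <= threshold:
            continue  # no row in this block can match: skip it entirely
        scanned += 1
        hits.extend(v for v in block if v > threshold)
    return hits, scanned
```

A `WHERE value > 30` scan reads only the last block here instead of all three, which is why loading data roughly sorted by the filter column makes zone maps so effective.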
**Projections**: Pre-join and pre-aggregate data for common query patterns. Vertica uses projections as the primary storage mechanism.
-- Vertica projection
CREATE PROJECTION daily_sales_projection AS
SELECT order_date, region, SUM(amount) AS total_amount
FROM sales
GROUP BY order_date, region;
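The essence of a pre-aggregated projection is incremental maintenance: each insert updates the rollup so the analytical query never scans the base table. A sketch in the spirit of the projection above (names and data invented, not Vertica internals):

```python
from collections import defaultdict

# (order_date, region) -> running SUM(amount), maintained on every insert.
daily_sales = defaultdict(float)

def insert_sale(order_date, region, amount):
    # The base-table write also updates the rollup incrementally.
    daily_sales[(order_date, region)] += amount

insert_sale("2026-01-15", "us", 29.99)
insert_sale("2026-01-15", "us", 49.99)
insert_sale("2026-01-15", "eu", 10.00)

# The analytical query is now a lookup, not a scan.
us_total = daily_sales[("2026-01-15", "us")]
```

The cost shows up on the write path: every insert pays a little extra to keep the projection current, which is the standard OLAP trade of write work for read speed.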
Query Pattern Examples
OLTP Query Pattern
Transaction: Place an order
BEGIN;
-- 1. Check inventory
SELECT stock FROM inventory WHERE product_id = 100 FOR UPDATE;
-- 2. Create order
INSERT INTO orders (customer_id, total) VALUES (42, 99.99);
-- 3. Update inventory
UPDATE inventory SET stock = stock - 1 WHERE product_id = 100;
-- 4. Create payment record
INSERT INTO payments (order_id, amount) VALUES (LASTVAL(), 99.99);
COMMIT;
* 4 queries, 1 transaction.
* Each query touches 1-2 rows.
* Total time: under 50ms.
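The same transaction can be run end to end with Python's built-in sqlite3 as a self-contained sketch. SQLite has no `SELECT ... FOR UPDATE`, so `BEGIN IMMEDIATE` takes the write lock up front instead; table names follow the example above:

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so the
# transaction boundaries below are fully explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, stock INTEGER);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY AUTOINCREMENT,
                         customer_id INTEGER, total REAL);
    CREATE TABLE payments (order_id INTEGER, amount REAL);
    INSERT INTO inventory VALUES (100, 5);
""")

conn.execute("BEGIN IMMEDIATE")  # write lock, standing in for FOR UPDATE
# 1. Check inventory
stock = conn.execute(
    "SELECT stock FROM inventory WHERE product_id = 100").fetchone()[0]
if stock < 1:
    conn.rollback()
    raise RuntimeError("out of stock")
# 2. Create order
cur = conn.execute(
    "INSERT INTO orders (customer_id, total) VALUES (42, 99.99)")
# 3. Update inventory
conn.execute("UPDATE inventory SET stock = stock - 1 WHERE product_id = 100")
# 4. Create payment record (lastrowid plays the role of LASTVAL())
conn.execute("INSERT INTO payments (order_id, amount) VALUES (?, 99.99)",
             (cur.lastrowid,))
conn.commit()
```

All four statements succeed or fail together: if any step raises, the rollback leaves inventory untouched, which is the atomicity OLTP depends on.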
OLAP Query Pattern
SELECT
c.segment,
COUNT(DISTINCT o.customer_id) as customers,
SUM(o.total) as revenue,
SUM(o.total) / COUNT(DISTINCT o.customer_id) as arpu
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date BETWEEN '2026-01-01' AND '2026-06-30'
AND c.country = 'US'
GROUP BY c.segment
ORDER BY revenue DESC;
* 1 query but scans millions of rows.
* Joins two large tables on customer_id.
* Aggregates and groups data.
* Total time: seconds to minutes.
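The ARPU query above runs unchanged against a tiny in-memory dataset with sqlite3; the rows here are made up purely to show the shape of the result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                            segment TEXT, country TEXT);
    CREATE TABLE orders (customer_id INTEGER, total REAL, order_date TEXT);
    INSERT INTO customers VALUES (1,'smb','US'), (2,'smb','US'), (3,'ent','US');
    INSERT INTO orders VALUES (1, 100, '2026-02-01'), (2, 50, '2026-03-01'),
                              (3, 900, '2026-04-01');
""")

# Same shape as the ARPU query in the text: join, filter, group, sort.
rows = conn.execute("""
    SELECT c.segment,
           COUNT(DISTINCT o.customer_id) AS customers,
           SUM(o.total) AS revenue,
           SUM(o.total) / COUNT(DISTINCT o.customer_id) AS arpu
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE o.order_date BETWEEN '2026-01-01' AND '2026-06-30'
      AND c.country = 'US'
    GROUP BY c.segment
    ORDER BY revenue DESC
""").fetchall()
```

On three rows this is instant; at warehouse scale the identical plan scans millions of `orders` rows, which is where columnar storage earns its keep.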
Hybrid Approaches
Many systems must handle both OLTP and OLAP workloads. Several strategies address this.
Operational Data Store (ODS)
An ODS sits between OLTP and OLAP. It stores near-real-time data from operational systems and supports lightweight reporting.
OLTP -> ODS (real-time sync) -> ETL -> OLAP (Data Warehouse)
The ODS supports simple queries on recent data. Complex analytics happen in the warehouse.
Dual Databases
Run OLTP on a row-oriented database and replicate to a columnar database for analytics.
# Architecture: separate OLTP and OLAP systems
oltp:
database: PostgreSQL
storage: Row-oriented
purpose: Transaction processing
olap:
database: ClickHouse
storage: Column-oriented
purpose: Analytics and reporting
sync:
tool: Debezium + Kafka
mode: Change Data Capture
latency: < 1 minute
This is the most common pattern for mature systems. It provides optimal performance for both workloads at the cost of maintaining two systems.
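At its core, CDC replication is a stream of change events applied in order to the replica. A toy sketch of the apply side (the event shape is invented, though Debezium emits comparable insert/update/delete records as JSON):

```python
# Change events captured from the OLTP side, in commit order.
events = [
    {"op": "insert", "row": {"order_id": 1, "amount": 10.0}},
    {"op": "insert", "row": {"order_id": 2, "amount": 20.0}},
    {"op": "update", "row": {"order_id": 1, "amount": 15.0}},
]

# Column-oriented replica on the analytics side.
replica = {"order_id": [], "amount": []}

def apply(event):
    row = event["row"]
    if event["op"] == "insert":
        for col, value in row.items():
            replica[col].append(value)
    elif event["op"] == "update":
        # Locate the row by key, then patch each column in place.
        i = replica["order_id"].index(row["order_id"])
        for col, value in row.items():
            replica[col][i] = value

for e in events:
    apply(e)

# Analytics run against the replica without touching the OLTP database.
total = sum(replica["amount"])
```

Order matters: applying the update before its insert would fail, which is why CDC pipelines preserve per-key ordering end to end.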
HTAP Databases
HTAP (Hybrid Transactional/Analytical Processing) databases handle both workloads in a single system.
How HTAP Works
HTAP databases maintain both a row-oriented copy and a columnar copy of data internally. Writes go to the row store. The column store is updated asynchronously or synchronously.
-- SingleStore: Automatically routes queries
CREATE TABLE orders (
id BIGINT AUTO_INCREMENT,
customer_id INT,
amount DECIMAL(10,2),
created_at DATETIME
);
-- Writes use row store; analytics use columnar store
-- Query optimizer chooses the best engine automatically
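The write-to-row-store, sync-to-column-store, route-by-query-type loop can be caricatured in a few lines. A toy model only, not how any HTAP engine is actually built:

```python
row_store = {}                        # key -> row: write-optimized side
col_store = {"id": [], "amount": []}  # columnar copy: read-optimized side

def write(order_id, amount):
    # All writes land in the row store first.
    row_store[order_id] = {"id": order_id, "amount": amount}

def sync():
    # In a real HTAP engine this happens continuously and incrementally;
    # here we just rebuild the columnar copy.
    col_store["id"] = [r["id"] for r in row_store.values()]
    col_store["amount"] = [r["amount"] for r in row_store.values()]

def query(kind, key=None):
    if kind == "point":        # OLTP-style: route to the row store
        return row_store[key]
    if kind == "sum_amount":   # OLAP-style: route to the column store
        return sum(col_store["amount"])

write(1, 10.0)
write(2, 20.0)
sync()
```

The gap between `write` and `sync` is exactly the freshness lag HTAP systems must manage: until the sync runs, analytical queries see slightly stale data.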
HTAP Platforms
* **SingleStore**: Row store for transactions, columnar for analytics. Query optimizer routes queries to the appropriate engine.
* **ClickHouse**: Supports lightweight DELETE and UPDATE (applied as asynchronous mutations) alongside very fast analytical queries, though it remains analytics-first.
* **TiDB**: Row-based TiKV storage for transactions, columnar TiFlash replicas for analytics. The optimizer routes each query to the appropriate engine.
* **SAP HANA**: In-memory row and columnar stores. Used in enterprise environments for mixed workloads.
When HTAP Works
HTAP works when:
* Analytics queries are simple and touch recent data.
* Transaction volume is moderate.
* The cost of maintaining two systems is prohibitive.
HTAP struggles when:
* Analytics queries are complex and scan terabytes of historical data.
* Transaction volume is very high (the row store becomes a bottleneck for the columnar sync).
* The same data must be heavily write-optimized and read-optimized at once; no single engine fully satisfies both.
Choosing Your Approach
OLTP-Only
* Use a row-oriented database (PostgreSQL, MySQL, SQL Server).
* Optimize indexes for your query patterns.
* Add read replicas for read scaling.
OLAP-Only
* Use a columnar database (ClickHouse, Redshift, BigQuery).
* Design star schema for data modeling.
* Partition tables by date for efficient pruning.
Mixed Workloads with Modest Analytics
* Use an OLTP database with columnar extensions.
* PostgreSQL with pg_analytics or TimescaleDB.
* Materialized views for common aggregations.
Mixed Workloads with Heavy Analytics
* Use the dual database approach (OLTP + OLAP).
* Replicate via CDC (Debezium, Striim, Fivetran).
* Accept the operational complexity of maintaining two systems.
Conclusion
OLTP and OLAP have fundamentally different requirements. OLTP needs fast point queries, high concurrency, and ACID transactions. OLAP needs fast scans of many rows, efficient aggregation, and columnar storage. The best approach for mixed workloads is typically a dual database architecture with CDC replication. HTAP databases are improving but remain a compromise for demanding workloads. Choose your database architecture based on your dominant workload pattern and accept that running both patterns optimally in one system is inherently challenging.