OLTP vs OLAP: Workload Optimization


OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) represent two fundamentally different approaches to database usage. They differ in data structure, query patterns, performance requirements, and optimal storage formats. Understanding these differences is essential for choosing the right database architecture.

Workload Characteristics

OLTP Characteristics

OLTP systems handle high volumes of small, short-lived transactions. Each transaction typically reads or writes a small number of rows.




User adds item to cart: INSERT INTO cart_items (user_id, product_id, qty) VALUES (42, 100, 1);


User places order: UPDATE orders SET status='placed' WHERE order_id = 5000;


User checks balance: SELECT balance FROM accounts WHERE account_id = 1234;






* **Concurrency**: Hundreds or thousands of concurrent users.

* **Latency**: Millisecond response time expected.

* **Data access**: Point queries (by primary key) and small range scans.

* **Write patterns**: Frequent inserts and updates.

* **Data volume**: Gigabytes to low terabytes per table.

* **ACID**: Transactions must be atomic and isolated.


OLAP Characteristics

OLAP systems process complex queries over large datasets to support decision-making.




-- OLAP query


SELECT


product_category,


region,


EXTRACT(quarter FROM order_date) as qtr,


SUM(quantity * unit_price) as revenue,


COUNT(DISTINCT customer_id) as unique_customers,


AVG(discount_applied) as avg_discount


FROM orders o


JOIN products p ON o.product_id = p.product_id


JOIN customers c ON o.customer_id = c.customer_id


WHERE order_date BETWEEN '2026-01-01' AND '2026-12-31'


GROUP BY product_category, region, qtr


HAVING SUM(quantity * unit_price) > 10000


ORDER BY revenue DESC;






* **Concurrency**: Few concurrent users (analysts).

* **Latency**: Seconds to minutes acceptable.

* **Data access**: Many rows scanned, few columns returned.

* **Write patterns**: Periodic bulk loads.

* **Data volume**: Terabytes to petabytes.

* **ACID**: Relaxed requirements. Snapshot isolation is often sufficient.


Row vs Column Store

Row Store (OLTP)

Row-oriented databases store all columns of a row contiguously.




Table: orders


Row 1: [101, 42, 2345, 29.99, 1, '2026-01-15']


Row 2: [102, 73, 1234, 49.99, 2, '2026-01-15']






* Fast for: `SELECT * FROM orders WHERE order_id = 101` (one row, all columns).

* Fast for: `INSERT INTO orders VALUES (...)` (one row write).

* Slow for: `SELECT SUM(amount) FROM orders` (reads all rows but needs only one column).


Column Store (OLAP)

Column-oriented databases store each column contiguously.




order_id: [101, 102, 103, 104, ...]


customer_id:[42, 73, 15, 88, ...]


amount: [29.99, 49.99, 99.99, 5.99, ...]






* Fast for: `SELECT SUM(amount) FROM orders` (reads only the amount column).

* Fast for: `SELECT region, AVG(amount) FROM orders GROUP BY region` (reads only two columns).

* Slow for: `SELECT * FROM orders WHERE order_id = 101` (reads all column files to reconstruct the row).

* Slow for: Single-row inserts (must write to multiple column files).


Indexing Strategies

OLTP Indexes

OLTP systems use indexes to make point lookups fast.




-- Primary key index: B-tree for fast lookups by ID


CREATE TABLE users (


user_id BIGINT PRIMARY KEY, -- B-tree index created automatically


email VARCHAR(255) UNIQUE, -- Another B-tree for uniqueness checks


name VARCHAR(255)


);




-- Composite index for common query pattern


CREATE INDEX idx_users_status_created


ON users (status, created_at);





B-tree indexes are the dominant index type for OLTP. They provide O(log N) lookups and support range scans. Covering indexes (containing all columns needed by a query) eliminate table lookups entirely.

OLAP Indexes

OLAP systems use different indexing strategies:

**Bitmap indexes**: For low-cardinality columns. Each value gets a bitmap where each bit indicates whether a row has that value.




-- In a columnar database, bitmap indexes are automatic for low-cardinality columns


-- Example: region column with 10 distinct values





**Zone maps**: Store min/max values for data blocks. Skip blocks that cannot contain matching data.




-- In Oracle or Redshift, block-level min/max elimination is automatic


-- The planner skips blocks where no row can match the WHERE clause





**Projections**: Pre-join and pre-aggregate data for common query patterns. Vertica uses projections as the primary storage mechanism.




-- Vertica projection


CREATE PROJECTION daily_sales_projection AS


SELECT order_date, region, SUM(amount)


FROM sales


GROUP BY order_date, region;





Query Pattern Examples

OLTP Query Pattern




Transaction: Place an order




BEGIN;


-- 1. Check inventory


SELECT stock FROM inventory WHERE product_id = 100 FOR UPDATE;


-- 2. Create order


INSERT INTO orders (customer_id, total) VALUES (42, 99.99);


-- 3. Update inventory


UPDATE inventory SET stock = stock - 1 WHERE product_id = 100;


-- 4. Create payment record


INSERT INTO payments (order_id, amount) VALUES (LASTVAL(), 99.99);


COMMIT;






* 4 queries, 1 transaction.

* Each query touches 1-2 rows.

* Total time: under 50ms.


OLAP Query Pattern




SELECT


c.segment,


COUNT(DISTINCT o.customer_id) as customers,


SUM(o.total) as revenue,


SUM(o.total) / COUNT(DISTINCT o.customer_id) as arpu


FROM orders o


JOIN customers c ON o.customer_id = c.customer_id


WHERE o.order_date BETWEEN '2026-01-01' AND '2026-06-30'


AND c.country = 'US'


GROUP BY c.segment


ORDER BY revenue DESC;






* 1 query but scans millions of rows.

* Joins two tables on non-key column.

* Aggregates and groups data.

* Total time: seconds to minutes.


Hybrid Approaches

Many systems must handle both OLTP and OLAP workloads. Several strategies address this.

Operational Data Store (ODS)

An ODS sits between OLTP and OLAP. It stores near-real-time data from operational systems and supports lightweight reporting.




OLTP -> ODS (real-time sync) -> ETL -> OLAP (Data Warehouse)





The ODS supports simple queries on recent data. Complex analytics happen in the warehouse.

Dual Databases

Run OLTP on a row-oriented database and replicate to a columnar database for analytics.




# Architecture: OLTP + OLAP independently


oltp:


database: PostgreSQL


storage: Row-oriented


purpose: Transaction processing




olap:


database: ClickHouse


storage: Column-oriented


purpose: Analytics and reporting




sync:


tool: Debezium + Kafka


mode: Change Data Capture


latency: < 1 minute





This is the most common pattern for mature systems. It provides optimal performance for both workloads at the cost of maintaining two systems.

HTAP Databases

HTAP (Hybrid Transactional/Analytical Processing) databases handle both workloads in a single system.

How HTAP Works

HTAP databases maintain both a row-oriented copy and a columnar copy of data internally. Writes go to the row store. The column store is updated asynchronously or synchronously.




-- SingleStore: Automatically routes queries


CREATE TABLE orders (


id BIGINT AUTO_INCREMENT,


customer_id INT,


amount DECIMAL(10,2),


created_at DATETIME


);


-- Writes use row store; analytics use columnar store


-- Query optimizer chooses the best engine automatically





HTAP Platforms


* **SingleStore**: Row store for transactions, columnar for analytics. Query optimizer routes queries to the appropriate engine.

* **ClickHouse with Keeper**: Supports lightweight UPDATE/DELETE alongside blazing-fast analytical queries.

* **Amazon Redshift**: Row-based for ingestion, columnar for storage. Auto-optimizes based on query patterns.

* **SAP HANA**: In-memory row and columnar stores. Used in enterprise environments for mixed workloads.


When HTAP Works

HTAP works when:

* Analytics queries are simple and touch recent data.

* Transaction volume is moderate.

* The cost of maintaining two systems is prohibitive.


HTAP struggles when:

* Analytics queries are complex and scan terabytes of historical data.

* Transaction volume is very high (the row store becomes a bottleneck for the columnar sync).

* Write-optimized and read-optimized storage are fundamentally at odds.


Choosing Your Approach

OLTP-Only

* Use a row-oriented database (PostgreSQL, MySQL, SQL Server).

* Optimize indexes for your query patterns.

* Add read replicas for read scaling.


OLAP-Only

* Use a columnar database (ClickHouse, Redshift, BigQuery).

* Design star schema for data modeling.

* Partition tables by date for efficient pruning.


Mixed Workloads with Modest Analytics

* Use an OLTP database with columnar extensions.

* PostgreSQL with pg_analytics or TimescaleDB.

* Materialized views for common aggregations.


Mixed Workloads with Heavy Analytics

* Use the dual database approach (OLTP + OLAP).

* Replicate via CDC (Debezium, Striim, Fivetran).

* Accept the operational complexity of maintaining two systems.


Conclusion

OLTP and OLAP have fundamentally different requirements. OLTP needs fast point queries, high concurrency, and ACID transactions. OLAP needs fast scans of many rows, efficient aggregation, and columnar storage. The best approach for mixed workloads is typically a dual database architecture with CDC replication. HTAP databases are improving but remain a compromise for demanding workloads. Choose your database architecture based on your dominant workload pattern and accept that running both patterns optimally in one system is inherently challenging.