Saga Choreography Pattern


Saga choreography distributes saga execution across participating services through event-driven coordination. There is no central coordinator: each service performs its local transaction, publishes an event, and subscribes to the events that trigger its next action. This decentralized approach gives each service a high degree of autonomy and keeps direct service-to-service coupling low, although services remain coupled through the event contracts they share. That makes it attractive for systems where team independence and service evolution are paramount.

In a choreographed saga, each service owns its part of the workflow. When the Order Service creates an order, it publishes OrderCreated. The Payment Service subscribes to OrderCreated, processes payment, and publishes PaymentProcessed. The Inventory Service subscribes to PaymentProcessed, reserves stock, and publishes InventoryReserved. The Shipping Service subscribes to InventoryReserved and schedules shipment. The workflow emerges from these chains of reactions.
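The chain of reactions described above can be sketched with a small in-memory event bus. This is a minimal illustration, not a production design: the EventBus class, the handler names, and the payload shape are assumptions made for the example, and a real system would publish through a message broker rather than a synchronous in-process call.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Hypothetical synchronous in-memory bus standing in for a message broker."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)  # event type -> list of handlers
        self.log = []                       # event types in publication order

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.log.append(event_type)
        for handler in list(self._handlers[event_type]):
            handler(payload)


bus = EventBus()
shipments = []  # Shipping Service's local state


def payment_service(event: dict) -> None:
    # The local transaction (charging the customer) would happen here.
    bus.publish("PaymentProcessed", event)


def inventory_service(event: dict) -> None:
    # The local transaction (reserving stock) would happen here.
    bus.publish("InventoryReserved", event)


def shipping_service(event: dict) -> None:
    shipments.append(event["order_id"])


# Each service registers only for the event that triggers its own step.
bus.subscribe("OrderCreated", payment_service)
bus.subscribe("PaymentProcessed", inventory_service)
bus.subscribe("InventoryReserved", shipping_service)

# The Order Service starts the saga by publishing its event.
bus.publish("OrderCreated", {"order_id": "order-1"})
print(bus.log)
```

No function in this sketch describes the whole workflow. The order of steps exists only in the subscriptions, which is what makes the flow implicit.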

The strength of choreography is that each service needs to know only its own domain events and the events it subscribes to, not the internals of other services or the shape of the overall workflow. New services can be added by subscribing to existing events and publishing new ones, without changing any existing service. This makes choreography highly extensible and aligned with bounded context boundaries. Each team can evolve its service independently as long as event contracts remain compatible.
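As a rough illustration of that extensibility, the sketch below adds a second subscriber to an existing event without touching the publisher or the first subscriber. The Notification Service, the subscribers registry, and the publish helper are hypothetical names introduced only for this example.

```python
from collections import defaultdict

# Hypothetical in-memory bus: event type -> list of handlers.
subscribers = defaultdict(list)


def publish(event_type, payload):
    for handler in list(subscribers[event_type]):
        handler(payload)


# Existing participant: the Payment Service reacts to OrderCreated.
charged = []
subscribers["OrderCreated"].append(lambda e: charged.append(e["order_id"]))

# New participant added later: a hypothetical Notification Service.
# It only registers a subscription; no existing code is edited.
notified = []
subscribers["OrderCreated"].append(lambda e: notified.append(e["order_id"]))

publish("OrderCreated", {"order_id": "order-7"})
print(charged, notified)
```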

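Error handling in choreography is decentralized but nontrivial. If the Payment Service fails, it publishes PaymentFailed. In the flow above, stock is reserved only after payment succeeds, so at that point there is nothing for the Inventory Service to release; the Order Service must subscribe to PaymentFailed and cancel the order. If a later step fails, for example because the Inventory Service cannot reserve stock, every service that has already committed must compensate: the Payment Service refunds the charge and the Order Service cancels the order. This means each service must know about and handle failure events from the services it depends on. The compensation logic is distributed across services rather than centralized, making it harder to verify correctness.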
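One way decentralized compensation might look is sketched below, assuming a step late in the chain fails after earlier steps have committed. The event name InventoryReservationFailed, the handler names, and the in-memory state dictionaries are all assumptions made for illustration; the point is only that each service registers its own compensating handler for a failure event.

```python
from collections import defaultdict

handlers = defaultdict(list)  # hypothetical in-memory bus: event type -> handlers
orders = {}                   # Order Service state: order_id -> status
payments = {}                 # Payment Service state: order_id -> status
stock = {"widget": 0}         # Inventory Service state: nothing in stock


def publish(event_type, payload):
    for handler in list(handlers[event_type]):
        handler(payload)


def on_order_created(event):          # Payment Service, forward step
    payments[event["order_id"]] = "charged"
    publish("PaymentProcessed", event)


def on_payment_processed(event):      # Inventory Service, forward step
    if stock[event["sku"]] < 1:
        # Hypothetical failure event name, not defined by the source text.
        publish("InventoryReservationFailed", event)
        return
    stock[event["sku"]] -= 1
    publish("InventoryReserved", event)


def refund_payment(event):            # Payment Service, compensation
    # Idempotent: only a charged payment is refunded, so redelivery is harmless.
    if payments.get(event["order_id"]) == "charged":
        payments[event["order_id"]] = "refunded"


def cancel_order(event):              # Order Service, compensation
    orders[event["order_id"]] = "cancelled"


handlers["OrderCreated"].append(on_order_created)
handlers["PaymentProcessed"].append(on_payment_processed)
handlers["InventoryReservationFailed"].append(refund_payment)
handlers["InventoryReservationFailed"].append(cancel_order)

orders["order-1"] = "created"
publish("OrderCreated", {"order_id": "order-1", "sku": "widget"})
print(payments, orders)
```

Each compensation lives in the service that owns the state being undone, so verifying that the saga always ends in a consistent state means reading the handlers of every participant.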

Consider the order cancellation scenario. The Order Service publishes OrderCancelled. The Payment Service initiates a refund. The Inventory Service releases stock. The Shipping Service cancels the shipment if not yet dispatched. Each service independently handles cancellation. If any compensation fails, that service must retry or escalate. There is no central authority that can enforce the complete compensation sequence.
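The cancellation scenario can be sketched as three independent handlers for OrderCancelled, each wrapped in its own retry-then-escalate policy. The with_retry helper, the escalations list (a stand-in for a dead-letter queue or an alert), and the simulated transient failure are assumptions made for this example.

```python
from collections import defaultdict

handlers = defaultdict(list)  # hypothetical in-memory bus: event type -> handlers
escalations = []              # stand-in for a dead-letter queue or on-call alert


def publish(event_type, payload):
    for handler in list(handlers[event_type]):
        handler(payload)


def with_retry(service_name, action, max_attempts=3):
    """Wrap a compensation so the owning service retries, then escalates."""
    def handler(event):
        for _ in range(max_attempts):
            try:
                action(event)
                return
            except RuntimeError:
                continue  # a real service would back off between attempts
        escalations.append((service_name, event["order_id"]))
    return handler


refunds, released = [], []
shipment_status = {"order-1": "dispatched"}
refund_attempts = {"count": 0}


def refund(event):                    # Payment Service
    refund_attempts["count"] += 1
    if refund_attempts["count"] < 2:  # simulate one transient failure
        raise RuntimeError("payment gateway timeout")
    refunds.append(event["order_id"])


def release_stock(event):             # Inventory Service
    released.append(event["order_id"])


def cancel_shipment(event):           # Shipping Service
    # Cancel only if the shipment has not been dispatched yet.
    if shipment_status.get(event["order_id"]) != "dispatched":
        shipment_status[event["order_id"]] = "cancelled"


handlers["OrderCancelled"].append(with_retry("payment", refund))
handlers["OrderCancelled"].append(with_retry("inventory", release_stock))
handlers["OrderCancelled"].append(with_retry("shipping", cancel_shipment))

publish("OrderCancelled", {"order_id": "order-1"})
print(refunds, released, shipment_status, escalations)
```

Nothing in the sketch checks that all three compensations completed. Each handler only knows its own outcome, which is the gap a central authority would otherwise fill.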

Monitoring is the most significant challenge. A choreographed saga has no single place where its progress can be observed, so the overall workflow must be reconstructed by correlating events from all participating services. This requires a centralized event store or tracing infrastructure that captures every event with a correlation ID. Event store databases and log aggregators can reconstruct saga state, but only after the fact: no component holds the live state of the workflow while it runs.
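Reconstructing saga state from correlated events might look like the following sketch. The record fields (correlation_id, type, ts) and the sample data are assumptions; real event stores and tracing systems use their own schemas.

```python
from collections import defaultdict

# Hypothetical event records as they might arrive in a log aggregator,
# interleaved across sagas and services.
events = [
    {"correlation_id": "saga-1", "type": "OrderCreated", "ts": 1},
    {"correlation_id": "saga-2", "type": "OrderCreated", "ts": 2},
    {"correlation_id": "saga-1", "type": "PaymentProcessed", "ts": 3},
    {"correlation_id": "saga-1", "type": "InventoryReserved", "ts": 4},
    {"correlation_id": "saga-2", "type": "PaymentFailed", "ts": 5},
]


def reconstruct(records):
    """Group events by correlation ID, ordered by timestamp."""
    sagas = defaultdict(list)
    for record in sorted(records, key=lambda r: r["ts"]):
        sagas[record["correlation_id"]].append(record["type"])
    return dict(sagas)


timelines = reconstruct(events)
for saga_id, timeline in timelines.items():
    print(saga_id, " -> ".join(timeline))
```

The sketch only works if every service copies the correlation ID from the event it consumed onto the event it publishes. A single service that drops the ID breaks the timeline for every saga passing through it.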

Debugging failures requires correlating events across service boundaries. When a saga stalls because a step never produces the expected event, the cause may lie in any participating service, and whoever investigates must trace through event logs to find the missing event. This is significantly harder than inspecting the orchestrator's state in an orchestrated saga.
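A stalled saga can be detected mechanically if the expected successor of each event is written down somewhere. The sketch below assumes the linear chain from the earlier example, a hypothetical NEXT_EVENT table, and a fixed timeout; both the table and the timeout value are illustrative choices rather than part of the pattern.

```python
# Expected successor for each event in the linear chain (an assumption of
# this sketch; a choreographed system does not store this anywhere by default).
NEXT_EVENT = {
    "OrderCreated": "PaymentProcessed",
    "PaymentProcessed": "InventoryReserved",
}


def find_stalled(last_events, now, timeout_seconds=60):
    """Return {correlation_id: missing_event} for sagas waiting too long.

    last_events maps correlation_id -> (last_event_type, timestamp_seconds).
    """
    stalled = {}
    for correlation_id, (event_type, ts) in last_events.items():
        expected = NEXT_EVENT.get(event_type)
        if expected is not None and now - ts > timeout_seconds:
            stalled[correlation_id] = expected
    return stalled


last_events = {
    "saga-1": ("InventoryReserved", 100),  # no further event expected here
    "saga-2": ("PaymentProcessed", 100),   # waiting on the Inventory Service
    "saga-3": ("OrderCreated", 290),       # recent, still within the timeout
}
stalled = find_stalled(last_events, now=300)
print(stalled)
```

Writing down NEXT_EVENT amounts to documenting the workflow that choreography otherwise leaves implicit, so the table has to be kept in step with the services' actual subscriptions.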

When should choreography be preferred over orchestration? Choreography suits sagas with few participants (two or three services), simple compensation logic, and mature event infrastructure. It is ideal when services are owned by independent teams that want to minimize cross-team coordination. It also works well for long-running sagas where the orchestrator would become a bottleneck or single point of failure.

Domain characteristics matter. Sagas where each step is independently useful and compensatable lend themselves to choreography. Sagas that require conditional branching or complex coordination logic are better served by orchestration. A good rule of thumb: if you can describe the saga as a simple linear chain of events, choreography may suffice. If the saga requires any "or," "if," or "while" logic, orchestration is safer.

Production experience suggests starting with orchestration for critical workflows and moving to choreography only when the team demonstrates mature event handling patterns and monitoring infrastructure. The implicit nature of choreographed workflows demands operational maturity that many teams underestimate.