## Introduction
Single LLM calls are rarely sufficient for complex tasks. Chaining — connecting multiple LLM calls in a pipeline — enables sophisticated workflows where each step builds on or refines the output of the previous one. This guide covers the essential chaining patterns used in production AI systems.
## Why Chain?
A single LLM call has limitations:

- It must handle every subtask in a single pass, with no intermediate checkpoints.
- Errors compound silently, because nothing verifies the output before it is used downstream.
- Long, multi-objective prompts dilute the model's focus and degrade output quality.

Chaining addresses these by decomposing complex tasks into focused steps, each with a clear objective and validation criteria.
## Core Patterns
### Sequential Chain
The simplest pattern: output of step N becomes input to step N+1.
**Use case**: Multi-stage content processing
```
Raw text → Extract key facts → Verify facts → Format output
```

```python
def sequential_chain(text):
    facts = extract_facts(text)
    verified = verify_facts(facts)
    formatted = format_output(verified)
    return formatted

def extract_facts(text):
    return call_llm("Extract all factual claims from this text:", text)

def verify_facts(claims):
    return call_llm("Verify each claim. Mark as VERIFIED, QUESTIONABLE, or FALSE:", claims)

def format_output(verified):
    return call_llm("Format the verified claims as a clean bullet list:", verified)
```
### Map-Reduce Chain
Process multiple items independently, then combine results.
**Use case**: Summarizing many documents, analyzing multiple customer reviews
```python
def map_reduce(items, map_prompt, reduce_prompt):
    # Map: process each item independently
    intermediate = []
    for item in items:
        result = call_llm(map_prompt, item)
        intermediate.append(result)

    # Reduce: combine all intermediate results
    combined = "\n---\n".join(intermediate)
    final = call_llm(reduce_prompt, combined)
    return final

# Example: summarize 50 customer reviews
reviews = load_reviews()
map_prompt = "Summarize this customer review in one sentence, focusing on sentiment and key points:"
reduce_prompt = "Combine these review summaries into an overall analysis with common themes:"
analysis = map_reduce(reviews, map_prompt, reduce_prompt)
```
### Parallel Processing
Run multiple independent chains simultaneously, then merge results.
**Use case**: Generating different sections of a document simultaneously
```python
import asyncio

async def parallel_chain(topic):
    intro, specs, pricing, conclusion = await asyncio.gather(
        generate_intro(topic),
        generate_specs(topic),
        generate_pricing(topic),
        generate_conclusion(topic),
    )
    return assemble_document(intro, specs, pricing, conclusion)
```
Parallel processing reduces wall-clock time significantly when chains are independent.
### Routing Chain
Route input to different sub-chains based on classification.
**Use case**: Customer support ticket routing
```python
def routing_chain(query):
    # First, classify the query type
    category = classify_query(query)

    # Route to the specialized handler
    if category == "billing":
        return billing_chain(query)
    elif category == "technical":
        return technical_support_chain(query)
    elif category == "account":
        return account_management_chain(query)
    else:
        return general_inquiry_chain(query)

def classify_query(query):
    category = call_llm("""
        Classify this customer query into one of: billing, technical, account, general.
        Respond with only the category name.
    """, query)
    return category.strip().lower()
```
### Branching Chain
Pursue multiple investigation paths from a single input, then synthesize.
**Use case**: Research and analysis
```
Query
 ├→ Factual research chain  (what are the known facts?)
 ├→ Analysis chain          (what does this mean?)
 ├→ Stakeholder chain       (who is affected?)
 └→ Timeline chain          (when did events occur?)
        ↓
 Synthesis: combine all branches into a comprehensive report
```
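The branch structure above can be sketched as plain Python. Here `call_llm` is stubbed so the example runs standalone, and the four branch prompts are illustrative, not prescribed; since the branches are independent, a real implementation could run them concurrently, as in the parallel-processing pattern.

```python
# Hypothetical stand-in for a real LLM call; swap in your client.
def call_llm(prompt, text):
    return f"[{prompt}] {text}"

def branching_chain(query):
    # Fan out: each branch investigates the same query from a different angle.
    branches = [
        call_llm("List the known facts about:", query),
        call_llm("Analyze the implications of:", query),
        call_llm("Identify the stakeholders affected by:", query),
        call_llm("Reconstruct the timeline of events for:", query),
    ]
    # Synthesize: merge all branch findings into one report.
    combined = "\n---\n".join(branches)
    return call_llm("Combine these findings into a comprehensive report:", combined)

report = branching_chain("example query")
```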
### Validation Chain
Add verification steps between generation steps to catch errors early.
```python
def generate_with_validation(topic):
    draft = generate_draft(topic)

    # Validation gate
    issues = validate_draft(draft)
    if issues:
        draft = revise_draft(draft, issues)

        # Re-validate
        issues = validate_draft(draft)
        if not issues:
            return draft

        # If it still has issues after revision, flag for human review
        return {"draft": draft, "issues": issues, "needs_review": True}

    return draft

def validate_draft(draft):
    response = call_llm("""
        Check this draft for:
        1. Factual accuracy
        2. Internal consistency
        3. Tone appropriateness
        4. Completeness
        List any issues found. If none, respond with "NO ISSUES".
    """, draft)
    # Normalize so a clean draft yields a falsy value
    return None if "NO ISSUES" in response.upper() else response
```
## Advanced Patterns
### Recursive Chain
Apply the same chain repeatedly until a condition is met:
```python
def recursive_refine(text, max_iterations=5):
    for _ in range(max_iterations):
        improved = call_llm("Improve this text: make it clearer and more concise:", text)
        quality_score = evaluate_quality(improved)
        if quality_score >= 0.9:
            return improved
        text = improved
    return text
```
### Feedback Loop Chain
Use the model's own output to identify and correct its mistakes:
```python
def self_correcting_generation(task):
    output = generate(task)
    critique = call_llm("Critique this output. What's wrong or missing?", output)
    if "nothing wrong" in critique.lower():
        return output
    revision = call_llm(f"Revise this output based on this feedback: {critique}", output)
    return revision
```
## Production Considerations
**Error handling**: Each chain step should have a timeout, retry logic, and fallback behavior.
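A minimal retry wrapper for a chain step might look like this. The function and parameter names are illustrative, and a production version would also enforce a per-call timeout (for example via the LLM client's own timeout setting):

```python
import time

def call_with_retry(step, payload, retries=3, base_delay=1.0, fallback=None):
    """Run one chain step with retries, exponential backoff, and a fallback."""
    for attempt in range(retries):
        try:
            return step(payload)
        except Exception:
            if attempt == retries - 1:
                return fallback                    # last resort: degrade gracefully
            time.sleep(base_delay * 2 ** attempt)  # back off before retrying
```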
**Observability**: Log inputs, outputs, latency, and token usage at each chain step. This is essential for debugging and cost optimization.
**Caching**: Cache results of deterministic chain steps (classification, extraction) to avoid redundant LLM calls.
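A simple in-memory memoization sketch for such steps, assuming a deterministic `call_llm` (e.g. temperature 0); in production you would likely use a shared store with expiry instead of a module-level dict:

```python
import hashlib
import json

_cache = {}

def cached_step(prompt, text, call_llm):
    """Memoize a deterministic chain step, e.g. classification at temperature 0."""
    key = hashlib.sha256(json.dumps([prompt, text]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(prompt, text)  # miss: pay for one real call
    return _cache[key]                        # hit: no LLM call at all
```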
**Human escalation**: Design chains so that when confidence is low or validation fails, the task escalates to a human operator.
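One way to wire in that escape hatch is a wrapper around each step; the threshold and the `score_confidence` / `enqueue_for_human` hooks below are assumptions to be adapted to your queueing system:

```python
CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff; tune per task

def run_with_escalation(chain_step, payload, score_confidence, enqueue_for_human):
    """Run a chain step, handing off to a human when confidence is low."""
    result = chain_step(payload)
    if score_confidence(result) < CONFIDENCE_THRESHOLD:
        enqueue_for_human(payload, result)  # hand off with full context
        return None                         # caller treats None as "pending review"
    return result
```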
## Conclusion
LLM chaining transforms unreliable single-shot generation into reliable multi-step pipelines. Start with sequential chains for simple transformations, add map-reduce for batch processing, and incorporate routing and branching for complex workflows. The key principle: each step should do one thing well, with clear inputs, outputs, and validation criteria.