Introduction


Single LLM calls are rarely sufficient for complex tasks. Chaining — connecting multiple LLM calls in a pipeline — enables sophisticated workflows where each step builds on or refines the output of the previous one. This guide covers the essential chaining patterns used in production AI systems.


Why Chain?


A single LLM call has limitations:


  • **Attention dilution**: Long, complex prompts dilute attention across too many requirements
  • **Error compounding**: Errors early in a long generation propagate through everything that follows, with no intermediate checkpoint to catch them
  • **Token waste**: Including all context and instructions in one call is inefficient
  • **Debugging difficulty**: When output is wrong, isolating which instruction caused the problem is hard

Chaining addresses these by decomposing complex tasks into focused steps, each with a clear objective and validation criteria.
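
One preliminary: the code examples throughout this guide assume a small `call_llm(prompt, content)` helper that sends an instruction plus content to a model and returns the reply text. A minimal sketch using the OpenAI Python client (the model name is an arbitrary choice; any chat model, or any provider with an equivalent API, works):

```
from openai import OpenAI

client = OpenAI()

def call_llm(prompt, content):
    # Send the instruction as the system message and the content as the
    # user message; return the text of the model's reply
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # arbitrary model choice for illustration
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": content},
        ],
    )
    return response.choices[0].message.content
```

The chaining patterns below depend only on this (prompt, content) → text signature, not on any particular provider.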


Core Patterns


Sequential Chain


The simplest pattern: the output of step N becomes the input to step N+1.


**Use case**: Multi-stage content processing


```
Raw text → Extract key facts → Verify facts → Format output
```

    
```
def sequential_chain(text):
    facts = extract_facts(text)
    verified = verify_facts(facts)
    formatted = format_output(verified)
    return formatted


def extract_facts(text):
    return call_llm("Extract all factual claims from this text:", text)


def verify_facts(claims):
    return call_llm("Verify each claim. Mark as VERIFIED, QUESTIONABLE, or FALSE:", claims)


def format_output(verified):
    return call_llm("Format the verified claims as a clean bullet list:", verified)
```

Map-Reduce Chain


Process multiple items independently, then combine results.


**Use case**: Summarizing many documents, analyzing multiple customer reviews


    
```
def map_reduce(items, map_prompt, reduce_prompt):
    # Map: process each item independently
    intermediate = []
    for item in items:
        result = call_llm(map_prompt, item)
        intermediate.append(result)

    # Reduce: combine all intermediate results
    combined = "\n---\n".join(intermediate)
    final = call_llm(reduce_prompt, combined)
    return final


# Example: summarize 50 customer reviews
reviews = load_reviews()
map_prompt = "Summarize this customer review in one sentence, focusing on sentiment and key points:"
reduce_prompt = "Combine these review summaries into an overall analysis with common themes:"
analysis = map_reduce(reviews, map_prompt, reduce_prompt)
```
    
    

Parallel Processing


Run multiple independent chains simultaneously, then merge results.


**Use case**: Generating different sections of a document simultaneously


    
```
import asyncio

async def parallel_chain(topic):
    # The four section generators are independent, so run them concurrently;
    # each is assumed to be an async function that calls the model
    intro, specs, pricing, conclusion = await asyncio.gather(
        generate_intro(topic),
        generate_specs(topic),
        generate_pricing(topic),
        generate_conclusion(topic),
    )
    return assemble_document(intro, specs, pricing, conclusion)
```
    
    

When the chains are truly independent, parallel execution cuts wall-clock time to roughly the duration of the slowest branch.


Routing Chain


Route input to different sub-chains based on classification.


**Use case**: Customer support ticket routing


    
```
def routing_chain(query):
    # First, classify the query type
    category = classify_query(query)

    # Route to the specialized handler
    if category == "billing":
        return billing_chain(query)
    elif category == "technical":
        return technical_support_chain(query)
    elif category == "account":
        return account_management_chain(query)
    else:
        return general_inquiry_chain(query)


def classify_query(query):
    category = call_llm("""
    Classify this customer query into one of: billing, technical, account, general.
    Respond with only the category name.
    """, query)
    return category.strip().lower()
```
    
    

Branching Chain


Pursue multiple investigation paths from a single input, then synthesize.


**Use case**: Research and analysis


    
```
Query
 ├→ Factual research chain (what are the known facts?)
 ├→ Analysis chain (what does this mean?)
 ├→ Stakeholder chain (who is affected?)
 └→ Timeline chain (when did events occur?)
        ↓
Synthesis: combine all branches into comprehensive report
```
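
A minimal sketch of this branch-and-synthesize flow, reusing the assumed `call_llm` helper (the branch prompts are illustrative):

```
def branching_chain(query):
    # Each branch examines the same input from a different angle
    branch_prompts = {
        "facts": "What are the known facts relevant to this query?",
        "analysis": "What does this mean? Analyze the implications:",
        "stakeholders": "Who is affected, and how?",
        "timeline": "When did the relevant events occur? Build a timeline:",
    }
    branches = {name: call_llm(prompt, query)
                for name, prompt in branch_prompts.items()}

    # Synthesis: merge all branch outputs into one report
    combined = "\n\n".join(f"[{name.upper()}]\n{text}"
                           for name, text in branches.items())
    return call_llm("Synthesize these findings into a comprehensive report:", combined)
```

Since the branches are independent, they can also be dispatched concurrently, exactly as in the parallel pattern above.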
    
    

Validation Chain


Add verification steps between generation steps to catch errors early.


    
```
def generate_with_validation(topic):
    draft = generate_draft(topic)

    # Validation gate
    issues = validate_draft(draft)
    if issues:
        draft = revise_draft(draft, issues)
        # Re-validate after revision
        issues = validate_draft(draft)

    if not issues:
        return draft

    # If it still has issues after revision, flag for human review
    return {"draft": draft, "issues": issues, "needs_review": True}


def validate_draft(draft):
    response = call_llm("""
    Check this draft for:
    1. Factual accuracy
    2. Internal consistency
    3. Tone appropriateness
    4. Completeness
    List any issues found. If none, respond with "NO ISSUES".
    """, draft)
    # Return None when the validator found nothing, so `if issues:` behaves correctly
    return None if "NO ISSUES" in response.upper() else response
```
    
    

Advanced Patterns


Recursive Chain


Apply the same chain repeatedly until a condition is met:


    
```
def recursive_refine(text, max_iterations=5):
    for _ in range(max_iterations):
        improved = call_llm("Improve this text: make it clearer and more concise:", text)
        # evaluate_quality is assumed to return a score between 0 and 1
        quality_score = evaluate_quality(improved)

        if quality_score >= 0.9:
            return improved
        text = improved
    return text
```
    
    

Feedback Loop Chain


Use the model's own output to identify and correct its mistakes:


    
```
def self_correcting_generation(task):
    output = generate(task)
    # Ask for the sentinel phrase explicitly so the check below is reliable
    critique = call_llm(
        'Critique this output. What is wrong or missing? '
        'If nothing, respond with only "NOTHING WRONG".',
        output,
    )
    if "nothing wrong" in critique.lower():
        return output
    revision = call_llm(f"Revise this output based on this feedback: {critique}", output)
    return revision
```
    
    

Production Considerations


**Error handling**: Each chain step should have a timeout, retry logic, and fallback behavior.
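
A minimal sketch of such a wrapper around the assumed `call_llm` helper; the retry count, backoff, and fallback value are illustrative, and per-request timeouts are assumed to be enforced by the underlying client:

```
import time

def call_llm_safely(prompt, content, retries=3, backoff=2.0, fallback=None):
    # Retry with exponential backoff; return a fallback rather than crashing
    # the whole chain (client timeouts surface here as exceptions)
    for attempt in range(retries):
        try:
            return call_llm(prompt, content)
        except Exception:
            if attempt == retries - 1:
                return fallback  # e.g. a default response, or a signal to escalate
            time.sleep(backoff ** attempt)
```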


**Observability**: Log inputs, outputs, latency, and token usage at each chain step. This is essential for debugging and cost optimization.
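
As a sketch of what that instrumentation might look like (character counts stand in for token counts, which would normally come from the provider's usage metadata):

```
import logging
import time

logger = logging.getLogger("chain")

def logged_step(step_name, prompt, content):
    # Time the call and log step name, latency, and input/output sizes
    start = time.perf_counter()
    output = call_llm(prompt, content)
    latency = time.perf_counter() - start
    logger.info("step=%s latency=%.2fs in_chars=%d out_chars=%d",
                step_name, latency, len(content), len(output))
    return output
```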


**Caching**: Cache results of deterministic chain steps (classification, extraction) to avoid redundant LLM calls.
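
The classifier from the routing chain above is a natural candidate. A sketch using an in-process `functools` cache; a production system might instead use a shared store such as Redis, keyed on a hash of prompt plus input:

```
import functools

@functools.lru_cache(maxsize=1024)
def classify_query_cached(query):
    # Safe to cache: the classification prompt is fixed, so the same
    # query should always map to the same category
    return classify_query(query)
```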


**Human escalation**: Design chains so that when confidence is low or validation fails, the task escalates to a human operator.
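
The validation chain above already does this with its `needs_review` flag. More generally, the gate might look like this sketch, where `run_chain`, `estimate_confidence`, and `escalate_to_human` are all hypothetical names:

```
def run_with_escalation(task, confidence_threshold=0.8):
    result = run_chain(task)                   # hypothetical chain entry point
    confidence = estimate_confidence(result)   # hypothetical 0-1 scorer
    if confidence < confidence_threshold:
        # Hand off to a person with full context rather than
        # returning a low-confidence answer
        return escalate_to_human(task, result)
    return result
```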


Conclusion


LLM chaining transforms unreliable single-shot generation into reliable multi-step pipelines. Start with sequential chains for simple transformations, add map-reduce for batch processing, and incorporate routing and branching for complex workflows. The key principle: each step should do one thing well, with clear inputs, outputs, and validation criteria.