RAG Agent Patterns: Self-Query, Corrective, Adaptive Retrieval


Introduction





Basic RAG retrieves documents once and generates an answer. RAG agents take this further: they decide when to retrieve, formulate their own queries, verify retrieved information, and adapt their strategy to the complexity of the question. This article covers four agentic RAG patterns that dramatically improve retrieval quality.





Self-Query RAG





Instead of using the raw user question as the search query, the agent generates an optimized query:






```python
def self_query_rag(question: str) -> str:
    # Step 1: Generate an optimized search query
    search_query = call_llm(f"""
Generate an optimal search query for a vector database.
Extract key terms, rephrase questions as search statements.
Output ONLY the search query, nothing else.

User question: {question}
""")

    # Step 2: Retrieve using the optimized query
    chunks = vector_search(search_query, k=5)

    # Step 3: Generate an answer from the retrieved chunks
    context = "\n\n".join(chunks)
    answer = call_llm(f"""
Answer the question based on the context below.
If the context does not contain enough information, say so.

Context: {context}
Question: {question}
""")
    return answer
```







The self-query pattern resolves the fundamental mismatch between natural language questions and keyword-optimized search indices. A question like "How do I handle rate limiting?" becomes the search query "rate limiting strategies implementation patterns error handling."
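The pattern is also easy to unit-test if the model and the index are injected rather than referenced globally. A minimal sketch with stand-in stubs (`stub_llm`, `stub_search`, and their behavior are invented here for illustration, not part of the article's API):

```python
# Dependency-injected variant of self_query_rag: the LLM and the
# vector index are passed in, so the pattern can be tested offline.
def self_query_rag(question: str, call_llm, vector_search) -> str:
    search_query = call_llm(f"Rewrite as a search query: {question}")
    chunks = vector_search(search_query, k=5)
    context = "\n\n".join(chunks)
    return call_llm(f"Answer from context:\n{context}\n\nQuestion: {question}")

# Stub LLM: "rewrites" by dropping question words, "answers" by echoing.
def stub_llm(prompt: str) -> str:
    if prompt.startswith("Rewrite"):
        question = prompt.split(": ", 1)[1]
        stop_words = {"how", "do", "i", "what", "is", "a", "the"}
        return " ".join(w for w in question.rstrip("?").split()
                        if w.lower() not in stop_words)
    return "stub answer based on: " + prompt

def stub_search(query: str, k: int = 5) -> list[str]:
    return [f"doc about {query}"]

answer = self_query_rag("How do I handle rate limiting?", stub_llm, stub_search)
```

The stub rewrite turns "How do I handle rate limiting?" into "handle rate limiting", mirroring the keyword-extraction behavior the prompt asks a real model for.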





Corrective RAG (CRAG)





Corrective RAG adds a verification step between retrieval and generation. If retrieved documents are irrelevant, the agent takes corrective action:






```python
def corrective_rag(question: str, max_attempts: int = 3) -> str:
    for attempt in range(max_attempts):
        # Retrieve
        chunks = vector_search(question, k=5)

        # Score the relevance of each chunk
        relevance_scores = []
        for chunk in chunks:
            score = call_llm(f"""
On a scale of 0-10, how relevant is this document to:
'{question}'

Document: {chunk}

Respond with only a number.
""")
            relevance_scores.append(float(score.strip()))

        # Guard against an empty retrieval result
        avg_relevance = (
            sum(relevance_scores) / len(relevance_scores)
            if relevance_scores else 0.0
        )

        if avg_relevance >= 7:
            # High confidence: generate an answer from the best chunks
            context = "\n\n".join(chunks[:3])
            return generate_answer(question, context)
        elif avg_relevance >= 4:
            # Medium confidence: decompose into sub-questions and recurse
            sub_questions = decompose_question(question)
            sub_answers = [corrective_rag(sq) for sq in sub_questions]
            return synthesize_answers(question, sub_answers)
        else:
            # Low confidence: reformulate the query and retry
            question = reformulate_query(question, chunks)

    return "Unable to find sufficient information to answer this question."
```







CRAG prevents the "hallucinate confidently from irrelevant context" failure mode common in naive RAG. Each attempt either improves the query or escalates to a more sophisticated strategy.
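One practical detail worth hedging: `float(score.strip())` assumes the grader model returns a bare number, but models often reply with "8/10" or "Relevance: 7". A more defensive parse might look like this (`parse_relevance_score` is my own helper name, not from the article; a sketch, not a definitive implementation):

```python
import re

# Extract the first number from a grader reply and clamp it to [0, 10].
# Unparseable replies are treated as irrelevant (the default of 0.0)
# rather than crashing the scoring loop.
def parse_relevance_score(reply: str, default: float = 0.0) -> float:
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        return default
    return min(max(float(match.group()), 0.0), 10.0)
```

Swapping this in for the bare `float(...)` call makes a single malformed grade degrade one score instead of aborting the whole attempt.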





Adaptive Retrieval





Adaptive retrieval dynamically selects the retrieval strategy based on question characteristics:






```python
class AdaptiveRetriever:
    def __init__(self):
        self.strategies = {
            "factoid": self.factoid_retrieval,
            "comparison": self.comparison_retrieval,
            "procedural": self.procedural_retrieval,
            "analytical": self.analytical_retrieval,
        }

    def retrieve(self, question: str) -> list[str]:
        # Classify the question type
        q_type = call_llm(f"""
Classify this question as one of: factoid, comparison, procedural, analytical
Respond with only the type name.

Question: {question}
""")
        strategy = self.strategies.get(q_type.strip(), self.factoid_retrieval)
        return strategy(question)

    def factoid_retrieval(self, question: str) -> list[str]:
        # Simple direct retrieval
        return vector_search(question, k=3)

    def comparison_retrieval(self, question: str) -> list[str]:
        # Retrieve documents for each side of the comparison
        entities = extract_comparison_entities(question)
        docs = []
        for entity in entities:
            docs.extend(vector_search(entity, k=3))
        return docs[:6]

    def procedural_retrieval(self, question: str) -> list[str]:
        # Retrieve for each step of the procedure
        steps = decompose_steps(question)
        docs = []
        for step in steps:
            docs.extend(vector_search(step, k=2))
        return docs[:8]

    def analytical_retrieval(self, question: str) -> list[str]:
        # Retrieve broadly, then rerank and narrow
        broad = vector_search(question, k=20)
        reranked = rerank(question, broad)
        return reranked[:5]
```
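The helper `extract_comparison_entities` is left undefined above. A real version might ask an LLM to list the entities being compared; a rule-based stand-in that splits on common comparison connectives could look like this (a sketch under that assumption, not the article's implementation):

```python
import re

# Split a comparison question into the entities being compared,
# e.g. "PostgreSQL vs MySQL?" -> ["PostgreSQL", "MySQL"].
def extract_comparison_entities(question: str) -> list[str]:
    q = question.rstrip("?").strip()
    # Drop a leading "Compare ..." / "What is the difference between ..." prefix
    q = re.sub(r"^(compare|what is the difference between)\s+", "", q,
               flags=re.IGNORECASE)
    # Split on connectives: "vs", "vs.", "versus", "or", "and"
    parts = re.split(r"\s+(?:vs\.?|versus|or|and)\s+", q, flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]
```

Rule-based splitting covers the common phrasings cheaply; an LLM classifier is only worth its latency when questions embed the comparison less explicitly.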







Multi-Hop Retrieval





Some questions require retrieving information about entities discovered during retrieval:






```python
def multi_hop_rag(question: str, max_hops: int = 3) -> str:
    context_chunks = []
    current_query = question

    for hop in range(max_hops):
        chunks = vector_search(current_query, k=3)
        context_chunks.extend(chunks)

        # Check whether the question is answerable yet
        can_answer = call_llm(f"""
Can you answer '{question}' with the information retrieved so far?
Answer YES or NO. If NO, specify what additional information is needed.

Context so far: {' '.join(context_chunks[:5])}
""")

        if can_answer.startswith("YES"):
            break

        # Extract the next search target from the most recent chunks
        current_query = call_llm(f"""
What additional information do we need to answer '{question}'?
Output a single search query.
Context: {' '.join(context_chunks[-3:])}
""")

    return generate_answer(question, "\n\n".join(context_chunks))
```
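One caveat with hopping: successive retrievals often return chunks already collected in earlier hops, which wastes context-window space. An order-preserving dedupe before generation keeps the context compact (`dedupe_chunks` is my own helper name, added for illustration):

```python
# Remove duplicate chunks while keeping first-seen order, so the
# earliest (usually most query-relevant) copy of each chunk survives.
def dedupe_chunks(chunks: list[str]) -> list[str]:
    seen = set()
    unique = []
    for chunk in chunks:
        if chunk not in seen:
            seen.add(chunk)
            unique.append(chunk)
    return unique
```

Calling `dedupe_chunks(context_chunks)` before building the final context trims the prompt without dropping any distinct evidence.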







Conclusion





RAG agents extend basic retrieval with reasoning. Self-query RAG optimizes the search query for better retrieval. Corrective RAG verifies retrieved content and adapts when relevance is low. Adaptive retrieval selects the strategy that fits the question type. Multi-hop retrieval follows information chains across documents. These patterns transform RAG from a single-pass lookup into an intelligent research process.