Agent Planning: ReAct, Plan-and-Execute, Tree of Thoughts, Reflection

Introduction

Planning transforms LLMs from reactive responders into proactive agents that can decompose complex goals, explore solution paths, and recover from failures. This article covers four major planning frameworks: ReAct for tight coupling of reasoning and action, Plan-and-Execute for hierarchical decomposition, Tree of Thoughts for exploring multiple reasoning paths, and self-reflection for learning from mistakes.

ReAct (Reasoning + Acting)

ReAct interleaves reasoning traces with tool calls, allowing the agent to think about what to do next based on current observations:

```python
import re


class ReActAgent:
    def __init__(self, tools: list[dict], llm_fn):
        self.tools = tools
        self.llm = llm_fn

    def run(self, task: str, max_steps: int = 10) -> str:
        messages = [
            {"role": "system", "content": self._system_prompt()},
            {"role": "user", "content": task},
        ]

        for step in range(max_steps):
            response = self.llm(messages, tools=self.tools)
            messages.append({"role": "assistant", "content": response})

            if "Final Answer:" in response:
                return response.split("Final Answer:")[-1].strip()

            # Parse the Action from the Thought/Action/Observation cycle
            action = self._extract_action(response)
            if action:
                observation = self._execute_tool(action)
            else:
                observation = "No valid action found. Use Action: ToolName(arg=value)."
            # Feed the observation back as a user turn (many chat APIs
            # reject mid-conversation system messages)
            messages.append({"role": "user", "content": f"Observation: {observation}"})

        return "Failed to complete task within step limit."

    def _system_prompt(self) -> str:
        return """You are a ReAct agent. For each step:
Thought: Reason about what to do next
Action: Choose a tool and specify arguments
(Wait for observation)
...repeat until done...
Final Answer: Provide the complete answer"""

    def _extract_action(self, text: str) -> dict | None:
        """Parse Action: ToolName(arg1=val1, arg2=val2) from text."""
        match = re.search(r"Action:\s*(\w+)\((.*)\)", text)
        if not match:
            return None
        tool_name = match.group(1)
        args = dict(re.findall(r"(\w+)=([^,)]+)", match.group(2)))
        return {"name": tool_name, "args": args}

    def _execute_tool(self, action: dict) -> str:
        for tool in self.tools:
            if tool["name"] == action["name"]:
                return tool["function"](**action["args"])
        return f"Error: Unknown tool '{action['name']}'"
```

Plan-and-Execute

This framework separates planning from execution. A planner creates a step-by-step plan, then an executor follows it:

```python
import json


class PlanAndExecute:
    # _gather_context, _execute_step, _verify_step, and _synthesize are omitted here
    def __init__(self, llm_fn, tools):
        self.planner_llm = llm_fn
        self.executor_llm = llm_fn
        self.tools = tools

    async def run(self, task: str) -> str:
        # Phase 1: Create a plan
        plan = await self._create_plan(task)
        results = []

        # Phase 2: Execute each step
        i = 0
        while i < len(plan):
            step = plan[i]
            print(f"Executing step {i + 1}: {step['description']}")

            # Gather the outputs of the steps this one depends on
            context = self._gather_context(step, results)
            result = await self._execute_step(step, context)

            # Verify step completion; on failure, re-plan and resume from there
            verified = await self._verify_step(step, result)
            if not verified:
                plan = await self._replan(task, i, plan, result)
                i = 0  # the revised plan begins at the failure point
                continue

            results.append({"step": step, "result": result})
            i += 1

        # Phase 3: Synthesize final answer
        return await self._synthesize(task, plan, results)

    async def _create_plan(self, task: str) -> list[dict]:
        response = self.planner_llm(f"""
Create a step-by-step plan for: {task}
For each step, specify:
- description: what to do
- tool: which tool to use (or 'none')
- dependencies: which step numbers this step depends on

Output as a JSON array.
""")
        return json.loads(response)

    async def _replan(self, original_task: str, failed_step: int, old_plan: list, error: str) -> list:
        response = self.planner_llm(f"""
Step {failed_step} failed: {error}
Original plan: {old_plan}
Original task: {original_task}

Create a revised plan starting from the failure point. Output as a JSON array.
""")
        return json.loads(response)
```
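The planner is prompted for a JSON array of steps. A returned plan might look like the following (the step contents are illustrative), and one plausible reading of the elided `_gather_context` helper is to collect the results of the dependency steps:

```python
import json

# A plan as the planner might return it (contents are illustrative)
plan_json = """[
  {"description": "Search for recent papers on topic X", "tool": "web_search", "dependencies": []},
  {"description": "Summarize the top three results", "tool": "none", "dependencies": [1]},
  {"description": "Draft a report from the summaries", "tool": "none", "dependencies": [2]}
]"""
plan = json.loads(plan_json)

# Sketch of _gather_context: pull the results of 1-indexed dependency steps
results = [{"step": plan[0], "result": "10 search hits"}]
step = plan[1]
context = [r["result"] for i, r in enumerate(results, start=1) if i in step["dependencies"]]
print(context)  # → ['10 search hits']
```

Whether step numbers are 0- or 1-indexed must match between the planner prompt and the executor; here 1-indexing is assumed because the prompt speaks of "step numbers".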

Tree of Thoughts (ToT)

ToT explores multiple reasoning paths simultaneously, evaluating each branch:

```python
class TreeOfThoughts:
    def __init__(self, llm_fn, branches: int = 3, depth: int = 3):
        self.llm = llm_fn
        self.branches = branches  # branching factor and beam width
        self.depth = depth

    def solve(self, problem: str) -> str:
        # Initialize the tree with root thoughts
        candidates = self._generate_thoughts(problem, [])
        best_path = None
        best_score = float("-inf")

        for level in range(self.depth):
            # Evaluate each candidate path
            scored = []
            for state in candidates:  # each state is the reasoning so far
                score = self._evaluate_thought(problem, state)
                scored.append((state, score))

                # Track the best complete path at the final level
                if level == self.depth - 1 and score > best_score:
                    best_score = score
                    best_path = state

            # Select top-k candidates for expansion (beam search)
            scored.sort(key=lambda x: x[1], reverse=True)
            top_candidates = scored[:self.branches]

            if level < self.depth - 1:
                # Generate next thoughts from the surviving candidates
                candidates = []
                for state, _ in top_candidates:
                    candidates.extend(self._generate_thoughts(problem, state))

        return "\n".join(best_path) if best_path else "No solution found."

    def _generate_thoughts(self, problem: str, current_state: list[str]) -> list[list[str]]:
        context = "\n".join(current_state) if current_state else "No reasoning yet."
        response = self.llm(f"""
Problem: {problem}
Current reasoning: {context}

Generate {self.branches} different next steps in reasoning.
Each should be a plausible continuation. Be diverse.
Put each continuation on its own line.
""")
        # One thought per non-empty line of the response
        thoughts = [line.strip() for line in response.splitlines() if line.strip()]
        return [current_state + [t] for t in thoughts[:self.branches]]

    def _evaluate_thought(self, problem: str, state: list[str]) -> float:
        context = "\n".join(state)
        score = self.llm(f"""
Problem: {problem}
Reasoning so far: {context}

Rate the promise of this reasoning path on a scale of 0 to 1.
Output ONLY a number.
""")
        try:
            return float(score.strip())
        except ValueError:
            return 0.0  # treat unparseable ratings as dead ends
```
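Stripped of the LLM calls, the expand-score-prune loop above is plain beam search over reasoning paths. With a toy deterministic scorer standing in for `_evaluate_thought`, one pruning step looks like this (scorer and candidate strings are invented for illustration):

```python
# Toy beam-search step: score candidate paths, keep the top k
def toy_score(path: list[str]) -> float:
    # Stand-in for the LLM evaluator: longer reasoning scores higher here
    return sum(len(step) for step in path) / 100


candidates = [["try algebra"], ["try geometry"], ["guess randomly"]]
k = 2

scored = sorted(((p, toy_score(p)) for p in candidates), key=lambda x: x[1], reverse=True)
beam = [p for p, _ in scored[:k]]
print(beam)  # → [['guess randomly'], ['try geometry']]
```

The cost profile follows directly: with beam width and branching factor both `branches`, each level evaluates up to `branches**2` candidate paths, one LLM call each.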

Reflection

Reflection enables agents to learn from their mistakes during execution:

```python
class ReflectiveAgent:
    # _attempt (runs the task, returns a result dict) and _is_successful are omitted here
    def __init__(self, llm_fn, tools):
        self.llm = llm_fn
        self.tools = tools
        self.reflection_log = []

    async def run(self, task: str) -> str:
        max_attempts = 3
        for attempt in range(max_attempts):
            result = await self._attempt(task)

            if self._is_successful(result):
                return result["output"]

            # Reflect on the failure
            reflection = self._reflect(task, result)
            self.reflection_log.append(reflection)

            # Update strategy based on reflection
            task = self._revise_task(task, reflection)

        return "Failed after multiple attempts."

    def _reflect(self, task: str, result: dict) -> str:
        return self.llm(f"""
Task: {task}
What went wrong: {result.get('error', 'Unknown error')}
Actions taken: {result.get('actions', [])}
Partial output: {result.get('partial_output', '')}

Reflect on:
1. What was the root cause of the failure?
2. What should be done differently next time?
3. Is there missing information needed?

Reflection:
""")

    def _revise_task(self, task: str, reflection: str) -> str:
        return self.llm(f"""
Original task: {task}
After reflecting: {reflection}

Revise the task description to incorporate lessons learned
and avoid repeating the same mistake.
""")
```
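With scripted stand-ins for the LLM and for task execution, the attempt-reflect-revise loop reduces to a few lines. Everything below (the canned attempt results, the reflection and revision functions) is mocked for illustration:

```python
# Scripted attempt results: first try fails, second succeeds
attempts = iter([
    {"success": False, "error": "rate limited by API"},
    {"success": True, "output": "report.pdf written"},
])


def attempt(task: str) -> dict:
    return next(attempts)


def reflect(task: str, result: dict) -> str:
    # Stand-in for the reflection LLM call
    return f"Root cause: {result['error']}. Next time: add retries with backoff."


def revise(task: str, reflection: str) -> str:
    # Stand-in for the task-revision LLM call
    return f"{task} (lesson: {reflection})"


task = "Generate the quarterly report"
reflection_log = []
for _ in range(3):
    result = attempt(task)
    if result["success"]:
        print(result["output"])  # → report.pdf written
        break
    reflection = reflect(task, result)
    reflection_log.append(reflection)
    task = revise(task, reflection)
```

The key design point the mock makes visible: the lesson survives only inside the revised task string, so the agent is stateless across attempts except for what reflection writes back into the prompt.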

Choosing a Framework

| Framework | Best For | When to Use |
|-----------|----------|-------------|
| ReAct | Interactive tasks with tools | Standard agent tasks requiring reasoning |
| Plan-and-Execute | Complex multi-step tasks | When the plan is knowable upfront |
| Tree of Thoughts | Creative/exploratory tasks | When multiple approaches are valid |
| Reflection | Error-prone environments | When learning from mistakes is critical |

Conclusion

Agent planning frameworks provide structure for LLM reasoning. ReAct couples reasoning with tool use for interactive tasks. Plan-and-Execute separates planning from execution for complex workflows. Tree of Thoughts explores multiple reasoning paths for problems with branching solutions. Reflection enables continuous improvement by learning from failures. In practice, combine these patterns: use ReAct for execution, Plan-and-Execute for structure, ToT for exploration, and Reflection for improvement.