Building AI-Powered CLI Tools: A Complete Guide for Developers
The terminal is having a renaissance. Developers spend hours in it every day — and LLMs have turned it from a read-only window into something that can understand, generate, and transform code and text. An AI-powered CLI tool isn't just an API wrapper with a flag parser. It's a new kind of interface: one where the computer can *interpret intent, reason about context, and take action*.
This guide covers how to build these tools end-to-end: architecture patterns, Python and Node.js implementations, streaming, interactive flows, file-aware agents, packaging, and two real-world examples you can adapt right now.
Why AI CLI Tools Are Different
A traditional CLI tool maps flags to function calls. An AI CLI tool does something fundamentally different: it takes loosely specified intent in natural language, builds the context the model needs, and produces or acts on a result nobody hard-coded.
The architecture looks like this:
User Input (args, stdin, interactive) → CLI Framework (Click/Commander)
→ Orchestrator (prompt construction, tool management)
→ LLM SDK (OpenAI/Anthropic)
→ Streaming stdout / file writes / git commits / API calls
The CLI framework handles input parsing and help text. The orchestrator constructs prompts, manages conversation history, and decides when to call tools. The LLM SDK is a thin wrapper — the real work is in prompt engineering and tool orchestration.
Architecture of AI CLI Tools
Every AI CLI tool shares these layers:
**Input Layer.** Accepts flags, arguments, stdin, and interactive input. This is where you decide between `tool ask "question"`, `cat file | tool`, and `tool --interactive`.
**Context Layer.** Gathers information the model needs: file contents, git diff output, directory listings, environment variables, previous conversation turns.
**Orchestration Layer.** Manages the conversation loop. For simple tools this is one request/response. For agents, it's a loop: model responds, you execute tool calls, you feed results back, model responds again.
**Output Layer.** Streams tokens to stdout, formats structured output (JSON, markdown), and handles errors gracefully.
The key design decision is **stateless vs. stateful**. Stateless tools (one question, one answer) are simpler. Stateful tools (multi-turn conversations, file edits, undo) require persistence — typically a session file or a temp directory.
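For the stateful case, here is a minimal sketch of session persistence; the `~/.my-tool/session.json` location is a hypothetical choice:

import json
from pathlib import Path

SESSION = Path.home() / ".my-tool" / "session.json"  # hypothetical location

def load_history():
    """Return prior conversation turns, or an empty list on first run."""
    if SESSION.exists():
        return json.loads(SESSION.read_text())
    return []

def save_history(messages):
    SESSION.parent.mkdir(parents=True, exist_ok=True)
    SESSION.write_text(json.dumps(messages))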
Python: Click + LLM SDK
Python is the most popular language for CLI tools, and [Click](https://click.palletsprojects.com/) is the standard framework. Pair it with the `openai` or `anthropic` SDK.
import click
from openai import OpenAI

client = OpenAI()

@click.command()
@click.argument("prompt", required=False)
@click.option("--model", default="gpt-4o", help="Model to use")
@click.option("--system", default="You are a helpful assistant.")
def ask(prompt, model, system):
    """Ask an LLM a question from the command line."""
    if not prompt and not click.get_text_stream("stdin").isatty():
        prompt = click.get_text_stream("stdin").read().strip()
    if not prompt:
        click.echo("Usage: ask PROMPT or pipe input")
        return
    stream = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        stream=True,
    )
    for chunk in stream:
        content = chunk.choices[0].delta.content or ""
        click.echo(content, nl=False)
    click.echo()

if __name__ == "__main__":
    ask()
This tool accepts input as an argument or via stdin pipe, streams the response chunk by chunk as it's generated, and uses Click's built-in help formatting. The streaming loop is the critical difference from a non-AI CLI — users expect to see output appear incrementally, not wait for a full response.
For a richer experience with [Typer](https://typer.tiangolo.com/) (Click with type hints):
import sys

import typer
from rich.console import Console
from rich.live import Live
from rich.markdown import Markdown
from anthropic import Anthropic

app = typer.Typer()
console = Console()
client = Anthropic()

@app.command()
def chat(
    prompt: str = typer.Argument(None, help="Your question"),
    model: str = "claude-sonnet-4-20250514",
):
    """Chat with Claude from the terminal."""
    if not prompt:
        prompt = sys.stdin.read().strip()
    with client.messages.stream(
        model=model,
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        with Live(refresh_per_second=15) as live:
            collected = ""
            for text in stream.text_stream:
                collected += text
                live.update(Markdown(collected))

if __name__ == "__main__":
    app()
Rich's `Live` display with Markdown rendering makes the terminal feel like a chat UI. The `stream.text_stream` pattern gives you tokens as they arrive.
Node.js: Commander + LangChain
In the Node.js ecosystem, [Commander](https://github.com/tj/commander.js) is the standard CLI framework. [LangChain](https://js.langchain.com/) adds LLM orchestration, or you can use the SDK directly.
#!/usr/bin/env node
import { Command } from 'commander';
import { text } from 'node:stream/consumers';
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const program = new Command()
  .name('ask')
  .description('Ask Claude a question')
  .argument('[prompt]', 'Your question')
  .option('-m, --model <model>', 'Model to use', 'claude-sonnet-4-20250514')
  .option('-s, --system <prompt>', 'System prompt')
  .action(async (prompt, options) => {
    if (!prompt && !process.stdin.isTTY) {
      // A bare process.stdin.read() can return null before any data has
      // buffered; consume the whole piped stream instead.
      prompt = (await text(process.stdin)).trim();
    }
    if (!prompt) {
      console.error('Usage: ask PROMPT or pipe input');
      process.exit(1);
    }
    const stream = client.messages.stream({
      model: options.model,
      max_tokens: 4096,
      system: options.system,
      messages: [{ role: 'user', content: prompt }],
    }).on('text', (chunk) => process.stdout.write(chunk));
    await stream.finalMessage(); // resolves once streaming completes
    console.log();
  });

program.parse();
The `@anthropic-ai/sdk` Node.js streaming API emits `text` events. Write each chunk to `process.stdout` for that real-time feel. For more complex tools, LangChain's `RunnableSequence` lets you chain prompts, tools, and output parsers:
import { Command } from 'commander';
import { ChatOpenAI } from '@langchain/openai';
import { StringOutputParser } from '@langchain/core/output_parsers';
import { ChatPromptTemplate } from '@langchain/core/prompts';

const model = new ChatOpenAI({ model: 'gpt-4o', streaming: true });
const parser = new StringOutputParser();
const prompt = ChatPromptTemplate.fromMessages([
  ['system', 'You are a technical writer.'],
  ['user', 'Explain {topic} in one paragraph.'],
]);
const chain = prompt.pipe(model).pipe(parser);

const program = new Command()
  .argument('<topic>')
  .action(async (topic) => {
    const stream = await chain.stream({ topic });
    for await (const chunk of stream) {
      process.stdout.write(chunk);
    }
    console.log();
  });

program.parse();
LangChain shines when you need tool-calling agents (reading files, running commands, browsing the web). The chain abstraction keeps the orchestration readable.
Pattern: Streaming Responses to stdout
Streaming is table stakes. Users won't wait 10 seconds for a full response. Here's the pattern that works across languages:
1. **Request a streaming response** from the SDK (`stream=True`, `messages.stream`, or the equivalent).
2. **Write each token** to stdout as it arrives — no buffering.
3. **Handle backpressure.** If the terminal is slow, don't drop tokens; let the OS buffer.
4. **Support Ctrl+C.** Trap SIGINT to print a clean exit, not a traceback.
In Python with Click:
import signal
import sys

import click

def handle_interrupt(sig, frame):
    click.echo("\n[Interrupted]", err=True)
    sys.exit(130)

signal.signal(signal.SIGINT, handle_interrupt)
In Node.js with Commander:
process.on('SIGINT', () => {
  console.error('\n[Interrupted]');
  process.exit(130);
});
Rich formatting (Markdown, syntax highlighting) makes streaming output readable. But be careful with Rich's `Live` — it re-renders the entire output on each frame, which can be slow for long streams. For long outputs, raw `click.echo` or `process.stdout.write` is more performant.
Pattern: Interactive Multi-Step Tools
Not every tool is a single query. Interactive tools maintain state across turns. The pattern is a **REPL loop** with AI-generated responses.
import click
from openai import OpenAI

client = OpenAI()

@click.command()
@click.option("--model", default="gpt-4o")
def chat(model):
    """Interactive chat session."""
    click.echo("Chat session started. Type /exit to quit.", err=True)
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    while True:
        user_input = click.prompt("You", prompt_suffix="> ")
        if user_input.strip() == "/exit":
            break
        messages.append({"role": "user", "content": user_input})
        stream = client.chat.completions.create(
            model=model, messages=messages, stream=True
        )
        click.echo("AI: ", nl=False)
        collected = ""
        for chunk in stream:
            content = chunk.choices[0].delta.content or ""
            collected += content
            click.echo(content, nl=False)
        click.echo()
        messages.append({"role": "assistant", "content": collected})
For **tool-using agents**, the pattern extends to a loop: the model requests a tool call, your code executes it and feeds the result back, the model continues:
import json

while True:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=[{
            "type": "function",
            "function": {
                "name": "read_file",
                "description": "Read a file from disk",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "path": {"type": "string"}
                    },
                    "required": ["path"],
                },
            },
        }],
    )
    message = response.choices[0].message
    if message.tool_calls:
        messages.append(message)  # keep the assistant turn that requested the calls
        for call in message.tool_calls:
            result = execute_tool(call.function.name, json.loads(call.function.arguments))
            messages.append({
                "role": "tool",
                "content": str(result),
                "tool_call_id": call.id,
            })
    else:
        click.echo(message.content)
        break
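The loop assumes an `execute_tool` dispatcher. A minimal sketch for the single `read_file` tool defined above:

def execute_tool(name, args):
    """Route a model-requested tool call to real code."""
    if name == "read_file":
        with open(args["path"]) as fh:
            return fh.read()
    raise ValueError(f"Unknown tool: {name}")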
This is the same pattern powering Claude Code and similar AI coding agents. The complexity is in which tools you expose (file read/write, git, shell, search) and how much context you pack into the system prompt.
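For flavor, here is a hypothetical system prompt for a small file agent; the `{cwd}` and `{branch}` placeholders would be filled in by the context layer:

SYSTEM = """You are a coding agent working in {cwd} on branch {branch}.
You may call read_file to inspect files before answering.
Never guess at file contents; read them first.
Keep answers terse; the output is a terminal."""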
Pattern: File-Aware Tools
File-aware tools read the user's working directory before generating responses. This is the most useful pattern for developer tools.
import os

import click
from anthropic import Anthropic

client = Anthropic()

@click.command()
@click.argument("files", nargs=-1, type=click.Path(exists=True))
@click.option("--recursive/--no-recursive", default=True)
def analyze(files, recursive):
    """Analyze files with AI assistance."""
    contents = []
    for f in files:
        if os.path.isfile(f):
            with open(f) as fh:
                contents.append(f"### {f}\n\n```\n{fh.read()}\n```")
        elif os.path.isdir(f) and recursive:
            for root, _, filenames in os.walk(f):
                for fn in filenames:
                    path = os.path.join(root, fn)
                    try:
                        with open(path) as fh:
                            contents.append(f"### {path}\n\n```\n{fh.read()}\n```")
                    except Exception:
                        pass  # skip binary or unreadable files
    context = "\n\n".join(contents[:20])  # limit context size
    prompt = f"The user wants to understand these files:\n\n{context}"
    with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:
            click.echo(text, nl=False)
    click.echo()
The key challenge is **context window management**. You can't dump every file in a monorepo into the prompt. Strategies:

- **Explicit selection.** Let users name files or globs rather than slurping the whole tree.
- **Budgeted truncation.** Cap the total size of packed context and stop when the budget runs out (see the sketch after this list).
- **Summarization.** Summarize large files in a first pass and answer from the summaries.
- **Retrieval.** For big repos, index chunks with embeddings and include only the most relevant ones.
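A minimal sketch of budgeted truncation; the character budget is a stand-in for a real token count:

def pack_context(file_blobs, budget_chars=100_000):
    """Greedily pack file blobs until a rough character budget is hit."""
    packed, used = [], 0
    for blob in file_blobs:
        if used + len(blob) > budget_chars:
            break
        packed.append(blob)
        used += len(blob)
    return "\n\n".join(packed)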
CLI Framework Comparison
| Feature | Click | Typer | Commander | Clack |
|---|---|---|---|---|
| **Language** | Python | Python | Node.js | Node.js |
| **Type hints** | Decorators | Type-annotated | Methods | Methods |
| **Help text** | Auto (good) | Auto (excellent) | Auto | Auto |
| **Subcommands**| Yes (groups) | Yes | Yes | No (single command) |
| **Interactive prompts** | Yes (click.prompt) | Yes | Manual | Native, polished |
| **Autocomplete**| Shell completion | Shell completion | Manual | Built-in |
| **Spinner/progress** | No | Rich integration | No | Built-in spinners |
| **Best for** | Any Python CLI | Python CLI + type safety | Any Node.js CLI | Interactive Node.js CLIs |
**Click** is the battle-tested Python standard. It handles argument parsing, help text formatting, subcommands, and shell completion. The decorator API is clean and composable.
**Typer** wraps Click with Python type hints. Less boilerplate, better autocomplete in IDEs, and built-in Rich integration for colored output and spinners.
**Commander** is Click's Node.js equivalent. It's minimal, widely used, and easy to extend. You handle streaming and spinners yourself.
**Clack** is newer and focused on *interactive* CLIs — prompts with autocomplete, multiselect, spinners, and cancel handling. It's not great for traditional flag-based tools, but excellent for `npm init`-style interactive setups.
For AI CLI tools, Typer + Rich (Python) or Commander + `@clack/prompts` (Node.js) are the most productive combinations.
Real Example: AI Code Review CLI
This tool runs `git diff` against the staging area, sends the diff to an LLM for review, and outputs structured feedback.
#!/usr/bin/env python3
import subprocess

import click
from anthropic import Anthropic

client = Anthropic()

@click.command()
@click.option("--diff", is_flag=True, help="Review unstaged changes instead of staged")
@click.option("--model", default="claude-sonnet-4-20250514")
@click.option("--output", type=click.Choice(["text", "markdown", "json"]), default="markdown")
def review(diff, model, output):
    """Review staged git changes with AI."""
    # Get the diff: staged by default, working tree with --diff
    cmd = ["git", "diff"] if diff else ["git", "diff", "--cached"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if not result.stdout.strip():
        click.echo("No changes to review.", err=True)
        raise click.Abort()
    # Warn when the diff will be truncated
    lines = result.stdout.count("\n")
    if len(result.stdout) > 50000:
        click.echo(f"Diff is {lines} lines; truncating to 50,000 characters.", err=True)
    system = """You are a senior code reviewer. Review the git diff below.
Focus on: logic errors, security issues, performance problems, style violations.
For each issue, include: file, line, severity (critical/warning/nit), and suggestion.
Output in the format requested."""
    prompt = f"Review this git diff:\n\n```diff\n{result.stdout[:50000]}\n```"
    if output == "json":
        response = client.messages.create(
            model=model,
            max_tokens=4096,
            system=system,
            messages=[{"role": "user", "content": prompt + "\n\nRespond in JSON format."}],
        )
        print(response.content[0].text)
    else:
        with client.messages.stream(
            model=model, max_tokens=4096, system=system,
            messages=[{"role": "user", "content": prompt}],
        ) as stream:
            for text in stream.text_stream:
                click.echo(text, nl=False)
        click.echo()

if __name__ == "__main__":
    review()
The code review tool is a good example of the **file-aware + git-aware** pattern. It captures context (the diff) without needing file I/O itself. The key design choices: warnings go to stderr so stdout stays pipeable (`review --output json | jq` works), JSON mode uses a non-streaming call so the output parses as a whole, and the diff is capped before it can blow the context window.
Real Example: AI Git Commit Message Generator
This is the most common AI CLI tool in the wild. It reads staged changes and generates a conventional commit message.
#!/usr/bin/env python3
import subprocess

import click
from anthropic import Anthropic

client = Anthropic()

@click.command()
@click.option("--model", default="claude-sonnet-4-20250514")
@click.option("--type", "commit_type", help="Conventional commit type (feat, fix, etc.)")
@click.option("--scope", help="Commit scope")
def commit(model, commit_type, scope):
    """Generate a commit message from staged changes."""
    result = subprocess.run(
        ["git", "diff", "--cached"],
        capture_output=True, text=True
    )
    if not result.stdout.strip():
        click.echo("No staged changes. Stage files with `git add` first.", err=True)
        raise click.Abort()
    prompt = f"""Generate a conventional commit message for this diff.
Format: {commit_type or '<type>'}({scope or '<scope>'}): <description>

<body>

Rules:
- First line max 72 characters
- Use imperative mood
- Body wraps at 72 characters
- Focus on WHAT and WHY, not HOW

Diff:
{result.stdout[:20000]}"""
    with client.messages.stream(
        model=model,
        max_tokens=500,
        system="You generate concise, structured git commit messages.",
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:
            click.echo(text, nl=False)
    click.echo("\n")

if __name__ == "__main__":
    commit()
Extended versions of this tool add a `--commit` flag that runs `git commit` with the generated message after confirmation, an `--edit` flag that opens the message in `$EDITOR`, and a regenerate loop for when the first suggestion misses. A sketch of the first follows.
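A minimal sketch of the confirm-and-commit extension, assuming the generated message has been collected into a string rather than streamed straight to stdout:

import subprocess

import click

def confirm_and_commit(message: str):
    """Show the generated message and commit only on confirmation."""
    click.echo(message)
    if click.confirm("Commit with this message?"):
        subprocess.run(["git", "commit", "-m", message], check=True)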
Error Handling for LLM Calls in CLI
LLM calls fail in ways normal API calls don't. Your CLI must handle:
**Rate limits.** The API returns 429. Handle with exponential backoff plus jitter:
import time
import random

import anthropic
import click

def call_with_retry(client, max_retries=3, **kwargs):
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError:
            if attempt == max_retries - 1:
                raise
            sleep = (2 ** attempt) + random.random()
            click.echo(f"Rate limited, retrying in {sleep:.1f}s...", err=True)
            time.sleep(sleep)
**Context overflow.** The prompt exceeds the model's context window. Pre-count tokens and truncate:
import click
import tiktoken

def truncate(text, max_tokens=80000):
    # tiktoken ships no Claude encoding; the GPT-4 encoding is a close
    # enough approximation for budget checks.
    enc = tiktoken.encoding_for_model("gpt-4")
    tokens = enc.encode(text)
    if len(tokens) > max_tokens:
        click.echo(f"Truncating {len(tokens)} tokens to {max_tokens}", err=True)
        return enc.decode(tokens[:max_tokens])
    return text
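With the Anthropic SDK you can also get an exact count from the API before sending; a sketch, assuming a recent `anthropic` package:

count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": text}],
)
print(count.input_tokens)  # exact server-side count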
**Network errors.** Connection failures and timeouts surface as SDK exceptions such as `anthropic.APIConnectionError` — catch these specifically and give actionable messages:

import anthropic

try:
    response = client.messages.create(...)
except anthropic.APIConnectionError:
    click.echo("Error: Cannot reach the API. Check your internet connection.", err=True)
    raise click.Abort()
except Exception as e:
    click.echo(f"API error: {e}", err=True)
    raise click.Abort()
**Structured error output.** For JSON mode, errors should also be JSON:
import json

import click

@click.command()
@click.option("--json-output", is_flag=True)
def tool(json_output):
    try:
        ...  # the tool's real work goes here
    except Exception as e:
        if json_output:
            click.echo(json.dumps({"error": str(e), "success": False}))
        else:
            click.echo(f"Error: {e}", err=True)
        raise click.Abort()
Packaging and Distribution
PyPI (Python)
# pyproject.toml
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "my-ai-tool"
version = "0.1.0"
dependencies = [
    "click>=8.0",
    "anthropic>=0.30.0",
    "tiktoken>=0.5.0",
]

[project.scripts]
my-ai-tool = "my_ai_tool.cli:main"
Users install with `pip install my-ai-tool`; publish with `python -m build` and `twine upload dist/*`.
npm (Node.js)
{
  "name": "my-ai-tool",
  "version": "0.1.0",
  "type": "module",
  "bin": {
    "my-ai-tool": "./bin/cli.js"
  },
  "dependencies": {
    "commander": "^12.0.0",
    "@anthropic-ai/sdk": "^0.30.0"
  }
}
Publish with `npm publish`.
Homebrew (macOS/Linux)
For a compiled binary (Go, Rust, Zig), a Homebrew tap is the standard distribution channel:
class MyAiTool < Formula
  desc "AI-powered CLI tool"
  homepage "https://github.com/you/my-ai-tool"
  url "https://github.com/you/my-ai-tool/archive/v0.1.0.tar.gz"
  sha256 "..."

  def install
    bin.install "my-ai-tool"
  end
end
For interpreted languages (Python, Node.js), PyPI and npm are better distribution channels. Homebrew makes sense for compiled binaries (Rust, Go, Zig) that embed the LLM client.
Environment and Configuration
AI CLI tools need API keys. Read them from environment variables, fall back to a `.env` file for local development, and fail with an actionable message:
import os

import click
from dotenv import load_dotenv

load_dotenv()  # pick up a local .env file if present

api_key = os.environ.get("ANTHROPIC_API_KEY") or os.environ.get("OPENAI_API_KEY")
if not api_key:
    click.echo("Error: Set ANTHROPIC_API_KEY or OPENAI_API_KEY", err=True)
    raise click.Abort()
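Click can also bind an option straight to an environment variable, which keeps the flag and the env var in sync; a small sketch:

import click

@click.command()
@click.option("--api-key", envvar="ANTHROPIC_API_KEY",
              help="Falls back to ANTHROPIC_API_KEY if the flag is omitted.")
def tool(api_key):
    if not api_key:
        raise click.UsageError("Pass --api-key or set ANTHROPIC_API_KEY.")
    click.echo("Key loaded.")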
Putting It All Together: Architecture Checklist
When designing a new AI CLI tool, run through these questions:
1. **Input:** Arguments, a stdin pipe, an interactive session, or all three?
2. **Context:** What files, commands, or environment state does the model need to see?
3. **Stateless or stateful?** A single Q&A, or a session with history and state?
4. **Streaming:** Are you writing tokens to stdout as they arrive?
5. **Tools:** Can the model read files, run commands, make API calls, or edit files?
6. **Error recovery:** What happens on 429, context overflow, or network failure?
7. **Output format:** Plain text, markdown, JSON, or structured data for piping?
8. **Distribution:** PyPI, npm, Homebrew, or a single binary?
The answers define your architecture. A simple "explain this error" tool needs only streaming Q&A. A "refactor this codebase" tool needs file I/O, git integration, multi-step planning, and undo support.
Beyond Simple Wrappers
The tools described here are the foundation. The next generation of AI CLI tools will plan multi-step changes across whole codebases, execute tools with undo support, and carry persistent context between sessions instead of starting cold on every run.
The pattern is always the same: CLI framework handles input, LLM SDK handles generation, your orchestration code connects them. Get the streaming right, handle errors gracefully, and you have a tool that feels like magic in the terminal.