Adding AI to your app means calling an API. OpenAI, Anthropic, and Google AI each have different SDKs, pricing models, and capabilities. Here's the practical integration guide covering the patterns you'll actually use: streaming, function calling, embeddings, and cost optimization.

The Big Three AI APIs

|  | OpenAI | Anthropic | Google AI |
| --- | --- | --- | --- |
| Models | GPT-4o, GPT-4.1, o4-mini | Claude Opus 4, Sonnet 4, Haiku 3.5 | Gemini 2.5 Pro, Flash |
| Max context | 128K tokens (GPT-4o); 1M (GPT-4.1) | 200K tokens | 1M tokens |
| SDK | openai (Node/Python) | @anthropic-ai/sdk | @google/generative-ai |
| Pricing model | Per 1M tokens (in + out) | Per 1M tokens (in + out) | Per 1M tokens (in + out) |
| Image input | Yes (GPT-4o) | Yes | Yes |
| Image output | Yes (DALL-E) | No | Yes (Imagen) |
| Streaming | Yes (SSE) | Yes (SSE) | Yes (SSE) |

1. Streaming Responses

Streaming shows tokens as they're generated, which is critical for good UX. All three APIs support it:

// Anthropic streaming example
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const stream = client.messages.stream({
  model: "claude-sonnet-4-20250514",
  max_tokens: 4096,
  messages: [{ role: "user", content: "Write a function to..." }],
});

stream.on("text", (text) => {
  process.stdout.write(text);  // Show tokens as they arrive
});

const finalMessage = await stream.finalMessage();

2. Function Calling (Tool Use)

Function calling lets the AI call your APIs. Define the tools, and the AI decides when to use them:

// Define a tool
const tools = [{
  name: "search_database",
  description: "Search the product database",
  input_schema: {
    type: "object",
    properties: {
      query: { type: "string", description: "Search query" },
      category: { type: "string", enum: ["electronics", "books", "clothing"] }
    },
    required: ["query"]
  }
}];

// The AI can now call search_database() when needed
// Your code executes the function and sends the result back

3. Embeddings for Semantic Search

Embeddings convert text into vectors for semantic search. OpenAI and Google both offer embedding APIs:

// OpenAI embeddings
import OpenAI from "openai";

const openai = new OpenAI();

const res = await openai.embeddings.create({
  model: "text-embedding-3-small",  // $0.02 per 1M tokens (cheapest)
  input: "How to deploy Next.js to Vercel",
});
const vector = res.data[0].embedding;  // 1536-dimensional float array

// Store in vector DB (pgvector, Pinecone, Chroma)
// Query: find similar docs by cosine similarity
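The similarity metric itself is simple. A minimal sketch of cosine similarity and a top-k ranker — in production your vector DB does this for you, but it's worth seeing what the query actually computes:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for nonzero vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored document vectors against a query vector, highest score first
function topK(query: number[], docs: { id: string; vec: number[] }[], k: number) {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Because texts with similar meaning get nearby vectors, the highest-scoring documents are the semantically closest matches, even when they share no keywords with the query.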

4. Cost Optimization Strategies

| Strategy | Savings | How |
| --- | --- | --- |
| Model routing | 50-80% | Route simple tasks to Haiku/Flash, complex ones to Sonnet/Pro |
| Caching | 50-90% | Cache common responses; Anthropic has built-in prompt caching |
| Shorter prompts | 20-40% | System prompts are charged on every request, so keep them tight |
| Batch processing | 50% | OpenAI's Batch API is 50% cheaper (24-hour turnaround) |
| Token limits | Variable | Set max_tokens to prevent runaway costs |
| Self-host small models | 90%+ | Use local models for classification/summarization tasks |
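Model routing can be as simple as a heuristic over the incoming prompt. A sketch — the length cutoff, keyword list, and model IDs are assumptions to tune for your own workload (claude-3-5-haiku-20241022 was the cheap Anthropic tier at the time of writing):

```typescript
// Hypothetical router: cheap model for short, simple prompts;
// stronger (pricier) model for anything that looks like real work
const CHEAP_MODEL = "claude-3-5-haiku-20241022";
const STRONG_MODEL = "claude-sonnet-4-20250514";

function pickModel(prompt: string): string {
  const looksComplex = /analyze|refactor|debug|architect|prove/i.test(prompt);
  return prompt.length < 500 && !looksComplex ? CHEAP_MODEL : STRONG_MODEL;
}
```

Heuristics like this capture most of the savings with zero extra latency; teams that need finer routing sometimes use a small classifier model as the router instead.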

5. Error Handling Pattern

async function callAI(prompt: string): Promise<string> {
  const maxRetries = 3;
  const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

  for (let i = 0; i < maxRetries; i++) {
    try {
      const response = await client.messages.create({
        model: "claude-sonnet-4-20250514",
        max_tokens: 4096,
        messages: [{ role: "user", content: prompt }],
      });
      const block = response.content[0];
      if (block.type !== "text") throw new Error(`Unexpected block type: ${block.type}`);
      return block.text;
    } catch (error: any) {
      if (error.status === 429 && i < maxRetries - 1) {  // Rate limited: retry
        await sleep(Math.pow(2, i) * 1000);  // Exponential backoff: 1s, 2s, 4s
        continue;
      }
      throw error;  // Bad request (400), auth errors, etc.: don't retry
    }
  }
  throw new Error("Rate limited after all retries");
}

Bottom line: Use streaming for any user-facing feature. Use function calling to extend the AI with your own data. Cache aggressively. Route simple queries to cheaper models. See also: Prompt Engineering and Best LLMs for Coding.