Step 1
Installation & Setup
Install the Vercel AI SDK and your preferred AI provider (OpenAI, Anthropic, Google, etc.). Set up environment variables and configure your Next.js app.
# Install AI SDK and OpenAI provider
npm install ai @ai-sdk/openai
# Or with other providers
npm install ai @ai-sdk/anthropic
npm install ai @ai-sdk/google
# React hooks (useChat, useCompletion) are included in the ai package
# and imported from "ai/react", so no separate install is needed.
Environment Variables (.env.local):
# OpenAI API Key
OPENAI_API_KEY=sk-...
# Anthropic API Key (optional)
ANTHROPIC_API_KEY=sk-ant-...
# Google AI API Key (optional)
GOOGLE_GENERATIVE_AI_API_KEY=...
💡 Key Point: The Vercel AI SDK is provider-agnostic. You can easily switch between OpenAI, Anthropic, Google, and other providers without changing your application code.
Step 2
Basic Chat Completion (Server-Side)
Create a server-side function to generate text completions. This runs entirely on the server and returns the full response.
// app/actions/ai.ts
"use server";
import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";
export async function generateCompletion(prompt: string) {
try {
const { text } = await generateText({
model: openai("gpt-4-turbo"),
prompt: prompt,
temperature: 0.7,
maxTokens: 500,
});
return { success: true, text };
} catch (error) {
console.error("AI Error:", error);
return { success: false, error: "Failed to generate completion" };
}
}
// Using with system messages
export async function generateWithSystem(userMessage: string) {
const { text } = await generateText({
model: openai("gpt-4-turbo"),
system: "You are a helpful coding assistant. Provide concise, accurate answers.",
prompt: userMessage,
});
return text;
}
📚 Understanding AI Parameters (Beginner's Guide)
🌡️ temperature (0.0 to 2.0): Controls randomness/creativity. Lower = more focused & deterministic (0.0-0.3 for code, factual answers). Higher = more creative & varied (0.7-1.0 for stories, brainstorming). Default is usually 0.7. Use 0.0 for math or code generation.
🎯 maxTokens (1 to model max): Maximum length of the response. 1 token ≈ 0.75 words (or ~4 characters), so 500 tokens ≈ 375 words (a couple of paragraphs) and 2000 tokens ≈ 1500 words (a full article). Higher = longer responses but more expensive. Set limits to control costs.
💬 prompt vs system: prompt is the user's question/input. system is an instruction that sets the AI's behavior/personality (e.g., "You are a helpful coding assistant"). System messages guide the AI's overall behavior across all responses.
🤖 model selection: gpt-4-turbo: most capable, slower, more expensive. gpt-3.5-turbo: faster, cheaper, good for simple tasks. gpt-4: strong reasoning, but slower and pricier than gpt-3.5-turbo. Choose based on the complexity vs. cost tradeoff.
💰 Cost Example: GPT-4-Turbo costs ~$0.01 per 1K input tokens, ~$0.03 per 1K output tokens. A 500-token response costs ~$0.015. Use gpt-3.5-turbo (~$0.001) for simple tasks to save 90%+.
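To make the arithmetic above concrete, here is a small estimator sketch. The default prices are the illustrative per-1K-token figures quoted above; they change over time, so check current pricing before budgeting with them.

```typescript
// Back-of-envelope cost estimate for one request, using per-1K-token prices.
// Defaults mirror the illustrative GPT-4-Turbo figures above (assumption: check
// current pricing, these numbers go stale).
export function estimateCostUSD(
  inputTokens: number,
  outputTokens: number,
  pricePer1kInput = 0.01,
  pricePer1kOutput = 0.03
): number {
  return (
    (inputTokens / 1000) * pricePer1kInput +
    (outputTokens / 1000) * pricePer1kOutput
  );
}

// Rough token <-> word conversion: 1 token is about 0.75 words.
export function tokensToWords(tokens: number): number {
  return Math.round(tokens * 0.75);
}
```

For example, a 500-token response works out to roughly 375 words and, at the quoted output price, about $0.015.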
Client Component Usage:
"use client";
import { useState } from "react";
import { generateCompletion } from "@/app/actions/ai";
export default function SimpleChat() {
const [prompt, setPrompt] = useState("");
const [response, setResponse] = useState("");
const [loading, setLoading] = useState(false);
const handleSubmit = async (e: React.FormEvent) => {
e.preventDefault();
setLoading(true);
try {
const result = await generateCompletion(prompt);
if (result.success) {
setResponse(result.text);
} else {
setResponse(result.error);
}
} finally {
setLoading(false); // reset even if the server action throws
}
};
return (
<form onSubmit={handleSubmit}>
<textarea
value={prompt}
onChange={(e) => setPrompt(e.target.value)}
placeholder="Ask anything..."
/>
<button type="submit" disabled={loading}>
{loading ? "Generating..." : "Submit"}
</button>
{response && <div>{response}</div>}
</form>
);
}
💡 Key Point: Use server actions for simple completions. This keeps your API keys secure and runs on the server. For real-time streaming, use Route Handlers (next step).
Step 3
Streaming Responses (ChatGPT-like)
Stream AI responses in real-time for a better user experience. The useChat hook handles all the complexity.
// app/api/chat/route.ts
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";
export const runtime = "edge"; // Optional: Use Edge Runtime for lower latency
export async function POST(req: Request) {
const { messages } = await req.json();
const result = await streamText({
model: openai("gpt-4-turbo"),
system: "You are a helpful assistant.",
messages,
temperature: 0.7,
maxTokens: 1000,
});
return result.toAIStreamResponse();
}
Client Component with useChat Hook:
"use client";
import { useChat } from "ai/react";
export default function Chat() {
const { messages, input, handleInputChange, handleSubmit, isLoading } =
useChat({
api: "/api/chat",
});
return (
<div style={{ maxWidth: "800px", margin: "0 auto", padding: "2rem" }}>
{/* Messages */}
<div style={{ marginBottom: "1rem" }}>
{messages.map((message) => (
<div
key={message.id}
style={{
padding: "1rem",
marginBottom: "0.5rem",
background:
message.role === "user" ? "var(--blue)" : "var(--surface)",
borderRadius: "8px",
}}
>
<strong>{message.role === "user" ? "You" : "AI"}:</strong>
<p>{message.content}</p>
</div>
))}
</div>
{/* Input Form */}
<form onSubmit={handleSubmit} style={{ display: "flex", gap: "0.5rem" }}>
<input
value={input}
onChange={handleInputChange}
placeholder="Type your message..."
disabled={isLoading}
style={{ flex: 1, padding: "0.75rem", borderRadius: "6px" }}
/>
<button type="submit" disabled={isLoading}>
{isLoading ? "..." : "Send"}
</button>
</form>
</div>
);
}
💡 Key Point: useChat automatically handles message state, optimistic updates, streaming, and error handling. It's the easiest way to build ChatGPT-like interfaces.
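For intuition, here is roughly what useChat does under the hood: read the response body as a stream and append text as each chunk arrives. This is a simplified sketch; the real hook also parses the SDK's stream protocol and manages message state, which is skipped here.

```typescript
// Simplified sketch of client-side stream consumption: read the Response body
// chunk by chunk, decoding bytes to text and reporting each chunk as it lands.
export async function readTextStream(
  response: Response,
  onChunk: (text: string) => void
): Promise<string> {
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let full = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact
    const chunk = decoder.decode(value, { stream: true });
    full += chunk;
    onChunk(chunk); // e.g. append to the visible message
  }
  return full;
}
```

This is why streaming feels instant: the UI renders each chunk as soon as it arrives instead of waiting for the whole completion.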
Step 4
Function Calling / Tools
Give AI the ability to call your functions and access external data. Perfect for weather apps, calculators, database queries, and more.
// app/api/chat-with-tools/route.ts
import { openai } from "@ai-sdk/openai";
import { streamText, tool } from "ai";
import { z } from "zod";
export async function POST(req: Request) {
const { messages } = await req.json();
const result = await streamText({
model: openai("gpt-4-turbo"),
messages,
tools: {
// Weather tool
getWeather: tool({
description: "Get the current weather for a location",
parameters: z.object({
city: z.string().describe("The city name"),
country: z.string().describe("The country code (e.g., US, UK)"),
}),
execute: async ({ city, country }) => {
// Call weather API (key from env: set e.g. OPENWEATHER_API_KEY in .env.local)
const response = await fetch(
`https://api.openweathermap.org/data/2.5/weather?q=${city},${country}&appid=${process.env.OPENWEATHER_API_KEY}`
);
const data = await response.json();
return {
temperature: data.main.temp,
description: data.weather[0].description,
};
},
}),
// Calculator tool
calculate: tool({
description: "Perform mathematical calculations",
parameters: z.object({
expression: z.string().describe("Math expression (e.g., '2 + 2')"),
}),
execute: async ({ expression }) => {
try {
// WARNING: eval() executes arbitrary code. Never use it on
// model-generated input in production; use a real math parser instead.
const result = eval(expression);
return { result };
} catch (error) {
return { error: "Invalid expression" };
}
},
}),
// Database query tool
searchDatabase: tool({
description: "Search the product database",
parameters: z.object({
query: z.string().describe("Search query"),
limit: z.number().optional().describe("Max results"),
}),
execute: async ({ query, limit = 5 }) => {
// Query your database ("db" stands in for your client, e.g. a Prisma instance)
const products = await db.products.findMany({
where: { name: { contains: query } },
take: limit,
});
return { products };
},
}),
},
});
return result.toAIStreamResponse();
}
Client Usage (same as before):
"use client";
import { useChat } from "ai/react";
export default function ChatWithTools() {
const { messages, input, handleInputChange, handleSubmit } = useChat({
api: "/api/chat-with-tools",
});
return (
<div>
{messages.map((m) => (
<div key={m.id}>
<strong>{m.role}:</strong> {m.content}
{/* Show tool calls */}
{m.toolInvocations?.map((tool, i) => (
<div key={i} style={{ background: "var(--surface-elevated)" }}>
<code>{tool.toolName}</code> called with {JSON.stringify(tool.args)}
<br />
{"result" in tool && <>Result: {JSON.stringify(tool.result)}</>}
</div>
))}
</div>
))}
<form onSubmit={handleSubmit}>
<input value={input} onChange={handleInputChange} />
<button type="submit">Send</button>
</form>
</div>
);
}
💡 Key Point: Tools let AI interact with your backend. The AI decides WHEN to call tools based on user input. Use Zod schemas for type-safe parameters.
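On that note, the calculate tool above should not ship with eval(). As one alternative, here is a minimal hand-rolled evaluator you could drop into its execute function. It is a sketch covering only +, -, *, /, unary minus, and parentheses; a battle-tested parser such as mathjs is the safer production choice.

```typescript
// Minimal safe arithmetic evaluator (recursive descent). Unlike eval(), it
// throws on anything that is not plain arithmetic, so model-generated input
// cannot execute code.
export function evaluate(expression: string): number {
  const src = expression.replace(/\s+/g, "");
  let pos = 0;

  // expr := term (("+" | "-") term)*
  function parseExpr(): number {
    let value = parseTerm();
    while (src[pos] === "+" || src[pos] === "-") {
      const op = src[pos++];
      const rhs = parseTerm();
      value = op === "+" ? value + rhs : value - rhs;
    }
    return value;
  }

  // term := factor (("*" | "/") factor)*
  function parseTerm(): number {
    let value = parseFactor();
    while (src[pos] === "*" || src[pos] === "/") {
      const op = src[pos++];
      const rhs = parseFactor();
      value = op === "*" ? value * rhs : value / rhs;
    }
    return value;
  }

  // factor := "(" expr ")" | "-" factor | number
  function parseFactor(): number {
    if (src[pos] === "(") {
      pos++;
      const value = parseExpr();
      if (src[pos++] !== ")") throw new Error("Expected )");
      return value;
    }
    if (src[pos] === "-") {
      pos++;
      return -parseFactor();
    }
    const match = /^\d+(\.\d+)?/.exec(src.slice(pos));
    if (!match) throw new Error(`Unexpected character at position ${pos}`);
    pos += match[0].length;
    return parseFloat(match[0]);
  }

  const result = parseExpr();
  if (pos !== src.length) throw new Error("Trailing input");
  return result;
}
```

Inside the tool, `const result = evaluate(expression);` replaces the eval() call, and the existing try-catch turns any parse error into the "Invalid expression" response.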
Step 5
RAG Pattern (Retrieval-Augmented Generation)
Give AI access to your custom knowledge base. Retrieve relevant documents and inject them into the prompt for accurate, contextual responses.
// lib/vectorStore.ts
import { openai } from "@ai-sdk/openai";
import { embed } from "ai";
// Generate embeddings for a text
export async function generateEmbedding(text: string) {
const { embedding } = await embed({
model: openai.embedding("text-embedding-3-small"),
value: text,
});
return embedding;
}
// Store document with embedding
export async function storeDocument(text: string, metadata: any) {
const embedding = await generateEmbedding(text);
// Store in your vector database; "vectorDB" stands in for your client (Pinecone, Supabase, Qdrant, etc.)
await vectorDB.upsert({
id: metadata.id,
values: embedding,
metadata: { text, ...metadata },
});
}
// Search for relevant documents
export async function searchDocuments(query: string, limit = 3) {
const queryEmbedding = await generateEmbedding(query);
// Query vector database
const results = await vectorDB.query({
vector: queryEmbedding,
topK: limit,
includeMetadata: true,
});
return results.matches.map((match) => match.metadata.text);
}
RAG Chat Route:
// app/api/chat-rag/route.ts
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";
import { searchDocuments } from "@/lib/vectorStore";
export async function POST(req: Request) {
const { messages } = await req.json();
const lastMessage = messages[messages.length - 1].content;
// 1. Retrieve relevant documents
const relevantDocs = await searchDocuments(lastMessage, 3);
// 2. Inject context into the system prompt
const context = relevantDocs.join("\n\n");
const result = await streamText({
model: openai("gpt-4-turbo"),
system: `You are a helpful assistant. Use the following context to answer questions.
Context:
${context}
If the answer is not in the context, say "I don't have information about that."`,
messages,
});
return result.toAIStreamResponse();
}
// Example: batch indexing documents. Run this from a script or a lib file,
// not from the route handler above: Next.js route files may only export
// HTTP method handlers and route config.
// import { storeDocument } from "@/lib/vectorStore";
export async function indexDocuments() {
const documents = [
{ id: "1", text: "Next.js is a React framework for production..." },
{ id: "2", text: "TypeScript adds static typing to JavaScript..." },
{ id: "3", text: "The Vercel AI SDK makes it easy to build AI apps..." },
];
for (const doc of documents) {
await storeDocument(doc.text, { id: doc.id });
}
}
💡 Key Point: RAG lets AI answer questions about YOUR data (docs, products, support tickets). First, embed and store your documents. Then retrieve relevant ones at query time and inject into the prompt.
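Conceptually, the vector database's query boils down to ranking stored embeddings by cosine similarity to the query embedding. Here is a minimal in-memory sketch of that ranking, fine for building intuition or prototyping; real vector stores use approximate nearest-neighbor indexes to make this fast at scale.

```typescript
// Cosine similarity: 1 means same direction, 0 means orthogonal (unrelated).
export function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the texts of the k stored documents most similar to the query,
// mirroring what searchDocuments does via the vector database.
export function topK(
  query: number[],
  docs: { text: string; embedding: number[] }[],
  k = 3
): string[] {
  return [...docs]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.embedding) -
        cosineSimilarity(query, x.embedding)
    )
    .slice(0, k)
    .map((d) => d.text);
}
```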
Best Practices
Production Tips & Best Practices
- Rate Limiting: Implement rate limits per user to prevent abuse. Use Redis or middleware.
- Error Handling: Always wrap AI calls in try-catch. Show user-friendly error messages.
- Caching: Cache common queries to save costs. Use Redis or Next.js cache.
- Streaming: Always use streaming for better UX. Users see responses immediately.
- Model Selection: Use cheaper models (gpt-3.5) for simple tasks. Use gpt-4 for complex reasoning.
- Token Limits: Set maxTokens to control costs. Track usage with OpenAI dashboard.
- Security: Never expose API keys to the client. Always use server-side routes.
- Monitoring: Log all AI requests for debugging and analytics. Track costs per user.
- Prompt Engineering: Spend time crafting good system prompts. Test with various inputs.
- Fallbacks: Have fallback responses when AI fails. Don't let users see raw errors.
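To illustrate the caching tip above, here is a minimal in-memory TTL cache keyed by prompt. It is a sketch: entries live in one server process only, so use Redis (or Next.js cache) for multi-instance deployments.

```typescript
// Tiny in-memory cache with time-to-live eviction, keyed by the prompt string.
// Identical prompts within the TTL window skip the (paid) model call entirely.
type CacheEntry = { value: string; expiresAt: number };

export class PromptCache {
  private store = new Map<string, CacheEntry>();

  constructor(private ttlMs = 60_000) {}

  get(prompt: string): string | undefined {
    const entry = this.store.get(prompt);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(prompt); // lazy eviction of stale entries
      return undefined;
    }
    return entry.value;
  }

  set(prompt: string, value: string): void {
    this.store.set(prompt, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// Usage sketch: check the cache before calling the model, store after.
// const cached = cache.get(prompt);
// if (cached) return cached;
// const { text } = await generateText({ model, prompt });
// cache.set(prompt, text);
```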
Example: Rate Limiting Middleware
// lib/rateLimiter.ts
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";
const redis = new Redis({
url: process.env.UPSTASH_REDIS_REST_URL!,
token: process.env.UPSTASH_REDIS_REST_TOKEN!,
});
export const ratelimit = new Ratelimit({
redis,
limiter: Ratelimit.slidingWindow(10, "1 m"), // 10 requests per minute
});
// Use in API route
export async function POST(req: Request) {
const ip = req.headers.get("x-forwarded-for")?.split(",")[0]?.trim() ?? "anonymous"; // header may hold a comma-separated proxy chain; take the client IP
const { success } = await ratelimit.limit(ip);
if (!success) {
return new Response("Too many requests", { status: 429 });
}
// Continue with AI request...
}