Step 1
Installation & Setup
Install the Vercel AI SDK and your preferred AI provider (OpenAI, Anthropic, Google, etc.). Set up environment variables and configure your Next.js app.
# Install AI SDK and OpenAI provider
npm install ai @ai-sdk/openai
# Or with other providers
npm install ai @ai-sdk/anthropic
npm install ai @ai-sdk/google
# React hooks (useChat, useCompletion) are included in the ai package
# and imported from "ai/react", so no separate install is needed.
Environment Variables (.env.local):
# OpenAI API Key
OPENAI_API_KEY=sk-...
# Anthropic API Key (optional)
ANTHROPIC_API_KEY=sk-ant-...
# Google AI API Key (optional)
GOOGLE_GENERATIVE_AI_API_KEY=...
💡 Key Point: The Vercel AI SDK is provider-agnostic. You can easily switch between OpenAI, Anthropic, Google, and other providers without changing your application code.
Step 2
Basic Chat Completion (Server-Side)
Create a server-side function to generate text completions. This runs entirely on the server and returns the full response.
// app/actions/ai.ts
"use server";
import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";
export async function generateCompletion(prompt: string) {
try {
const { text } = await generateText({
model: openai("gpt-4-turbo"),
prompt: prompt,
temperature: 0.7,
maxTokens: 500,
});
return { success: true, text };
} catch (error) {
console.error("AI Error:", error);
return { success: false, error: "Failed to generate completion" };
}
}
// Using with system messages
export async function generateWithSystem(userMessage: string) {
const { text } = await generateText({
model: openai("gpt-4-turbo"),
system: "You are a helpful coding assistant. Provide concise, accurate answers.",
prompt: userMessage,
});
return text;
}
📚 Understanding AI Parameters (Beginner's Guide)
🌡️ temperature (0.0 to 2.0): Controls randomness/creativity. Lower = more focused & deterministic (0.0-0.3 for code, factual answers). Higher = more creative & varied (0.7-1.0 for stories, brainstorming). Default is usually 0.7. Use 0.0 for math or code generation.
🎯 maxTokens (1 to model max): Maximum length of the response. 1 token ≈ 0.75 words (or ~4 characters), so 500 tokens ≈ 375 words (a couple of paragraphs) and 2000 tokens ≈ 1500 words (a full article). Higher = longer responses but more expensive. Set limits to control costs.
💬 prompt vs system: prompt is the user's question/input. system is an instruction that sets the AI's behavior/personality (e.g., "You are a helpful coding assistant"). System messages guide the AI's overall behavior across all responses.
🤖 model selection: gpt-4-turbo: most capable, slower, more expensive. gpt-3.5-turbo: faster, cheaper, good for simple tasks. gpt-4: strong reasoning, but slower and pricier than gpt-3.5-turbo. Choose based on the complexity vs. cost tradeoff.
💰 Cost Example: GPT-4-Turbo costs ~$0.01 per 1K input tokens, ~$0.03 per 1K output tokens. A 500-token response costs ~$0.015. Use gpt-3.5-turbo (~$0.001) for simple tasks to save 90%+.
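To make the arithmetic above concrete, here is a small estimator sketch. The default prices are the illustrative per-1K-token figures quoted above; they change over time, so check current pricing before budgeting with them.

```typescript
// Back-of-envelope cost estimate for one request, using per-1K-token prices.
// Defaults mirror the illustrative GPT-4-Turbo figures above (assumption: check
// current pricing, these numbers go stale).
export function estimateCostUSD(
  inputTokens: number,
  outputTokens: number,
  pricePer1kInput = 0.01,
  pricePer1kOutput = 0.03
): number {
  return (
    (inputTokens / 1000) * pricePer1kInput +
    (outputTokens / 1000) * pricePer1kOutput
  );
}

// Rough token <-> word conversion: 1 token is about 0.75 words.
export function tokensToWords(tokens: number): number {
  return Math.round(tokens * 0.75);
}
```

For example, a 500-token response works out to roughly 375 words and, at the quoted output price, about $0.015.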
Client Component Usage:
"use client";
import { useState } from "react";
import { generateCompletion } from "@/app/actions/ai";
export default function SimpleChat() {
const [prompt, setPrompt] = useState("");
const [response, setResponse] = useState("");
const [loading, setLoading] = useState(false);
const handleSubmit = async (e: React.FormEvent) => {
e.preventDefault();
setLoading(true);
try {
const result = await generateCompletion(prompt);
if (result.success) {
setResponse(result.text);
} else {
setResponse(result.error);
}
} finally {
setLoading(false); // reset even if the server action throws
}
};
return (
<form onSubmit={handleSubmit}>
<textarea
value={prompt}
onChange={(e) => setPrompt(e.target.value)}
placeholder="Ask anything..."
/>
<button type="submit" disabled={loading}>
{loading ? "Generating..." : "Submit"}
</button>
{response && <div>{response}</div>}
</form>
);
}
💡 Key Point: Use server actions for simple completions. This keeps your API keys secure and runs on the server. For real-time streaming, use Route Handlers (next step).
Step 3
Streaming Responses (ChatGPT-like)
Stream AI responses in real-time for a better user experience. The useChat hook handles all the complexity.
// app/api/chat/route.ts
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";
export const runtime = "edge"; // Optional: Use Edge Runtime for lower latency
export async function POST(req: Request) {
const { messages } = await req.json();
const result = await streamText({
model: openai("gpt-4-turbo"),
system: "You are a helpful assistant.",
messages,
temperature: 0.7,
maxTokens: 1000,
});
return result.toAIStreamResponse();
}
Client Component with useChat Hook:
"use client";
import { useChat } from "ai/react";
export default function Chat() {
const { messages, input, handleInputChange, handleSubmit, isLoading } =
useChat({
api: "/api/chat",
});
return (
<div style={{ maxWidth: "800px", margin: "0 auto", padding: "2rem" }}>
{/* Messages */}
<div style={{ marginBottom: "1rem" }}>
{messages.map((message) => (
<div
key={message.id}
style={{
padding: "1rem",
marginBottom: "0.5rem",
background:
message.role === "user" ? "var(--blue)" : "var(--surface)",
borderRadius: "8px",
}}
>
<strong>{message.role === "user" ? "You" : "AI"}:</strong>
<p>{message.content}</p>
</div>
))}
</div>
{/* Input Form */}
<form onSubmit={handleSubmit} style={{ display: "flex", gap: "0.5rem" }}>
<input
value={input}
onChange={handleInputChange}
placeholder="Type your message..."
disabled={isLoading}
style={{ flex: 1, padding: "0.75rem", borderRadius: "6px" }}
/>
<button type="submit" disabled={isLoading}>
{isLoading ? "..." : "Send"}
</button>
</form>
</div>
);
}
💡 Key Point: useChat automatically handles message state, optimistic updates, streaming, and error handling. It's the easiest way to build ChatGPT-like interfaces.
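For intuition, here is roughly what useChat does under the hood: read the response body as a stream and append text as each chunk arrives. This is a simplified sketch; the real hook also parses the SDK's stream protocol and manages message state, which is skipped here.

```typescript
// Simplified sketch of client-side stream consumption: read the Response body
// chunk by chunk, decoding bytes to text and reporting each chunk as it lands.
export async function readTextStream(
  response: Response,
  onChunk: (text: string) => void
): Promise<string> {
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let full = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact
    const chunk = decoder.decode(value, { stream: true });
    full += chunk;
    onChunk(chunk); // e.g. append to the visible message
  }
  return full;
}
```

This is why streaming feels instant: the UI renders each chunk as soon as it arrives instead of waiting for the whole completion.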
Step 4
Function Calling / Tools
Give AI the ability to call your functions and access external data. Perfect for weather apps, calculators, database queries, and more.
// app/api/chat-with-tools/route.ts
import { openai } from "@ai-sdk/openai";
import { streamText, tool } from "ai";
import { z } from "zod";
export async function POST(req: Request) {
const { messages } = await req.json();
const result = await streamText({
model: openai("gpt-4-turbo"),
messages,
tools: {
// Weather tool
getWeather: tool({
description: "Get the current weather for a location",
parameters: z.object({
city: z.string().describe("The city name"),
country: z.string().describe("The country code (e.g., US, UK)"),
}),
execute: async ({ city, country }) => {
// Call weather API (key from env: set e.g. OPENWEATHER_API_KEY in .env.local)
const response = await fetch(
`https://api.openweathermap.org/data/2.5/weather?q=${city},${country}&appid=${process.env.OPENWEATHER_API_KEY}`
);
const data = await response.json();
return {
temperature: data.main.temp,
description: data.weather[0].description,
};
},
}),
// Calculator tool
calculate: tool({
description: "Perform mathematical calculations",
parameters: z.object({
expression: z.string().describe("Math expression (e.g., '2 + 2')"),
}),
execute: async ({ expression }) => {
try {
// WARNING: eval() executes arbitrary code. Never use it on
// model-generated input in production; use a real math parser instead.
const result = eval(expression);
return { result };
} catch (error) {
return { error: "Invalid expression" };
}
},
}),
// Database query tool
searchDatabase: tool({
description: "Search the product database",
parameters: z.object({
query: z.string().describe("Search query"),
limit: z.number().optional().describe("Max results"),
}),
execute: async ({ query, limit = 5 }) => {
// Query your database ("db" stands in for your client, e.g. a Prisma instance)
const products = await db.products.findMany({
where: { name: { contains: query } },
take: limit,
});
return { products };
},
}),
},
});
return result.toAIStreamResponse();
}
Client Usage (same as before):
"use client";
import { useChat } from "ai/react";
export default function ChatWithTools() {
const { messages, input, handleInputChange, handleSubmit } = useChat({
api: "/api/chat-with-tools",
});
return (
<div>
{messages.map((m) => (
<div key={m.id}>
<strong>{m.role}:</strong> {m.content}
{/* Show tool calls */}
{m.toolInvocations?.map((tool, i) => (
<div key={i} style={{ background: "var(--surface-elevated)" }}>
<code>{tool.toolName}</code> called with {JSON.stringify(tool.args)}
<br />
{"result" in tool && <>Result: {JSON.stringify(tool.result)}</>}
</div>
))}
</div>
))}
<form onSubmit={handleSubmit}>
<input value={input} onChange={handleInputChange} />
<button type="submit">Send</button>
</form>
</div>
);
}
💡 Key Point: Tools let AI interact with your backend. The AI decides WHEN to call tools based on user input. Use Zod schemas for type-safe parameters.
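On that note, the calculate tool above should not ship with eval(). As one alternative, here is a minimal hand-rolled evaluator you could drop into its execute function. It is a sketch covering only +, -, *, /, unary minus, and parentheses; a battle-tested parser such as mathjs is the safer production choice.

```typescript
// Minimal safe arithmetic evaluator (recursive descent). Unlike eval(), it
// throws on anything that is not plain arithmetic, so model-generated input
// cannot execute code.
export function evaluate(expression: string): number {
  const src = expression.replace(/\s+/g, "");
  let pos = 0;

  // expr := term (("+" | "-") term)*
  function parseExpr(): number {
    let value = parseTerm();
    while (src[pos] === "+" || src[pos] === "-") {
      const op = src[pos++];
      const rhs = parseTerm();
      value = op === "+" ? value + rhs : value - rhs;
    }
    return value;
  }

  // term := factor (("*" | "/") factor)*
  function parseTerm(): number {
    let value = parseFactor();
    while (src[pos] === "*" || src[pos] === "/") {
      const op = src[pos++];
      const rhs = parseFactor();
      value = op === "*" ? value * rhs : value / rhs;
    }
    return value;
  }

  // factor := "(" expr ")" | "-" factor | number
  function parseFactor(): number {
    if (src[pos] === "(") {
      pos++;
      const value = parseExpr();
      if (src[pos++] !== ")") throw new Error("Expected )");
      return value;
    }
    if (src[pos] === "-") {
      pos++;
      return -parseFactor();
    }
    const match = /^\d+(\.\d+)?/.exec(src.slice(pos));
    if (!match) throw new Error(`Unexpected character at position ${pos}`);
    pos += match[0].length;
    return parseFloat(match[0]);
  }

  const result = parseExpr();
  if (pos !== src.length) throw new Error("Trailing input");
  return result;
}
```

Inside the tool, `const result = evaluate(expression);` replaces the eval() call, and the existing try-catch turns any parse error into the "Invalid expression" response.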
Step 5
RAG Pattern (Retrieval-Augmented Generation)
Give AI access to your custom knowledge base. Retrieve relevant documents and inject them into the prompt for accurate, contextual responses.
// lib/vectorStore.ts
import { openai } from "@ai-sdk/openai";
import { embed } from "ai";
// Generate embeddings for a text
export async function generateEmbedding(text: string) {
const { embedding } = await embed({
model: openai.embedding("text-embedding-3-small"),
value: text,
});
return embedding;
}
// Store document with embedding
export async function storeDocument(text: string, metadata: any) {
const embedding = await generateEmbedding(text);
// Store in your vector database; "vectorDB" stands in for your client (Pinecone, Supabase, Qdrant, etc.)
await vectorDB.upsert({
id: metadata.id,
values: embedding,
metadata: { text, ...metadata },
});
}
// Search for relevant documents
export async function searchDocuments(query: string, limit = 3) {
const queryEmbedding = await generateEmbedding(query);
// Query vector database
const results = await vectorDB.query({
vector: queryEmbedding,
topK: limit,
includeMetadata: true,
});
return results.matches.map((match) => match.metadata.text);
}
RAG Chat Route:
// app/api/chat-rag/route.ts
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";
import { searchDocuments } from "@/lib/vectorStore";
export async function POST(req: Request) {
const { messages } = await req.json();
const lastMessage = messages[messages.length - 1].content;
// 1. Retrieve relevant documents
const relevantDocs = await searchDocuments(lastMessage, 3);
// 2. Inject context into the system prompt
const context = relevantDocs.join("\n\n");
const result = await streamText({
model: openai("gpt-4-turbo"),
system: `You are a helpful assistant. Use the following context to answer questions.
Context:
${context}
If the answer is not in the context, say "I don't have information about that."`,
messages,
});
return result.toAIStreamResponse();
}
// Example: batch indexing documents. Run this from a script or a lib file,
// not from the route handler above: Next.js route files may only export
// HTTP method handlers and route config.
// import { storeDocument } from "@/lib/vectorStore";
export async function indexDocuments() {
const documents = [
{ id: "1", text: "Next.js is a React framework for production..." },
{ id: "2", text: "TypeScript adds static typing to JavaScript..." },
{ id: "3", text: "The Vercel AI SDK makes it easy to build AI apps..." },
];
for (const doc of documents) {
await storeDocument(doc.text, { id: doc.id });
}
}
💡 Key Point: RAG lets AI answer questions about YOUR data (docs, products, support tickets). First, embed and store your documents. Then retrieve relevant ones at query time and inject into the prompt.
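Conceptually, the vector database's query boils down to ranking stored embeddings by cosine similarity to the query embedding. Here is a minimal in-memory sketch of that ranking, fine for building intuition or prototyping; real vector stores use approximate nearest-neighbor indexes to make this fast at scale.

```typescript
// Cosine similarity: 1 means same direction, 0 means orthogonal (unrelated).
export function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the texts of the k stored documents most similar to the query,
// mirroring what searchDocuments does via the vector database.
export function topK(
  query: number[],
  docs: { text: string; embedding: number[] }[],
  k = 3
): string[] {
  return [...docs]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.embedding) -
        cosineSimilarity(query, x.embedding)
    )
    .slice(0, k)
    .map((d) => d.text);
}
```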
Best Practices
Production Tips & Best Practices
- Rate Limiting: Implement rate limits per user to prevent abuse. Use Redis or middleware.
- Error Handling: Always wrap AI calls in try-catch. Show user-friendly error messages.
- Caching: Cache common queries to save costs. Use Redis or Next.js cache.
- Streaming: Always use streaming for better UX. Users see responses immediately.
- Model Selection: Use cheaper models (gpt-3.5) for simple tasks. Use gpt-4 for complex reasoning.
- Token Limits: Set maxTokens to control costs. Track usage with OpenAI dashboard.
- Security: Never expose API keys to the client. Always use server-side routes.
- Monitoring: Log all AI requests for debugging and analytics. Track costs per user.
- Prompt Engineering: Spend time crafting good system prompts. Test with various inputs.
- Fallbacks: Have fallback responses when AI fails. Don't let users see raw errors.
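To illustrate the caching tip above, here is a minimal in-memory TTL cache keyed by prompt. It is a sketch: entries live in one server process only, so use Redis (or Next.js cache) for multi-instance deployments.

```typescript
// Tiny in-memory cache with time-to-live eviction, keyed by the prompt string.
// Identical prompts within the TTL window skip the (paid) model call entirely.
type CacheEntry = { value: string; expiresAt: number };

export class PromptCache {
  private store = new Map<string, CacheEntry>();

  constructor(private ttlMs = 60_000) {}

  get(prompt: string): string | undefined {
    const entry = this.store.get(prompt);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(prompt); // lazy eviction of stale entries
      return undefined;
    }
    return entry.value;
  }

  set(prompt: string, value: string): void {
    this.store.set(prompt, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// Usage sketch: check the cache before calling the model, store after.
// const cached = cache.get(prompt);
// if (cached) return cached;
// const { text } = await generateText({ model, prompt });
// cache.set(prompt, text);
```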
Example: Rate Limiting Middleware
// lib/rateLimiter.ts
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";
const redis = new Redis({
url: process.env.UPSTASH_REDIS_REST_URL!,
token: process.env.UPSTASH_REDIS_REST_TOKEN!,
});
export const ratelimit = new Ratelimit({
redis,
limiter: Ratelimit.slidingWindow(10, "1 m"), // 10 requests per minute
});
// Use in API route
export async function POST(req: Request) {
const ip = req.headers.get("x-forwarded-for")?.split(",")[0]?.trim() ?? "anonymous"; // header may hold a comma-separated proxy chain; take the client IP
const { success } = await ratelimit.limit(ip);
if (!success) {
return new Response("Too many requests", { status: 429 });
}
// Continue with AI request...
}