Upstash is serverless Redis — you get a fully managed Redis instance with per-request pricing and a REST API, so it works in Edge Functions, Cloudflare Workers, and Vercel Functions where persistent TCP connections aren't available. For AI applications, Upstash Redis covers three critical use cases: rate limiting API endpoints, caching expensive LLM responses, and storing short-lived session state for multi-turn conversations.

## Why Upstash for AI Apps

| Use case | Why Redis | Why Upstash specifically |
|---|---|---|
| Rate limiting | Atomic `INCR` + `EXPIRE` for sliding windows | REST API works in Edge/Vercel Functions; per-request pricing |
| Response caching | Sub-millisecond reads for cached completions | Global replication puts the cache close to your edge functions |
| Conversation memory | TTL-based expiry for session cleanup | No connection pooling needed; the HTTP client works anywhere |
| Job queues (QStash) | Durable message delivery with retry | Built-in queue service, no separate Redis queue setup |

## Setup

```bash
# Install the Upstash Redis client
npm install @upstash/redis

# For rate limiting
npm install @upstash/ratelimit
```

Create a database at console.upstash.com. Copy the REST URL and REST token from the dashboard.

```bash
# .env.local
UPSTASH_REDIS_REST_URL=https://your-db.upstash.io
UPSTASH_REDIS_REST_TOKEN=your-token-here
```

## Rate Limiting AI Endpoints

Use `@upstash/ratelimit` to protect your `/api/chat` endpoint. The sliding-window algorithm is a good fit for per-user limits on AI APIs: unlike a fixed window, it doesn't permit a double burst of requests at window boundaries, which matters when every request triggers an expensive LLM call.

```typescript
// lib/rate-limit.ts
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
});

// 10 requests per user per minute
export const chatRateLimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(10, "1 m"),
  analytics: true, // store usage stats in Redis
  prefix: "ratelimit:chat",
});

// 100 requests per IP per hour (fallback for unauthenticated users)
export const globalRateLimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(100, "1 h"),
  prefix: "ratelimit:global",
});
```

```typescript
// app/api/chat/route.ts
import { chatRateLimit, globalRateLimit } from "@/lib/rate-limit";
import { NextResponse } from "next/server";

export const runtime = "edge";

export async function POST(req: Request) {
  const { userId } = await getAuthFromRequest(req); // your auth helper (not shown)
  const ip = req.headers.get("x-forwarded-for") ?? "anonymous";

  // Check rate limit: userId if authenticated, IP otherwise
  const identifier = userId ?? `ip:${ip}`;
  const limiter = userId ? chatRateLimit : globalRateLimit;
  const { success, limit, remaining, reset } = await limiter.limit(identifier);

  if (!success) {
    return NextResponse.json(
      { error: "Rate limit exceeded. Please wait before sending another message." },
      {
        status: 429,
        headers: {
          "X-RateLimit-Limit": limit.toString(),
          "X-RateLimit-Remaining": remaining.toString(),
          "X-RateLimit-Reset": reset.toString(),
        },
      }
    );
  }

  // Proceed with AI response...
  const { messages } = await req.json();
  const response = await generateAIResponse(messages); // your model call (not shown)
  return NextResponse.json(response);
}
```

## Caching LLM Responses

Many AI queries are repeated — FAQ answers, product descriptions, common search queries. Cache the LLM response for a TTL and return it instantly on repeat calls.

```typescript
// lib/cached-completion.ts
import { Redis } from "@upstash/redis";
import OpenAI from "openai";
import { createHash } from "crypto";

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
});

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function cachedCompletion(
  prompt: string,
  options: { ttl?: number; model?: string } = {}
): Promise<string> {
  const { ttl = 3600, model = "gpt-4o-mini" } = options;

  // Create a deterministic cache key from the model + prompt
  const cacheKey = `llm:${createHash("sha256").update(`${model}:${prompt}`).digest("hex")}`;

  // Check cache first
  const cached = await redis.get<string>(cacheKey);
  if (cached) {
    console.log("Cache hit:", cacheKey.slice(0, 16));
    return cached;
  }

  // Cache miss: call OpenAI
  const response = await openai.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
  });
  const content = response.choices[0].message.content!;

  // Store with TTL
  await redis.setex(cacheKey, ttl, content);
  return content;
}
```

## Conversation Session Storage

Store multi-turn conversation history in Redis with automatic TTL expiry. This is lighter-weight than a database for temporary session data.

```typescript
// lib/conversation-store.ts
import { Redis } from "@upstash/redis";

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
});

type Message = { role: "user" | "assistant"; content: string };

const SESSION_TTL = 60 * 60 * 2; // 2 hours
const MAX_MESSAGES = 20; // keep last 20 messages

export async function getConversation(sessionId: string): Promise<Message[]> {
  const key = `conversation:${sessionId}`;
  return redis.lrange<Message>(key, 0, -1);
}

export async function appendMessage(sessionId: string, message: Message): Promise<void> {
  const key = `conversation:${sessionId}`;
  // Append message and trim to max length
  await redis.rpush(key, message);
  await redis.ltrim(key, -MAX_MESSAGES, -1);
  await redis.expire(key, SESSION_TTL);
}

export async function clearConversation(sessionId: string): Promise<void> {
  await redis.del(`conversation:${sessionId}`);
}
```

Use `redis.pipeline()` to batch multiple Redis commands into a single HTTP request. This is critical for performance in Edge Functions where each request has latency overhead.

## Upstash Pricing

| Tier | Requests/day | Storage | Price |
|---|---|---|---|
| Free | 10,000 | 256 MB | $0 |
| Pay-as-you-go | Unlimited | Unlimited | $0.20 per 100K requests |
| Pro 2K | 2M | 1 GB | $10/month |