Vercel's free tier is generous enough that most developers never think about billing — until they do. Then they get a bill for hundreds of dollars on a project they thought was essentially free, and they don't understand why.
This guide breaks down exactly how Vercel function pricing works, what triggers cost spikes, and the practical changes that bring bills back under control without migrating away from Vercel.
How Vercel Actually Charges for Functions
Vercel charges for serverless functions on two dimensions: GB-seconds (execution units) and the number of invocations. Understanding both is essential.
| Metric | What It Measures | Pro Plan Included | Overage Rate |
|---|---|---|---|
| Function Invocations | Each time a function is called | 1,000,000 / month | $0.60 per additional 1M |
| Function Duration (GB-seconds) | Memory × execution time | 1,000 GB-s / month | $0.18 per additional 100 GB-s |
| Edge Function Invocations | Middleware/Edge function calls | 1,000,000 / month | $2.00 per additional 1M |
| Bandwidth | Data transferred out | 1 TB / month | $0.15 per additional GB |
GB-seconds is the metric that surprises people. A function configured with 512 MB of memory that runs for 2 seconds consumes 1 GB-second per invocation (0.5 GB × 2 s). At 10,000 requests/day that's roughly 300,000 GB-seconds per month. Measured against the 1,000 GB-second allowance in the table above, that is already far over. Heavier functions burn through it faster still: at 3 GB of memory and 5 seconds per call, each invocation consumes 15 GB-seconds, which exhausts a 1,000 GB-second allowance in about 67 requests.
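The arithmetic is easy to script. The sketch below takes the allowance and overage rate as parameters; the defaults are copied from the table above and should be checked against your own plan. The function and field names are made up for illustration.

```typescript
// Allowance and rate are parameters; the defaults mirror the pricing table
// in this article and are assumptions to verify against your plan.
interface DurationPricing {
  includedGbSeconds: number;    // GB-seconds included per month
  overageUsdPerUnit: number;    // price of one overage unit in USD
  overageUnitGbSeconds: number; // size of one overage unit in GB-seconds
}

const tablePricing: DurationPricing = {
  includedGbSeconds: 1000,
  overageUsdPerUnit: 0.18,
  overageUnitGbSeconds: 100,
};

// GB-seconds for one invocation: memory in GB multiplied by duration in seconds.
function gbSecondsPerInvocation(memoryMb: number, durationSeconds: number): number {
  return (memoryMb / 1024) * durationSeconds;
}

// Monthly GB-seconds for a steady daily request rate (30-day month assumed).
function monthlyGbSeconds(memoryMb: number, durationSeconds: number, requestsPerDay: number): number {
  return gbSecondsPerInvocation(memoryMb, durationSeconds) * requestsPerDay * 30;
}

// Overage cost in USD for a month; zero while usage stays inside the allowance.
function durationOverageUsd(usedGbSeconds: number, pricing: DurationPricing = tablePricing): number {
  const over = Math.max(0, usedGbSeconds - pricing.includedGbSeconds);
  return (over / pricing.overageUnitGbSeconds) * pricing.overageUsdPerUnit;
}

const used = monthlyGbSeconds(512, 2, 10000);
console.log(`${used} GB-s used, overage $${durationOverageUsd(used).toFixed(2)}`);
```

Passing a different `DurationPricing` object lets you rerun the same estimate if your plan's allowance or rate differs from the table.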
The Five Patterns That Cause Unexpected Bills
1. AI API Calls Inside Functions
Calling OpenAI or Anthropic inside a serverless function is the most common source of runaway costs. LLM calls take 2–30 seconds and often require more memory for response buffering. A function that calls GPT-4 with a large context window can consume 10–50 GB-seconds per request.
// This pattern is expensive at scale
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

export async function POST(req: Request) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [/* ... */],
    max_tokens: 4096,
  });
  // Function runs for 8-15 seconds, consuming ~8-15 GB-seconds at 1 GB of memory
  return Response.json(response);
}

Each long LLM call inside a Vercel function consumes many GB-seconds. At scale, it is often cheaper to offload this work to a background job service (Trigger.dev, Inngest) or a dedicated compute provider (Railway, Modal).

2. Missing Response Caching
Every request that reaches a function is a billable invocation. Static or semi-static content that regenerates on every request wastes both invocations and GB-seconds.
// Without caching: every request hits the function
export async function GET() {
  const data = await fetchFromDatabase();
  return Response.json(data);
}

// With caching: CDN serves most requests; function only runs on cache miss
export async function GET() {
  const data = await fetchFromDatabase();
  return Response.json(data, {
    headers: {
      'Cache-Control': 's-maxage=60, stale-while-revalidate=300',
    },
  });
}

3. Middleware Running on Every Request
Vercel Middleware runs on the Edge — which is fast and cheap per invocation, but it runs on every single request including static assets, images, and CSS files. If you're doing database lookups in middleware, you're paying for a database round-trip on every page load.
// Bad: middleware with DB lookup runs on EVERY request
import { NextResponse, type NextRequest } from 'next/server';

export async function middleware(request: NextRequest) {
  // `db` stands in for your session store client
  const session = await db.getSession(request.cookies.get('session')?.value);
  // This DB call runs for .css, .js, .png requests too
  return NextResponse.next();
}

// Good: only run expensive logic on API routes and pages
export const config = {
  matcher: ['/api/:path*', '/dashboard/:path*'],
};

4. Waterfall Database Queries
Functions that make multiple sequential database calls multiply their duration, and billed GB-seconds grow with it. A function making 5 sequential Postgres queries might run for 800ms instead of 150ms.
// Sequential queries: ~800ms total
const user = await db.getUser(userId);
const org = await db.getOrg(user.orgId);
const docs = await db.getDocs(org.id);
const perms = await db.getPermissions(user.id);

// Parallel queries: two round trips instead of four, ~400ms total
// Independent lookups run together first
const [user, perms] = await Promise.all([
  db.getUser(userId),
  db.getPermissions(userId),
]);
// Both of these depend on the user; assumes org.id equals user.orgId
const [org, docs] = await Promise.all([
  db.getOrg(user.orgId),
  db.getDocs(user.orgId),
]);

5. Oversized Function Bundles
Cold starts become more expensive when your function bundle is large. A function that imports an entire ML library or PDF processing package on every cold start adds 2–5 seconds of initialisation time — all billed GB-seconds.
// Avoid: importing heavy libraries at module level
import { PDFDocument } from 'pdf-lib'; // loaded on every cold start

// Better: dynamic import only when needed
export async function POST(req: Request) {
  if (req.headers.get('content-type')?.includes('pdf')) {
    const { PDFDocument } = await import('pdf-lib');
    // ...
  }
}

Serverless Functions vs Edge Functions: When to Use Each
| | Serverless Functions | Edge Functions / Middleware |
|---|---|---|
| Runtime | Node.js (full) | V8 isolates (limited APIs) |
| Max Duration | 60s (Pro) / 800s (Enterprise) | 30s |
| Memory | Up to 3 GB | 128 MB |
| Cold Start | 100–500ms | 0–5ms (no cold start) |
| Cost Model | GB-seconds + invocations | Invocations only |
| Best For | DB queries, AI calls, heavy processing | Auth checks, redirects, geolocation, A/B tests |
Move auth checks, redirects, and simple header manipulation to Edge Middleware. Reserve serverless functions for work that requires Node.js APIs or significant memory. This reduces both cold start impact and GB-second consumption.

Practical Cost Reduction Strategies
Set Function Memory and Duration Limits
Vercel functions default to 1 GB memory. Most API routes need far less. Setting memory explicitly both reduces cost and surfaces memory pressure bugs early.
vercel.json (configure per-route function settings):
{
  "functions": {
    "app/api/chat/route.ts": {
      "memory": 512,
      "maxDuration": 30
    },
    "app/api/data/route.ts": {
      "memory": 256,
      "maxDuration": 10
    }
  }
}

Use Spend Limits (Vercel Pro)
Vercel Pro allows you to set a monthly spend cap that pauses function execution when reached rather than running up an unlimited bill. This is not enabled by default — you must set it explicitly.
Go to: Vercel Dashboard > Settings > Billing > Spend Management. Set a limit below your comfort threshold. Your site continues serving cached static content even when the cap is hit; only dynamic function execution pauses.
The spend cap only applies to overages above the plan's included usage. It does not prevent you from being charged for your base plan.

Move Long-Running Work Off Vercel Functions
If you're running LLM chains, document processing, or any work that regularly takes more than 5 seconds, Vercel functions are the wrong tool. Options:
- Trigger.dev or Inngest: offload to a background job, return a job ID immediately, poll for completion
- Railway or Render: run a dedicated Node.js server for long-running endpoints — flat monthly rate, no per-second billing
- Modal: for GPU-accelerated AI inference — pay per compute, not per wall-clock second of a waiting function
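The "poll for completion" step in the first option might look roughly like the sketch below. The `pollUntilDone` helper, the `JobStatus` shape, and the interval and attempt defaults are all assumptions made for illustration; the status lookup is passed in as a function so the helper does not depend on any particular job service.

```typescript
// Hypothetical status shape; real job services return richer objects.
type JobStatus =
  | { state: 'pending' }
  | { state: 'completed'; result: unknown }
  | { state: 'failed'; error: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls getStatus repeatedly until the job leaves the pending state
// or the attempt budget runs out, whichever comes first.
async function pollUntilDone(
  getStatus: () => Promise<JobStatus>,
  intervalMs = 2000,
  maxAttempts = 30,
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (status.state !== 'pending') return status;
    await sleep(intervalMs);
  }
  return { state: 'failed', error: 'timed out waiting for job' };
}
```

On the client, `getStatus` could wrap a `fetch` call to whatever status route your app exposes. Capping the number of attempts keeps a stuck job from generating status requests forever, each of which is itself a billable invocation.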
// Pattern: offload to Trigger.dev, return job ID immediately
import { tasks } from '@trigger.dev/sdk/v3';

export async function POST(req: Request) {
  const { documentId } = await req.json();
  // Enqueue the job; this returns immediately (< 100ms)
  const handle = await tasks.trigger('process-document', { documentId });
  // Return job ID; client polls /api/jobs/[id] for status
  // The Vercel function runs for ~100ms instead of 30 seconds
  return Response.json({ jobId: handle.id });
}

Reading Your Vercel Bill
Vercel's usage dashboard (Settings > Billing > Usage) breaks down consumption by function path. Sort by GB-seconds descending — the top 3 functions almost always account for 80%+ of your bill. Fix those first.
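Once the per-function numbers are copied out of the dashboard, the "top 3" check takes a few lines of code. The `FunctionUsage` shape and the sample figures below are assumptions for illustration, not a Vercel export format.

```typescript
interface FunctionUsage {
  path: string;      // function path as shown in the usage view
  gbSeconds: number; // GB-seconds consumed in the billing period
}

// Returns the n heaviest functions and their combined share of total GB-seconds.
function topConsumers(usage: FunctionUsage[], n = 3): { top: FunctionUsage[]; share: number } {
  const sorted = [...usage].sort((a, b) => b.gbSeconds - a.gbSeconds);
  const top = sorted.slice(0, n);
  const total = usage.reduce((sum, u) => sum + u.gbSeconds, 0);
  const topTotal = top.reduce((sum, u) => sum + u.gbSeconds, 0);
  return { top, share: total === 0 ? 0 : topTotal / total };
}

// Made-up numbers, for illustration only.
const sample: FunctionUsage[] = [
  { path: 'app/api/data/route.ts', gbSeconds: 1000 },
  { path: 'app/api/chat/route.ts', gbSeconds: 6000 },
  { path: 'app/api/health/route.ts', gbSeconds: 500 },
  { path: 'app/api/export/route.ts', gbSeconds: 2500 },
];
const report = topConsumers(sample);
console.log(report.top.map((u) => u.path), report.share);
```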
The Functions tab in Vercel Analytics shows p50, p95, and p99 duration for each route. If your p95 duration is above 5 seconds for any route, that route is the target for optimisation.
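If you also collect durations in your own logs, the same p95 rule can be applied there. This is a minimal sketch using the nearest-rank percentile definition; the function names, the input shape, and the 5-second default are assumptions drawn from the rule of thumb above, and the result may differ slightly from how a dashboard computes percentiles.

```typescript
// Nearest-rank percentile: the smallest sample that has at least p percent
// of all samples at or below it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error('no samples');
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Returns the routes whose p95 duration exceeds the threshold in milliseconds.
function slowRoutes(durationsByRoute: Record<string, number[]>, thresholdMs = 5000): string[] {
  return Object.entries(durationsByRoute)
    .filter(([, samples]) => samples.length > 0 && percentile(samples, 95) > thresholdMs)
    .map(([route]) => route);
}
```

Checking p95 rather than the average keeps a handful of very slow calls from hiding behind many fast ones.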
Enable Vercel's spending notifications under Settings > Billing > Notifications. Set alerts at 50% and 80% of your expected monthly spend. You want to know before the bill arrives, not after.

When to Leave Vercel
Vercel is the right choice when: you're building a Next.js application, your functions are short-lived (< 5 seconds), and the developer experience value outweighs the per-invocation cost.
Consider alternatives when: your functions regularly run for 10+ seconds, you're running AI workloads that need GPU access, or you want predictable flat-rate hosting costs. Railway and Render offer flat monthly pricing; Modal offers pay-per-compute for AI workloads.
The hybrid approach works well in practice: Vercel for the frontend and short API routes, Railway or Render for long-running backend services, and Modal for GPU inference — all behind a single domain using Vercel Rewrites to proxy to the backend.
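The proxying part of that setup comes down to a list of source and destination pairs. The sketch below builds such a list; the path prefixes and `example.com` hostnames are placeholders, and the helper name is made up for illustration.

```typescript
interface RewriteRule {
  source: string;
  destination: string;
}

// Builds rewrite rules that proxy selected path prefixes to external backends.
// Note the design choice: the prefix is dropped on the destination side, so
// /api/jobs/123 is forwarded to <origin>/123.
function buildProxyRewrites(backends: Record<string, string>): RewriteRule[] {
  return Object.entries(backends).map(([prefix, origin]) => ({
    source: `${prefix}/:path*`,
    destination: `${origin.replace(/\/$/, '')}/:path*`,
  }));
}

// Placeholder prefixes and origins, not real services.
const proxyRewrites = buildProxyRewrites({
  '/api/jobs': 'https://backend.example.com',
  '/api/inference': 'https://inference.example.com/',
});
console.log(proxyRewrites);
```

In a Next.js project, an array like this can be returned from the `rewrites()` function in `next.config`; the same `source` and `destination` pairs can also be written under `rewrites` in `vercel.json`.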
| Metadata | Value |
|---|---|
| Title | The True Cost of Vercel Functions at Scale: What the Pricing Page Doesn't Tell You |
| Research date | 2026-04-14 |