Serverless platforms like Vercel cap function execution time — often 10–60 seconds by default, depending on plan. Railway runs persistent processes with no execution time limit, making it a natural fit for AI agent backends, LLM streaming servers, and any workload that needs to stay alive between requests.
Why AI Backends Need Persistent Processes
- LLM calls with large contexts or many tool calls can take 2–5 minutes
- Agent loops that call multiple tools in sequence exceed serverless timeouts
- WebSocket connections for real-time agent output require persistent servers
- In-memory caching of embeddings or model contexts between requests
- Background polling loops for async task completion
Recipe: FastAPI Agent Backend on Railway
```python
# main.py — runs as a persistent FastAPI server
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_openai import ChatOpenAI

app = FastAPI()
llm = ChatOpenAI(model='gpt-4o', streaming=True)

# Build the agent once at startup; it stays in memory between requests
tools = []  # register your agent's tools here
prompt = hub.pull('hwchase17/openai-tools-agent')
agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools)

@app.post('/agent/run')
async def run_agent(payload: dict):
    async def generate():
        # Stream chunks to the client as server-sent events
        async for chunk in agent_executor.astream(payload):
            if 'output' in chunk:
                yield f'data: {chunk["output"]}\n\n'
    return StreamingResponse(generate(), media_type='text/event-stream')

@app.get('/health')
def health():
    return {'status': 'ok'}
```

```shell
# Procfile or railway.toml start command
uvicorn main:app --host 0.0.0.0 --port $PORT --workers 2
```

Recipe: Background Agent Worker
For fire-and-forget agent tasks, run a worker service alongside your API service in the same Railway project.
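On the producer side, the API service only needs to push a payload onto the queue and later read back the result key the worker writes. A sketch under the assumption that the queue is the `agent_jobs` Redis list and results live under `result:<task>` (the client is passed in, so in production you would hand it `redis.from_url(os.environ['REDIS_URL'])`):

```python
# Producer side: enqueue a job for the worker and poll for its result.
# Any client object with lpush/get (e.g. a redis-py client) works here.

def enqueue_job(client, task: str) -> int:
    """Push a task onto the list the worker BRPOPs from."""
    return client.lpush('agent_jobs', task)

def fetch_result(client, task: str):
    """Read the result the worker stored (None until it finishes)."""
    return client.get(f'result:{task}')
```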
```python
# worker.py — separate Railway service, same project
import os

import redis

from agent import run_research_agent

r = redis.from_url(os.environ['REDIS_URL'])

def process_jobs():
    print('Worker started, waiting for jobs...')
    while True:
        # Block until a job arrives (or 5s timeout)
        job = r.brpop('agent_jobs', timeout=5)
        if job:
            _, payload = job
            task = payload.decode()
            result = run_research_agent(task)
            # Store the result for one hour under a key derived from the task
            r.set(f'result:{task}', result, ex=3600)

if __name__ == '__main__':
    process_jobs()
```

Scaling on Railway
Railway supports both vertical scaling (more CPU/RAM per instance) and horizontal scaling (multiple replicas). For AI backends, vertical scaling is usually correct — more RAM means more in-memory context and faster embedding operations.
```toml
# railway.toml
[deploy]
startCommand = "uvicorn main:app --host 0.0.0.0 --port $PORT"
numReplicas = 2  # horizontal: 2 instances behind Railway's load balancer
```

CPU and memory limits (e.g. 2 vCPUs, 4 GB RAM) are configured per service in the Railway dashboard rather than in railway.toml, with your plan setting the upper bound.

Private Networking Between Services
Services in the same Railway project communicate via private `*.railway.internal` hostnames — traffic stays inside Railway's network, never traverses the public internet, and adds minimal latency.
```python
# Your API service talks to a Redis service via the private network
import os

REDIS_URL = os.environ['REDIS_URL']
# Railway's reference variable resolves to a private hostname, e.g.:
# redis://default:<password>@redis.railway.internal:6379
# Traffic never leaves Railway's private network

# Same for Postgres:
DATABASE_URL = os.environ['DATABASE_URL']
# *.railway.internal hostnames resolve only within the same project
```

| Metadata | Value |
|---|---|
| Title | Railway for AI Backends: Running Long-Running Agent Processes |
| Tool | Railway |
| Primary SEO keyword | railway ai agent backend |
| Secondary keywords | railway fastapi, railway long running process, railway websocket, railway persistent server |
| Estimated read time | 7 minutes |
| Research date | 2026-04-14 |