Serverless platforms like Vercel cap function execution time — typically tens of seconds on most plans. Railway runs persistent processes with no execution time limit, making it a strong fit for AI agent backends, LLM streaming servers, and any workload that needs to stay alive between requests.

Why AI Backends Need Persistent Processes

  • LLM calls with large contexts or many tool calls can take 2–5 minutes
  • Agent loops that call multiple tools in sequence exceed serverless timeouts
  • WebSocket connections for real-time agent output require persistent servers
  • In-memory caching of embeddings or model contexts between requests
  • Background polling loops for async task completion

Recipe: FastAPI Agent Backend on Railway

# main.py — runs as a persistent FastAPI server
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
 
app = FastAPI()
llm = ChatOpenAI(model='gpt-4o', streaming=True)
 
@tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())
 
tools = [word_count]  # register your real tools here
prompt = ChatPromptTemplate.from_messages([
    ('system', 'You are a helpful research agent.'),
    ('human', '{input}'),
    ('placeholder', '{agent_scratchpad}'),
])
agent_executor = AgentExecutor(
    agent=create_openai_tools_agent(llm, tools, prompt), tools=tools
)
 
@app.post('/agent/run')
async def run_agent(payload: dict):
    async def generate():
        async for chunk in agent_executor.astream(payload):
            if 'output' in chunk:
                yield f'data: {chunk["output"]}\n\n'
    return StreamingResponse(generate(), media_type='text/event-stream')
 
@app.get('/health')
def health():
    return {'status': 'ok'}
# Procfile or railway.toml start command
uvicorn main:app --host 0.0.0.0 --port $PORT --workers 2
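On the client side, each event from the endpoint above arrives as a `data: ...` line followed by a blank line. A minimal parser for that stream — `iter_sse_data` is an illustrative helper, and the commented `httpx` usage assumes the `/agent/run` route from the server above:

```python
def iter_sse_data(lines):
    """Yield the payload of each 'data:' line from an iterator of SSE lines."""
    prefix = 'data: '
    for line in lines:
        if line.startswith(prefix):
            yield line[len(prefix):]

# Usage with an SSE-capable HTTP client such as httpx (illustrative):
# with httpx.stream('POST', f'{base_url}/agent/run',
#                   json={'input': question}, timeout=None) as resp:
#     for token in iter_sse_data(resp.iter_lines()):
#         print(token, end='', flush=True)
```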

Recipe: Background Agent Worker

For fire-and-forget agent tasks, run a worker service alongside your API service in the same Railway project.
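The producing side is symmetric: the API service pushes onto the same `agent_jobs` list the worker blocks on. A hedged sketch — `enqueue_agent_job` and the job-id scheme are illustrative, and `queue_client` stands in for a real `redis.from_url(os.environ['REDIS_URL'])` client:

```python
import uuid

def enqueue_agent_job(queue_client, task: str) -> str:
    """Push a task onto the 'agent_jobs' list the worker BRPOPs from.

    queue_client is anything with an lpush(key, value) method —
    in production, a redis-py client connected via REDIS_URL.
    """
    job_id = str(uuid.uuid4())
    # LPUSH pairs with the worker's BRPOP so oldest jobs are processed first.
    queue_client.lpush('agent_jobs', task)
    return job_id
```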

# worker.py — separate Railway service, same project
import os, redis
from agent import run_research_agent
 
r = redis.from_url(os.environ['REDIS_URL'])
 
def process_jobs():
    print('Worker started, waiting for jobs...')
    while True:
        # Block until a job arrives (or 5s timeout)
        job = r.brpop('agent_jobs', timeout=5)
        if job:
            _, payload = job
            task = payload.decode()  # brpop returns bytes
            try:
                result = run_research_agent(task)
                r.set(f'result:{task}', result, ex=3600)
            except Exception as exc:
                # Keep the loop alive — a crashed worker drops all queued jobs
                r.set(f'result:{task}', f'error: {exc}', ex=3600)
 
if __name__ == '__main__':
    process_jobs()
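Once the worker has written `result:<task>` (as above), the API can expose a polling endpoint that reads it back. A sketch — `fetch_result` is an illustrative helper, and `store` stands in for the same redis-py client, whose `get` returns bytes:

```python
def fetch_result(store, task: str):
    """Return the stored agent result for a task, or None while still pending.

    store is anything with a get(key) method; keys mirror the worker's
    r.set(f'result:{task}', ...) above.
    """
    raw = store.get(f'result:{task}')
    # redis-py returns bytes; a missing key returns None (job still running).
    return raw.decode() if isinstance(raw, bytes) else raw
```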

Scaling on Railway

Railway supports both vertical scaling (more CPU/RAM per instance) and horizontal scaling (multiple replicas). For AI backends, vertical scaling is usually correct — more RAM means more in-memory context and faster embedding operations.

# railway.toml
[deploy]
startCommand = "uvicorn main:app --host 0.0.0.0 --port $PORT"
numReplicas = 2  # horizontal: 2 instances behind Railway's load balancer

# Vertical resources (vCPU and RAM limits) are configured per service in
# the Railway dashboard rather than in railway.toml.
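One caveat with multiple replicas: replicas do not share process memory, so an in-process cache is per-replica — each instance warms up independently, and anything that must be shared belongs in Redis. A sketch of a per-process cache (the `embed_query` body is a stand-in for a real embedding call):

```python
from functools import lru_cache

@lru_cache(maxsize=10_000)
def embed_query(text: str) -> tuple:
    # Stand-in for an expensive embedding call; cached only in THIS process.
    return tuple(ord(c) % 7 for c in text)

# With numReplicas = 2, each instance behind the load balancer keeps its own
# cache — a hit here says nothing about the other replica's cache state.
```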

Private Networking Between Services

Services in the same Railway project communicate via private hostnames — traffic stays off the public internet, with negligible latency overhead. Note that private networking does not replace service-level credentials: the connection URLs Railway injects still carry them.

# Your API service talks to a Redis service via private network
import os

REDIS_URL = os.environ['REDIS_URL']
# e.g. redis://redis.railway.internal:6379
# Traffic never leaves Railway's private network

# Same for Postgres:
DATABASE_URL = os.environ['DATABASE_URL']
# *.railway.internal hostnames resolve only within the same project
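Since `*.railway.internal` hostnames only resolve inside the project, it can be useful to branch connection options on whether a URL is private or public. A small illustrative helper — the function name and the TLS decision are assumptions for this sketch, not Railway APIs:

```python
from urllib.parse import urlparse

def is_private_railway_url(url: str) -> bool:
    """True when the URL targets Railway's private network."""
    host = urlparse(url).hostname or ''
    return host.endswith('.railway.internal')

def redis_kwargs(url: str) -> dict:
    # Hypothetical policy: require TLS only for publicly proxied endpoints;
    # private-network traffic already stays inside Railway.
    return {} if is_private_railway_url(url) else {'ssl': True}
```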
Metadata

Title: Railway for AI Backends: Running Long-Running Agent Processes
Tool: Railway
Primary SEO keyword: railway ai agent backend
Secondary keywords: railway fastapi, railway long running process, railway websocket, railway persistent server
Estimated read time: 7 minutes
Research date: 2026-04-14