Serverless platforms like Vercel cap function execution time — often 10–60 seconds by default, depending on plan. Railway runs persistent processes with no execution time limit, making it a natural fit for AI agent backends, LLM streaming servers, and any workload that needs to stay alive between requests.
Why AI Backends Need Persistent Processes
- LLM calls with large contexts or many tool calls can take 2–5 minutes
- Agent loops that call multiple tools in sequence exceed serverless timeouts
- WebSocket connections for real-time agent output require persistent servers
- In-memory caching of embeddings or model contexts between requests
- Background polling loops for async task completion
Recipe: FastAPI Agent Backend on Railway
```python
# main.py — runs as a persistent FastAPI server
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_openai import ChatOpenAI

app = FastAPI()
llm = ChatOpenAI(model='gpt-4o', streaming=True)

# Build the agent once at startup; it stays in memory between requests
tools = []  # register your agent's tools here
prompt = hub.pull('hwchase17/openai-tools-agent')
agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools)

@app.post('/agent/run')
async def run_agent(payload: dict):
    async def generate():
        # Stream chunks to the client as server-sent events
        async for chunk in agent_executor.astream(payload):
            if 'output' in chunk:
                yield f'data: {chunk["output"]}\n\n'
    return StreamingResponse(generate(), media_type='text/event-stream')

@app.get('/health')
def health():
    return {'status': 'ok'}
```

```shell
# Procfile or railway.toml start command
uvicorn main:app --host 0.0.0.0 --port $PORT --workers 2
```

Recipe: Background Agent Worker
For fire-and-forget agent tasks, run a worker service alongside your API service in the same Railway project.
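On the producer side, the API service only needs to push a payload onto the queue and later read back the result key the worker writes. A sketch under the assumption that the queue is the `agent_jobs` Redis list and results live under `result:<task>` (the client is passed in, so in production you would hand it `redis.from_url(os.environ['REDIS_URL'])`):

```python
# Producer side: enqueue a job for the worker and poll for its result.
# Any client object with lpush/get (e.g. a redis-py client) works here.

def enqueue_job(client, task: str) -> int:
    """Push a task onto the list the worker BRPOPs from."""
    return client.lpush('agent_jobs', task)

def fetch_result(client, task: str):
    """Read the result the worker stored (None until it finishes)."""
    return client.get(f'result:{task}')
```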
```python
# worker.py — separate Railway service, same project
import os

import redis

from agent import run_research_agent

r = redis.from_url(os.environ['REDIS_URL'])

def process_jobs():
    print('Worker started, waiting for jobs...')
    while True:
        # Block until a job arrives (or 5s timeout)
        job = r.brpop('agent_jobs', timeout=5)
        if job:
            _, payload = job
            task = payload.decode()
            result = run_research_agent(task)
            # Store the result for one hour under a key derived from the task
            r.set(f'result:{task}', result, ex=3600)

if __name__ == '__main__':
    process_jobs()
```

Scaling on Railway
Railway supports both vertical scaling (more CPU/RAM per instance) and horizontal scaling (multiple replicas). For AI backends, vertical scaling is usually correct — more RAM means more in-memory context and faster embedding operations.
```toml
# railway.toml
[deploy]
startCommand = "uvicorn main:app --host 0.0.0.0 --port $PORT"
numReplicas = 2  # horizontal: 2 instances behind Railway's load balancer
```

CPU and memory limits (e.g. 2 vCPUs, 4 GB RAM) are configured per service in the Railway dashboard rather than in railway.toml, with your plan setting the upper bound.

Private Networking Between Services
Services in the same Railway project communicate via private `*.railway.internal` hostnames — traffic stays inside Railway's network, never traverses the public internet, and adds minimal latency.
```python
# Your API service talks to a Redis service via the private network
import os

REDIS_URL = os.environ['REDIS_URL']
# Railway's reference variable resolves to a private hostname, e.g.:
# redis://default:<password>@redis.railway.internal:6379
# Traffic never leaves Railway's private network

# Same for Postgres:
DATABASE_URL = os.environ['DATABASE_URL']
# *.railway.internal hostnames resolve only within the same project
```

| Metadata | Value |
|---|---|
| Title | Railway for AI Backends: Running Long-Running Agent Processes |
| Tool | Railway |
| Primary SEO keyword | railway ai agent backend |
| Secondary keywords | railway fastapi, railway long running process, railway websocket, railway persistent server |
| Estimated read time | 7 minutes |
| Research date | 2026-04-14 |