Why You Are Flying Blind Without Observability

Your LLM app is in production. Users report that responses are 'off' sometimes. You have no idea which calls are failing, what the actual prompts look like after templating, whether the retrieval step is returning useful context, or how much each user journey is costing you.

Standard application logging does not help. You can log inputs and outputs, but you cannot see latency broken down by step, token costs per trace, or which version of your prompt was active when a specific response was generated.

LangFuse is an open-source LLM observability platform that fills this gap. It gives you traces (end-to-end request journeys), spans (individual steps within a trace), prompt management with version tracking, evaluations, and cost dashboards.

Setup: Python SDK

pip install langfuse
 
import os
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"  # or your self-hosted URL
 
LangFuse is available as a managed service at cloud.langfuse.com (with a free tier) or self-hosted via Docker. The self-hosted option requires a Postgres database and is straightforward to run.

Your First Trace: Decorating a Function

The simplest way to add tracing is the @observe decorator:

from langfuse.decorators import observe
from anthropic import Anthropic
 
client = Anthropic()
 
@observe()  # creates a trace for each call to this function
def answer_question(question: str) -> str:
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=512,
        messages=[{"role": "user", "content": question}]
    )
    return response.content[0].text
 
result = answer_question("What is the capital of France?")
# LangFuse automatically captures: input, output, latency
 

After calling this function, open your LangFuse dashboard and you will see a trace with the input question, the model response, and latency, with no additional instrumentation code. Note that the plain @observe decorator does not parse provider responses, so the model name and token counts are not extracted automatically here; LangFuse's native integrations (such as the OpenAI drop-in client or the LangChain callback) capture those for you, or you can report them yourself on a generation-typed span.
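A rough mental model of what @observe does can help here. This is not the actual SDK implementation, just a sketch of the pattern: wrap the function, record its input, output, and latency, and attach that record to a span:

```python
import functools
import time

def observe_sketch(func):
    """Sketch of the @observe pattern: wrap the call and record
    input, output, and latency for the span (illustration only)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        # In the real SDK this record would be sent to the LangFuse backend.
        wrapper.last_span = {
            "name": func.__name__,
            "input": {"args": args, "kwargs": kwargs},
            "output": result,
            "latency_s": time.perf_counter() - start,
        }
        return result
    wrapper.last_span = None
    return wrapper

@observe_sketch
def add(a, b):
    return a + b

add(2, 3)
# add.last_span now holds the name, input, output, and latency of the call
```

The real decorator also handles nesting (child spans), async functions, and batching of events to the backend, but the core idea is the same.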

Tracing Multi-Step Pipelines

The real power appears when your application has multiple steps — retrieval, reranking, generation. Each step becomes a span within the parent trace:

from langfuse.decorators import observe
from anthropic import Anthropic
 
client = Anthropic()
 
@observe()  # child span: retrieval step
def retrieve_context(query: str) -> list[str]:
    # simulate vector search
    return ["Paris is the capital of France.", "France is in Western Europe."]
 
@observe()  # child span: generation step
def generate_answer(question: str, context: list[str]) -> str:
    context_str = "\n".join(context)
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=512,
        system=f"Answer using only this context:\n{context_str}",
        messages=[{"role": "user", "content": question}]
    )
    return response.content[0].text
 
@observe()  # parent trace: full pipeline
def rag_pipeline(question: str) -> str:
    context = retrieve_context(question)
    answer = generate_answer(question, context)
    return answer
 
result = rag_pipeline("What is the capital of France?")
 

In LangFuse, you will see a tree: rag_pipeline at the top, with retrieve_context and generate_answer as child spans. Each span shows its own latency, inputs, and outputs, so you can see at a glance whether retrieval is slow, whether the retrieved context is relevant, and how long the generation step takes.

LangChain Integration

If you use LangChain, LangFuse integrates via a callback handler:

from langfuse.callback import CallbackHandler
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
 
langfuse_handler = CallbackHandler()
 
llm = ChatAnthropic(model="claude-haiku-4-5-20251001")
prompt = ChatPromptTemplate.from_template("Answer this question: {question}")
chain = prompt | llm
 
# Pass the handler at invocation time
response = chain.invoke(
    {"question": "What is LangFuse?"},
    config={"callbacks": [langfuse_handler]}
)
 

Every step in the LangChain chain (prompt formatting, LLM call, output parsing) appears as a separate span in LangFuse automatically.
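Conceptually, a callback handler is just an object whose hook methods the framework invokes at each lifecycle stage. A plain-Python sketch of the pattern (not the real LangChain BaseCallbackHandler interface, whose hook names and signatures differ):

```python
class SketchCallbackHandler:
    """Sketch of the callback pattern: the framework calls these hooks
    at each stage, and the handler turns each event into span data."""
    def __init__(self):
        self.events = []

    def on_chain_start(self, name, inputs):
        # Called when a chain begins; becomes the parent span.
        self.events.append(("chain_start", name, inputs))

    def on_llm_end(self, output):
        # Called when the model responds; becomes a generation span.
        self.events.append(("llm_end", output))

handler = SketchCallbackHandler()
handler.on_chain_start("prompt | llm", {"question": "What is LangFuse?"})
handler.on_llm_end("LangFuse is an observability platform.")
# handler.events now holds one record per lifecycle hook
```

LangFuse's real CallbackHandler does the same thing at a larger scale: every hook invocation is translated into a trace, span, or generation and shipped to the backend.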

Adding Metadata and User IDs

Traces are more useful when you can filter by user or session:

from langfuse.decorators import observe, langfuse_context
 
@observe()
def handle_user_request(user_id: str, message: str) -> str:
    # Tag this trace with user and session info
    langfuse_context.update_current_trace(
        user_id=user_id,
        session_id=f"session-{user_id}-today",
        tags=["production", "chat"],
        metadata={"plan": "pro", "feature": "chat"}
    )
 
    # ... rest of your logic ...
    return "response"
 

You can then filter your LangFuse dashboard by user_id to see all traces for a specific user — invaluable when debugging a user complaint.
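The session_id in the example above uses a literal "today" placeholder. One concrete scheme, an assumption rather than a LangFuse convention, is to derive the session id from the UTC date, so each user's traces group naturally by day:

```python
from datetime import datetime, timezone

def daily_session_id(user_id: str) -> str:
    # One session per user per UTC day: all of a user's traces for the
    # day share a session_id, so they group together in the dashboard.
    today = datetime.now(timezone.utc).date().isoformat()
    return f"session-{user_id}-{today}"

daily_session_id("u42")
# e.g. "session-u42-2025-06-01"
```

For chat products, a per-conversation UUID generated when the conversation opens is usually a better session boundary than a calendar day; pick whichever grouping matches how you debug.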

Prompt Management

LangFuse includes a prompt registry — version-controlled prompt templates stored in the platform, not in your code.

from langfuse import Langfuse
 
langfuse = Langfuse()
 
# Fetch a prompt by name (fetches the 'production' labelled version)
prompt = langfuse.get_prompt("customer-support-system")
 
# Compile it with variables
compiled = prompt.compile(customer_name="Alice", issue="login failure")
 
# Link the prompt to a generation (e.g. via a native integration) so the
# trace records which prompt version was used
 

When a bad response is traced back to a specific prompt version, you can immediately see which version was active and compare it against previous versions in the dashboard — without digging through git history.
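A rough sketch of what compile does, assuming mustache-style {{variable}} placeholders; this stand-in is for illustration, not the actual SDK implementation:

```python
import re

def compile_prompt(template: str, **variables) -> str:
    # Replace each {{name}} placeholder with its value;
    # leave unknown placeholders intact rather than erroring.
    def sub(match):
        name = match.group(1)
        return str(variables.get(name, match.group(0)))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub, template)

template = "Hello {{customer_name}}, I see you are having a {{issue}}."
compile_prompt(template, customer_name="Alice", issue="login failure")
# "Hello Alice, I see you are having a login failure."
```

The point of fetching and compiling at runtime is that the template text lives in LangFuse, so prompt edits and rollbacks happen in the dashboard without a code deploy.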

Cost Tracking Dashboard

LangFuse tracks token usage and cost per trace automatically for all major providers. In the dashboard you can see:

  • Total cost by day/week/month
  • Cost per user (filter by user_id)
  • Cost per model (compare GPT-4o vs Claude vs Gemini in the same product)
  • Cost per feature (filter by tags)
  • p50/p95/p99 latency broken down by span type

This data is captured from the API responses — you do not need to configure pricing tables manually for OpenAI and Anthropic; LangFuse maintains them.
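The cost arithmetic itself is simple; LangFuse applies it server-side using the pricing tables it maintains. A sketch with made-up per-million-token prices (check your provider's current price list for real numbers):

```python
# Hypothetical USD prices per 1M tokens -- illustrative only.
PRICES = {
    "example-model": {"input": 1.00, "output": 5.00},
}

def trace_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Cost = input tokens at the input rate + output tokens at the
    # output rate, with prices quoted per million tokens.
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

trace_cost("example-model", 1200, 300)
# 0.0027 (i.e. $1200/1M input + $1500/1M output)
```

Summing this per-trace figure over all traces with a given user_id or tag is exactly what the dashboard's cost-per-user and cost-per-feature views do.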

Self-Hosting LangFuse

For teams with data residency requirements, LangFuse is straightforward to self-host:

# docker-compose.yml (minimal)
services:
  langfuse-server:
    image: langfuse/langfuse:latest
    environment:
      DATABASE_URL: postgresql://langfuse:secret@db:5432/langfuse
      NEXTAUTH_SECRET: your-secret-here
      SALT: your-salt-here
      NEXTAUTH_URL: http://localhost:3000
    ports:
      - "3000:3000"
    depends_on:
      - db
 
  db:
    image: postgres:15
    environment:
      POSTGRES_USER: langfuse
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: langfuse
 
The self-hosted version requires the NEXTAUTH_SECRET and SALT values to be kept secret and consistent across restarts. Generate them with: openssl rand -base64 32
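If you prefer generating these secrets from Python rather than openssl, the equivalent (32 random bytes, base64-encoded) looks like this:

```python
import base64
import secrets

def generate_secret() -> str:
    # Same shape as `openssl rand -base64 32`:
    # 32 cryptographically random bytes, base64-encoded.
    return base64.b64encode(secrets.token_bytes(32)).decode()

generate_secret()
# a 44-character base64 string, different on every call
```

Store the generated values in your secrets manager; regenerating SALT after data exists will break decryption of stored values.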

Summary

LangFuse gives you the visibility that LLM applications need in production: per-step latency, token costs, prompt versions, and user-level filtering. The @observe decorator adds tracing to an existing application with almost no code changes. Start by tracing your main pipeline function, then progressively instrument individual steps as you find blind spots.