The Stateless Agent Problem

Every time a user starts a new conversation with your AI agent, it has forgotten everything: their name, their preferences, the problem they were trying to solve last Tuesday, the fact that they told the agent they are vegetarian, or that they run a team of five engineers.

The standard workaround is to stuff the entire conversation history into the context window. This works for a single session. It does not work across sessions, and it does not scale — context windows have limits, and sending thousands of tokens of history on every call gets expensive quickly.

Mem0 is an open-source memory layer designed to solve this. It extracts, stores, and retrieves the facts that matter from conversations, so your agent can have persistent, selective memory without blowing up your context window.

How Mem0 Works

Mem0 sits between your agent and its memory store. When you add messages to Mem0, it uses an LLM to extract the meaningful facts — names, preferences, goals, constraints — and stores them as structured memories in a vector database. When you query for relevant memories, it does semantic search over those facts and returns only what is relevant to the current context.

The result is that instead of sending 10,000 tokens of raw history on every call, you send 200-500 tokens of filtered, relevant memories.
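The extract → store → retrieve loop can be sketched in miniature. This is a toy illustration only: keyword overlap stands in for Mem0's LLM extraction and vector search, and every name below is mine, not Mem0's:

```python
# Toy illustration of the extract -> store -> retrieve loop.
# Real Mem0 uses an LLM for extraction and a vector DB for search;
# here, word overlap stands in for semantic similarity.

def extract_facts(messages):
    """Stand-in for LLM extraction: keep short declarative user sentences."""
    facts = []
    for msg in messages:
        if msg["role"] == "user":
            for sentence in msg["content"].split(". "):
                facts.append(sentence.strip().rstrip("."))
    return facts

def search(store, query, limit=3):
    """Stand-in for semantic search: rank stored facts by word overlap."""
    q = set(query.lower().split())
    scored = sorted(store, key=lambda f: -len(q & set(f.lower().split())))
    return scored[:limit]

store = extract_facts([
    {"role": "user", "content": "I prefer async code. I use PostgreSQL"},
])
print(search(store, "postgresql tips"))  # the PostgreSQL fact ranks first
```

The point is the shape of the pipeline, not the ranking function: facts are extracted once at write time, then filtered at read time so only relevant ones reach the prompt.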

Approach               | Context size            | Across sessions | Personalisation
-----------------------|-------------------------|-----------------|---------------------------
No memory              | Small                   | No              | None
Raw history in context | Large (grows unbounded) | Only if stored  | Weak (noise drowns signal)
Mem0 memory layer      | Small (filtered)        | Yes             | Strong (extracted facts)

The Three Types of Memory

Mem0 organises memory across three scopes:

User Memory

Facts about a specific user that persist across all their conversations. Name, role, preferences, constraints, goals. This is the most powerful type for personalisation.

Example: 'Prefers concise answers', 'Works in Python, not familiar with JavaScript', 'Is the head of a 5-person data team', 'Allergic to nuts'.

Session Memory

Facts relevant to a specific conversation session. Temporary context that should not pollute the user's long-term profile.

Example: 'In this session, we are debugging a FastAPI rate limiting issue', 'User is in a hurry, skip explanations'.

Agent Memory

Facts that the agent itself learns over time — independent of any specific user. Patterns, preferences, strategies that improve the agent's behaviour globally.

Example: 'Users frequently misunderstand the difference between async and await', 'Summarisation requests usually need bullet format, not prose'.
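In the Python SDK, these scopes map to the user_id, run_id, and agent_id keyword arguments on add and search. The stub below mimics that interface to show how the scopes partition memories; it is an illustration I wrote for this section, not Mem0's implementation:

```python
class ScopedMemoryStub:
    """Illustrative stand-in for Mem0's scoped add/search interface."""

    def __init__(self):
        self._store = []  # (fact, user_id, run_id, agent_id)

    def add(self, fact, user_id=None, run_id=None, agent_id=None):
        self._store.append((fact, user_id, run_id, agent_id))

    def search(self, user_id=None, run_id=None, agent_id=None):
        # A memory is visible only within the exact scope it was stored under
        return [f for (f, u, r, a) in self._store
                if (user_id, run_id, agent_id) == (u, r, a)]

m = ScopedMemoryStub()
m.add("Is vegetarian", user_id="user-42")                       # user memory
m.add("Debugging rate limits", user_id="user-42", run_id="s1")  # session memory
m.add("Summaries work best as bullets", agent_id="bot")         # agent memory

print(m.search(user_id="user-42"))  # ['Is vegetarian']
```

With the real SDK, the same keyword arguments decide which scope a fact lands in and which scope a search reads from.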

Installation and Setup

pip install mem0ai
 
from mem0 import Memory
 
# Minimal setup — uses local vector store and OpenAI by default
m = Memory()
 
# Custom config — use Anthropic + Qdrant
config = {
    "llm": {
        "provider": "anthropic",
        "config": {
            "model": "claude-haiku-4-5-20251001",
            "api_key": "your-anthropic-key"
        }
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "host": "localhost",
            "port": 6333
        }
    }
}
 
m = Memory.from_config(config)
 
For production, always specify a persistent vector store (Qdrant, Pinecone, or pgvector). The default in-memory store loses all data on restart.

Adding and Retrieving Memories

from mem0 import Memory
m = Memory()
 
# Add a conversation — Mem0 extracts facts automatically
messages = [
    {"role": "user", "content": "I'm building a FastAPI app with PostgreSQL. I prefer async code and I hate ORM magic."},
    {"role": "assistant", "content": "Got it. I'll keep examples async and use raw SQL where possible."}
]
 
m.add(messages, user_id="user-42")
# Mem0 extracts and stores:
# - 'Uses FastAPI'
# - 'Uses PostgreSQL'
# - 'Prefers async code'
# - 'Dislikes ORM abstractions'
 
# Later, in a new session, retrieve relevant memories
relevant = m.search(query="database query patterns", user_id="user-42")
for mem in relevant:
    print(mem["memory"])  # 'Uses PostgreSQL', 'Dislikes ORM abstractions'
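Depending on the installed Mem0 version, search may return a bare list or a dict wrapped in a "results" key. A small normaliser (my helper, not part of Mem0's API) keeps calling code stable either way:

```python
def as_memory_list(search_output):
    """Accept either the bare-list or the {'results': [...]} return shape."""
    if isinstance(search_output, dict):
        return search_output.get("results", [])
    return search_output

# Works with both shapes:
print(as_memory_list([{"memory": "Uses PostgreSQL"}]))
print(as_memory_list({"results": [{"memory": "Uses PostgreSQL"}]}))
```

Wrapping every search call this way avoids a breakage if you upgrade the library later.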
 

Using Memories in Agent Prompts

from mem0 import Memory
from anthropic import Anthropic
 
m = Memory()
client = Anthropic()
 
def chat_with_memory(user_message: str, user_id: str) -> str:
    # 1. Retrieve relevant memories
    memories = m.search(query=user_message, user_id=user_id, limit=5)
    memory_context = "\n".join([f"- {mem['memory']}" for mem in memories])
 
    # 2. Build system prompt with memory context
    system = (
        "You are a helpful coding assistant.\n\n"
        "What you know about this user:\n"
        f"{memory_context}"
    )
 
    # 3. Call the LLM
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": user_message}]
    )
    answer = response.content[0].text
 
    # 4. Store this exchange as new memory
    m.add(
        [{"role": "user", "content": user_message},
         {"role": "assistant", "content": answer}],
        user_id=user_id
    )
 
    return answer
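Retrieved memories still consume prompt tokens, so it is worth capping how much memory context goes into the system prompt. A budgeting helper along these lines is one option (illustrative; the character budget is a rough proxy for tokens, not a tokenizer):

```python
def build_memory_context(memories, max_chars=2000):
    """Join memory strings into a bullet block, stopping at a char budget."""
    lines, used = [], 0
    for mem in memories:
        line = f"- {mem['memory']}"
        if used + len(line) > max_chars:
            break  # drop the lowest-ranked memories first
        lines.append(line)
        used += len(line) + 1  # +1 for the joining newline
    return "\n".join(lines)

mems = [{"memory": "Uses FastAPI"}, {"memory": "Prefers async code"}]
print(build_memory_context(mems))
```

Because search returns memories ranked by relevance, truncating from the tail discards the least relevant facts first.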
 

Memory Management

# Get all memories for a user
all_memories = m.get_all(user_id="user-42")
 
# Update a specific memory
m.update(memory_id="mem-abc123", data="Switched from PostgreSQL to Supabase")
 
# Delete a specific memory
m.delete(memory_id="mem-abc123")
 
# Delete all memories for a user (GDPR deletion)
m.delete_all(user_id="user-42")
 
Implement a GDPR deletion endpoint in your API that calls delete_all for the user's ID. Mem0 makes compliance straightforward — there is one call to delete everything associated with a user.

When You Don't Need Mem0

Mem0 adds complexity — a vector store, an LLM call for extraction, and retrieval latency. It is not always the right choice:

  • Short-lived sessions where history fits in the context window — just pass the full history.
  • Task-oriented agents with no user concept — if every run is independent and impersonal, there is nothing to remember.
  • When you have simple key/value facts to store — a regular database with a users table beats a vector search for structured preferences.
  • When extraction accuracy is critical — Mem0 uses an LLM to extract facts, which means occasional missed or misattributed memories. For high-stakes data, structured extraction with explicit schemas is more reliable.
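For that last case, an explicit schema makes extraction verifiable: facts land in typed, named slots instead of free-text memories, so validation becomes a type check rather than an LLM judgment call. A dataclass sketch (field names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    """Explicit schema: every stored fact has a typed, named slot."""
    name: str = ""
    dietary_restrictions: list[str] = field(default_factory=list)
    preferred_language: str = ""

profile = UserProfile(
    name="Ada",
    dietary_restrictions=["vegetarian"],
    preferred_language="Python",
)
print(profile.dietary_restrictions)  # ['vegetarian']
```

You can still use an LLM to fill the schema, but a missing or malformed field is detectable at write time instead of surfacing later as a wrong memory.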

Summary

Mem0 fills a genuine gap in agent stacks: persistent, selective, cross-session memory that does not require engineers to design extraction and retrieval pipelines from scratch. The add/search/delete API is simple. The real work is choosing a production vector store, designing what memory types you need, and deciding which facts are worth extracting versus passing as raw context.