The Simple Vector Store is fine for prototypes. This guide covers what you actually need: persistent stores, dynamic updates, and hybrid search.
The Demo vs Production Gap
Every n8n RAG tutorial uses the Simple Vector Store node. It works beautifully in a demo: upload a document, ask a question, get an answer. Then you deploy it and discover the problem: the Simple Vector Store is in-memory only. Every time n8n restarts -- after a deployment, an update, a crash -- your vector store is wiped and your chatbot knows nothing.
That's the demo-to-production gap. This article covers the four changes you need to make before RAG is production-ready in n8n.
The Simple Vector Store node stores vectors in RAM. It survives a workflow execution but NOT an n8n restart. Never use it as your primary store in production.

Change 1: Switch to a Persistent Vector Store
n8n supports several external vector databases natively. Pick one based on your existing stack.
| Vector Store | Best for | n8n Node |
|---|---|---|
| Supabase (pgvector) | Teams already on Supabase/PostgreSQL | Supabase Vector Store |
| Pinecone | Managed cloud, no infra to run | Pinecone Vector Store |
| Qdrant | Self-hosted, open-source, high performance | Qdrant Vector Store |
| Weaviate | Rich metadata filtering needs | Weaviate Vector Store |
| PGVector (raw Postgres) | Already running Postgres, want to keep it simple | PG Vector Store |
The setup is the same for all of them:
- Add the credential for your vector store in n8n settings.
- Replace the Simple Vector Store node with your chosen persistent store node.
- Set the collection/index name -- this is the namespace for your documents.
- Set the same embedding model you used during indexing -- mismatched embeddings produce garbage results.
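That last point is worth enforcing in code. One hedged approach (not an n8n built-in; the config object and function names here are hypothetical) is to keep a single source of truth for the embedding settings and check it against the collection before reading or writing:

```javascript
// Hypothetical shared config -- keep ONE source of truth for embedding
// settings so the indexing and query workflows can never drift apart.
const EMBEDDING_CONFIG = {
  model: "text-embedding-3-small", // must be identical in BOTH workflows
  dimensions: 1536,                // collection vector size must equal this
  collection: "product-docs",      // one collection per knowledge base
};

// Sanity check: compare the embedding dimension against the collection's
// configured vector size before writing or querying.
function assertDimensionMatch(config, collectionVectorSize) {
  if (config.dimensions !== collectionVectorSize) {
    throw new Error(
      `Embedding dim ${config.dimensions} != collection dim ${collectionVectorSize}; ` +
      "re-create the collection or switch embedding models."
    );
  }
  return true;
}
```

A mismatch here is silent at write time and only surfaces as nonsense retrieval results, which is why checking up front is worth the extra node.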
Use a separate collection per knowledge base or document category. Mixing unrelated documents in one collection degrades retrieval quality because similarity scores become meaningless across very different content types.

Change 2: Build a Separate Indexing Workflow
In most demos, indexing and querying happen in the same workflow. In production they should be separate: one workflow that ingests and indexes documents, another that queries them. This lets you re-index documents without touching the query path.
Indexing workflow structure
- Trigger: manual, scheduled, or webhook (e.g. trigger when a new file is uploaded to Google Drive)
- Load documents: Google Drive, S3, local files, URLs, or a database query
- Split documents: use the Recursive Character Text Splitter (chunk size 500-1000, 100-200 overlap)
- Embed: connect an embedding model node (OpenAI, Cohere, or local via Ollama)
- Store: write to your persistent vector store with document metadata
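To make the size/overlap mechanics in the split step concrete, here is a simplified sliding-window splitter. This is a rough stand-in, not the node's actual algorithm: the real Recursive Character Text Splitter additionally prefers to break on paragraph and sentence boundaries.

```javascript
// Simplified sliding-window splitter: each chunk is at most `chunkSize`
// characters, and consecutive chunks share `chunkOverlap` characters so
// context at chunk boundaries is not lost.
function splitText(text, chunkSize = 800, chunkOverlap = 150) {
  const chunks = [];
  const step = chunkSize - chunkOverlap; // how far the window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

With the defaults, a 2,000-character document yields three chunks, and the last 150 characters of each chunk reappear at the start of the next.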
// Recommended chunking settings for most use cases
// (in the Recursive Character Text Splitter node)
Chunk Size: 800 // characters per chunk
Chunk Overlap: 150 // overlap between chunks (prevents context loss at boundaries)
Separators: [paragraph, newline, sentence, word]
// For technical docs (code-heavy): reduce to 400/80
// For long-form narrative: increase to 1200/200

What metadata to store
Always include metadata alongside your embeddings. You'll use it for filtering and for showing citations in responses.
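Here is a sketch of how that metadata pays off at query time. This is a hypothetical post-retrieval step (for example, a Code node after the vector store returns matches), not a built-in node; the `formatCitations` name and chunk shape are assumptions:

```javascript
// Hypothetical post-retrieval step: turn each retrieved chunk's stored
// metadata into numbered citation lines for the agent's response.
function formatCitations(chunks) {
  // De-duplicate by source so one document yields one citation.
  const seen = new Set();
  const lines = [];
  for (const chunk of chunks) {
    const { source, updated_at } = chunk.metadata;
    if (seen.has(source)) continue;
    seen.add(source);
    lines.push(`[${lines.length + 1}] ${source} (updated ${updated_at})`);
  }
  return lines.join("\n");
}
```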
// Add a Set node before the vector store to attach metadata
{
"source": "{{ $('Load File').item.json.filename }}",
"category": "product-docs",
"updated_at": "{{ $now.toISO() }}",
"doc_id": "{{ $('Load File').item.json.id }}"
}

Change 3: Handle Document Updates
The biggest operational gap in n8n RAG docs: what happens when a document changes? By default, re-running your indexing workflow just appends new chunks -- it doesn't delete the old ones. You end up with duplicate or stale chunks in your vector store, which causes the agent to retrieve outdated information alongside current information.
The delete-and-reindex pattern
- Before indexing a document, delete its existing chunks from the vector store by filtering on doc_id or filename metadata.
- Then insert the fresh chunks.
// In a Code node before the vector store write step
// Delete existing chunks for this document first
// (exact implementation depends on your vector store's API)
// For Qdrant (qdrant_url and collection are set earlier in the workflow):
await this.helpers.httpRequest({
  method: "POST",
  url: `${qdrant_url}/collections/${collection}/points/delete`,
  body: {
    filter: {
      must: [{ key: "metadata.doc_id", match: { value: $input.item.json.doc_id } }]
    }
  },
  json: true,
});
return $input.item;

Not all n8n vector store nodes expose a native 'delete by metadata' operation. For Pinecone and Qdrant, use an HTTP Request node (or a Code node as above) to call the vector store API directly and delete before re-indexing.
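For Pinecone, the equivalent call can be shaped as a pure function whose output feeds an HTTP Request node. This is a sketch under assumptions: `buildPineconeDelete` is a hypothetical helper, and delete-by-metadata-filter is supported on Pinecone's pod-based indexes but not on serverless indexes (which require deleting by ID), so verify against your plan before relying on it:

```javascript
// Sketch: build the request options for Pinecone's delete-by-filter endpoint.
// Pod-based indexes only; serverless indexes must delete by vector ID instead.
function buildPineconeDelete(indexHost, docId) {
  return {
    method: "POST",
    url: `https://${indexHost}/vectors/delete`, // indexHost from the Pinecone console
    body: {
      filter: { doc_id: { $eq: docId } }, // matches the doc_id metadata field
    },
  };
}
```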
Once you have metadata attached to your chunks, you can restrict retrieval to relevant subsets -- instead of searching your entire knowledge base, you search only the chunks that match a filter. This dramatically improves precision for multi-category knowledge bases.
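A toy illustration of why this helps, with a hypothetical `retrieve` function (real vector stores apply the filter server-side, but the principle is the same): candidates are restricted to the matching category BEFORE ranking, so a high-scoring chunk from the wrong category can never crowd out relevant ones.

```javascript
// Toy retrieval: filter candidates by metadata first, then rank by a
// precomputed similarity score and keep the top K.
function retrieve(chunks, queryFilter, topK = 2) {
  return chunks
    .filter((c) => Object.entries(queryFilter).every(([k, v]) => c.metadata[k] === v))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```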
// In the Vector Store Tool node connected to your AI Agent,
// pass a filter expression to restrict search scope
// Example: only retrieve from the 'product-docs' category
// (available in Pinecone, Qdrant, Weaviate -- check your store's node options)
Metadata Filter: { "category": "product-docs" }
// Or dynamically, based on user input routed by the agent:
Metadata Filter: { "category": "{{ $json.detected_category }}" }

Production RAG Checklist
- Simple Vector Store replaced with persistent store (Pinecone, Qdrant, Supabase pgvector)
- Indexing and querying are separate workflows
- Chunking settings tuned for your document type
- Metadata (source, doc_id, category, updated_at) stored with every chunk
- Delete-before-reindex pattern implemented for document updates
- Metadata filtering enabled on queries for multi-category knowledge bases
- Embedding model is identical in both indexing and querying workflows
- Periodic re-indexing job scheduled for time-sensitive content
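For the last item, the re-indexing trigger can be as simple as a scheduled workflow that checks each document's stored `updated_at` against a maximum age. A minimal sketch (the `needsReindex` helper is hypothetical, not an n8n node):

```javascript
// Decide whether a document is due for re-indexing based on the
// updated_at metadata stored alongside its chunks.
function needsReindex(updatedAtISO, maxAgeDays, now = new Date()) {
  const ageMs = now - new Date(updatedAtISO);
  return ageMs > maxAgeDays * 24 * 60 * 60 * 1000;
}
```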