The Simple Vector Store is fine for prototypes. This guide covers what you actually need: persistent stores, dynamic updates, and hybrid search.
The Demo vs Production Gap
Every n8n RAG tutorial uses the Simple Vector Store node. It works beautifully in a demo: upload a document, ask a question, get an answer. Then you deploy it and discover the problem: the Simple Vector Store is in-memory only. Every time n8n restarts -- after a deployment, an update, a crash -- your vector store is wiped and your chatbot knows nothing.
That's the demo-to-production gap. This article covers the four changes you need to make before RAG is production-ready in n8n.
The Simple Vector Store node stores vectors in RAM. It survives a workflow execution but NOT an n8n restart. Never use it as your primary store in production.

Change 1: Switch to a Persistent Vector Store
n8n supports several external vector databases natively. Pick one based on your existing stack.
| Vector Store | Best for | n8n Node |
|---|---|---|
| Supabase (pgvector) | Teams already on Supabase/PostgreSQL | Supabase Vector Store |
| Pinecone | Managed cloud, no infra to run | Pinecone Vector Store |
| Qdrant | Self-hosted, open-source, high performance | Qdrant Vector Store |
| Weaviate | Rich metadata filtering needs | Weaviate Vector Store |
| PGVector (raw Postgres) | Already running Postgres, want to keep it simple | PG Vector Store |
The setup is the same for all of them:
- Add the credential for your vector store in n8n settings.
- Replace the Simple Vector Store node with your chosen persistent store node.
- Set the collection/index name -- this is the namespace for your documents.
- Set the same embedding model you used during indexing -- mismatched embeddings produce garbage results.
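That last point is worth enforcing in code. One hedged approach (not an n8n built-in; the config object and function names here are hypothetical) is to keep a single source of truth for the embedding settings and check it against the collection before reading or writing:

```javascript
// Hypothetical shared config -- keep ONE source of truth for embedding
// settings so the indexing and query workflows can never drift apart.
const EMBEDDING_CONFIG = {
  model: "text-embedding-3-small", // must be identical in BOTH workflows
  dimensions: 1536,                // collection vector size must equal this
  collection: "product-docs",      // one collection per knowledge base
};

// Sanity check: compare the embedding dimension against the collection's
// configured vector size before writing or querying.
function assertDimensionMatch(config, collectionVectorSize) {
  if (config.dimensions !== collectionVectorSize) {
    throw new Error(
      `Embedding dim ${config.dimensions} != collection dim ${collectionVectorSize}; ` +
      "re-create the collection or switch embedding models."
    );
  }
  return true;
}
```

A mismatch here is silent at write time and only surfaces as nonsense retrieval results, which is why checking up front is worth the extra node.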
Use a separate collection per knowledge base or document category. Mixing unrelated documents in one collection degrades retrieval quality because similarity scores become meaningless across very different content types.

Change 2: Build a Separate Indexing Workflow
In most demos, indexing and querying happen in the same workflow. In production they should be separate: one workflow that ingests and indexes documents, another that queries them. This lets you re-index documents without touching the query path.
Indexing workflow structure
- Trigger: manual, scheduled, or webhook (e.g. trigger when a new file is uploaded to Google Drive)
- Load documents: Google Drive, S3, local files, URLs, or a database query
- Split documents: use the Recursive Character Text Splitter (chunk size 500-1000, 100-200 overlap)
- Embed: connect an embedding model node (OpenAI, Cohere, or local via Ollama)
- Store: write to your persistent vector store with document metadata
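To make the size/overlap mechanics in the split step concrete, here is a simplified sliding-window splitter. This is a rough stand-in, not the node's actual algorithm: the real Recursive Character Text Splitter additionally prefers to break on paragraph and sentence boundaries.

```javascript
// Simplified sliding-window splitter: each chunk is at most `chunkSize`
// characters, and consecutive chunks share `chunkOverlap` characters so
// context at chunk boundaries is not lost.
function splitText(text, chunkSize = 800, chunkOverlap = 150) {
  const chunks = [];
  const step = chunkSize - chunkOverlap; // how far the window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

With the defaults, a 2,000-character document yields three chunks, and the last 150 characters of each chunk reappear at the start of the next.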
// Recommended chunking settings for most use cases
// (in the Recursive Character Text Splitter node)
Chunk Size: 800 // characters per chunk
Chunk Overlap: 150 // overlap between chunks (prevents context loss at boundaries)
Separators: [paragraph, newline, sentence, word]
// For technical docs (code-heavy): reduce to 400/80
// For long-form narrative: increase to 1200/200

What metadata to store
Always include metadata alongside your embeddings. You'll use it for filtering and for showing citations in responses.
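Here is a sketch of how that metadata pays off at query time. This is a hypothetical post-retrieval step (for example, a Code node after the vector store returns matches), not a built-in node; the `formatCitations` name and chunk shape are assumptions:

```javascript
// Hypothetical post-retrieval step: turn each retrieved chunk's stored
// metadata into numbered citation lines for the agent's response.
function formatCitations(chunks) {
  // De-duplicate by source so one document yields one citation.
  const seen = new Set();
  const lines = [];
  for (const chunk of chunks) {
    const { source, updated_at } = chunk.metadata;
    if (seen.has(source)) continue;
    seen.add(source);
    lines.push(`[${lines.length + 1}] ${source} (updated ${updated_at})`);
  }
  return lines.join("\n");
}
```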
// Add a Set node before the vector store to attach metadata
{
"source": "{{ $('Load File').item.json.filename }}",
"category": "product-docs",
"updated_at": "{{ $now.toISO() }}",
"doc_id": "{{ $('Load File').item.json.id }}"
}

Change 3: Handle Document Updates
The biggest operational gap in n8n RAG docs: what happens when a document changes? By default, re-running your indexing workflow just appends new chunks -- it doesn't delete the old ones. You end up with duplicate or stale chunks in your vector store, which causes the agent to retrieve outdated information alongside current information.
The delete-and-reindex pattern
- Before indexing a document, delete its existing chunks from the vector store by filtering on doc_id or filename metadata.
- Then insert the fresh chunks.
// In a Code node before the vector store write step
// Delete existing chunks for this document first
// (exact implementation depends on your vector store's API)
// For Qdrant (qdrant_url and collection are set earlier in the workflow):
await this.helpers.httpRequest({
  method: "POST",
  url: `${qdrant_url}/collections/${collection}/points/delete`,
  body: {
    filter: {
      must: [{ key: "metadata.doc_id", match: { value: $input.item.json.doc_id } }]
    }
  },
  json: true,
});
return $input.item;

Not all n8n vector store nodes expose a native 'delete by metadata' operation. For Pinecone and Qdrant, use an HTTP Request node (or a Code node as above) to call the vector store API directly and delete before re-indexing.
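For Pinecone, the equivalent call can be shaped as a pure function whose output feeds an HTTP Request node. This is a sketch under assumptions: `buildPineconeDelete` is a hypothetical helper, and delete-by-metadata-filter is supported on Pinecone's pod-based indexes but not on serverless indexes (which require deleting by ID), so verify against your plan before relying on it:

```javascript
// Sketch: build the request options for Pinecone's delete-by-filter endpoint.
// Pod-based indexes only; serverless indexes must delete by vector ID instead.
function buildPineconeDelete(indexHost, docId) {
  return {
    method: "POST",
    url: `https://${indexHost}/vectors/delete`, // indexHost from the Pinecone console
    body: {
      filter: { doc_id: { $eq: docId } }, // matches the doc_id metadata field
    },
  };
}
```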
Once you have metadata attached to your chunks, you can restrict retrieval to relevant subsets -- instead of searching your entire knowledge base, you search only the chunks that match a filter. This dramatically improves precision for multi-category knowledge bases.
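A toy illustration of why this helps, with a hypothetical `retrieve` function (real vector stores apply the filter server-side, but the principle is the same): candidates are restricted to the matching category BEFORE ranking, so a high-scoring chunk from the wrong category can never crowd out relevant ones.

```javascript
// Toy retrieval: filter candidates by metadata first, then rank by a
// precomputed similarity score and keep the top K.
function retrieve(chunks, queryFilter, topK = 2) {
  return chunks
    .filter((c) => Object.entries(queryFilter).every(([k, v]) => c.metadata[k] === v))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```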
// In the Vector Store Tool node connected to your AI Agent,
// pass a filter expression to restrict search scope
// Example: only retrieve from the 'product-docs' category
// (available in Pinecone, Qdrant, Weaviate -- check your store's node options)
Metadata Filter: { "category": "product-docs" }
// Or dynamically, based on user input routed by the agent:
Metadata Filter: { "category": "{{ $json.detected_category }}" }

Production RAG Checklist
- Simple Vector Store replaced with persistent store (Pinecone, Qdrant, Supabase pgvector)
- Indexing and querying are separate workflows
- Chunking settings tuned for your document type
- Metadata (source, doc_id, category, updated_at) stored with every chunk
- Delete-before-reindex pattern implemented for document updates
- Metadata filtering enabled on queries for multi-category knowledge bases
- Embedding model is identical in both indexing and querying workflows
- Periodic re-indexing job scheduled for time-sensitive content
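For the last item, the re-indexing trigger can be as simple as a scheduled workflow that checks each document's stored `updated_at` against a maximum age. A minimal sketch (the `needsReindex` helper is hypothetical, not an n8n node):

```javascript
// Decide whether a document is due for re-indexing based on the
// updated_at metadata stored alongside its chunks.
function needsReindex(updatedAtISO, maxAgeDays, now = new Date()) {
  const ageMs = now - new Date(updatedAtISO);
  return ageMs > maxAgeDays * 24 * 60 * 60 * 1000;
}
```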