RAG-Powered AI Agents: Building Knowledge-Intensive Automation with n8n, Vector Databases, and GPT-5.5
The release of OpenAI's GPT-5.5 on May 5, 2026 marks a watershed moment for knowledge-intensive AI applications. With its dramatically improved reasoning capabilities and 40% reduction in token usage compared to GPT-5.4, GPT-5.5 is purpose-built for agentic enterprise work and complex document reasoning—the perfect foundation for Retrieval-Augmented Generation (RAG) systems.
Organizations implementing RAG-powered workflows are seeing transformative results: 73% reduction in hallucination rates, 4.2x improvement in document search accuracy, and 67% faster knowledge retrieval compared to traditional keyword search. A recent survey found that 57.3% of agent builders now have RAG-powered agents in production, up from just 23% in early 2025.
This comprehensive guide explores how to build production-grade RAG systems using n8n, vector databases, and GPT-5.5. From architecting ingestion pipelines to implementing hybrid search strategies, from optimizing chunking algorithms to building conversational agents that actually understand your business knowledge—we'll cover everything you need to build knowledge-intensive automation that delivers measurable ROI.
Understanding RAG Architecture: Beyond Simple Chatbots
The RAG Pattern Explained
Retrieval-Augmented Generation bridges the gap between large language models and private knowledge. Unlike fine-tuning, which permanently bakes knowledge into the model weights, RAG dynamically retrieves relevant context at query time—enabling up-to-date, verifiable, and hallucination-resistant AI responses.
┌─────────────────────────────────────────────────────────────────────────────┐
│ RAG Architecture Flow │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────┐ │
│ │ Document │───▶│ Chunk & │───▶│ Vector │───▶│ Vector │ │
│ │ Sources │ │ Embed │ │ Database │ │ Store │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────┘ │
│ │ │ │ │ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ PDFs, URLs, Text splitting, Pinecone, Real-time │
│ Databases, OpenAI/GPT-5.5 Qdrant, semantic │
│ APIs embeddings Weaviate, search │
│ Chroma │
│ │
│ QUERY TIME │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────┐ │
│ │ User │───▶│ Retrieve │───▶│ Generate │───▶│ Response │ │
│ │ Query │ │ Context │ │ Answer │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────┘ │
│ │ │
│ ▼ │
│ Similarity search + │
│ Re-ranking │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
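The query-time half of the flow above can be sketched in a few lines. This is a toy illustration only: the three-dimensional vectors, the `chunks` array, and the helper names `cosineSimilarity` and `retrieveTopK` are assumptions made for the example, not part of n8n or any vector database API.

```javascript
// Toy in-memory retrieval: rank stored chunks by cosine similarity to a query vector.
// Real embeddings have hundreds or thousands of dimensions; these are made up.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function retrieveTopK(queryVector, chunks, k) {
  return chunks
    .map(chunk => ({ ...chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const chunks = [
  { id: 'founding', text: 'The company was founded in 2018.', vector: [0.9, 0.1, 0.0] },
  { id: 'revenue', text: 'Revenue grew 300% in 2024.', vector: [0.1, 0.9, 0.1] },
  { id: 'support', text: 'Support is available on weekdays.', vector: [0.0, 0.2, 0.9] }
];

// Retrieve the two closest chunks, then place their text into the prompt as context.
const topChunks = retrieveTopK([1, 0, 0], chunks, 2);
const prompt = `Context:\n${topChunks.map(c => c.text).join('\n')}\n\nQuestion: When was the company founded?`;
console.log(topChunks.map(c => c.id));
```

A vector database performs the same ranking with an approximate index instead of a full scan, which is what keeps retrieval fast as the collection grows.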
Key RAG Advantages Over Fine-Tuning:
| Aspect | Fine-Tuning | RAG |
|---|---|---|
| Knowledge freshness | Requires retraining | Always current |
| Source attribution | Difficult | Built-in |
| Hallucination rate | Higher | 73% lower |
| Implementation time | Weeks to months | Days to weeks |
| Cost per update | High (retraining) | Low (re-indexing) |
| Domain switching | Requires new model | Dynamic at query time |
The GPT-5.5 Advantage for RAG
GPT-5.5 brings specific improvements that make it ideal for RAG workflows:
1. Enhanced Context Following
// GPT-5.5 better understands when to use retrieved context vs. general knowledge
const systemPrompt = `
You are a helpful assistant with access to the following company knowledge:
<retrieved_context>
{{ $json.retrievedDocuments }}
</retrieved_context>
CRITICAL INSTRUCTIONS:
- If the retrieved context contains the answer, use ONLY that information
- If the context is insufficient, clearly state what additional information is needed
- Never make up information not present in the context
- Cite specific document sources when providing answers
`;
// GPT-5.5 shows 89% accuracy in following these instructions vs. 67% with GPT-4
2. Reduced Token Usage
// With GPT-5.5's 40% token efficiency improvement:
// Previous: 15,000 tokens per RAG query
// GPT-5.5: 9,000 tokens per RAG query (40% fewer)
// Illustrative cost at the prices below: $0.12 per query vs. a $0.20 baseline,
// which is about $800 saved per month at 10,000 queries
const costComparison = {
model: 'gpt-5.5',
inputCostPerMillion: 5.00, // $5 per million tokens
outputCostPerMillion: 30.00, // $30 per million tokens
averageInputTokens: 6000,
averageOutputTokens: 3000,
costPerQuery: (6000 * 5 + 3000 * 30) / 1000000, // $0.12
gpt4CostPerQuery: 0.20, // Previous cost
savings: '40%'
};
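The per-query arithmetic can be checked with a small helper. This is a sketch that reuses the token counts and per-million prices from the object above; the function names `costPerQuery` and `monthlySavings` are hypothetical, and the prices should be replaced with your provider's current rates.

```javascript
// Cost of one query given token counts and prices quoted per million tokens.
function costPerQuery({ inputTokens, outputTokens, inputPricePerMillion, outputPricePerMillion }) {
  return (inputTokens * inputPricePerMillion + outputTokens * outputPricePerMillion) / 1_000_000;
}

// Savings over a month when moving from a baseline cost to a new cost per query.
function monthlySavings(baselineCost, newCost, queriesPerMonth) {
  return (baselineCost - newCost) * queriesPerMonth;
}

const perQuery = costPerQuery({
  inputTokens: 6000,
  outputTokens: 3000,
  inputPricePerMillion: 5,
  outputPricePerMillion: 30
});
const saved = monthlySavings(0.20, perQuery, 10000);

console.log(perQuery.toFixed(2)); // 0.12
console.log(saved.toFixed(2));    // 800.00
```

Output tokens dominate the bill here ($0.09 of the $0.12), so capping answer length usually saves more than trimming retrieved context.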
3. Improved Structured Output
// GPT-5.5 excels at returning structured data from RAG queries
const structuredOutputExample = {
answer: "The company was founded in 2018 by Jane Smith and John Doe.",
sources: [
{
document_id: "company-history-2024.pdf",
page: 3,
confidence: 0.94,
excerpt: "Founded in 2018, Tropical Media began as a small automation consultancy..."
},
{
document_id: "founder-bios.docx",
page: 1,
confidence: 0.91,
excerpt: "Jane Smith (CEO) and John Doe (CTO) established the company with a vision..."
}
],
confidence_score: 0.92,
additional_info_needed: null
};
Setting Up Your Vector Database in n8n
Option 1: Qdrant (Recommended for Self-Hosted)
Qdrant has become the go-to choice for production RAG systems due to its hybrid search capabilities, built-in re-ranking, and excellent n8n integration.
Step 1: Deploy Qdrant
# docker-compose.yml for Qdrant
version: '3.8'
services:
qdrant:
image: qdrant/qdrant:latest
ports:
- "6333:6333"
- "6334:6334"
volumes:
- qdrant_storage:/qdrant/storage
environment:
QDRANT__LOG_LEVEL: INFO
QDRANT__SERVICE__MAX_REQUEST_SIZE_MB: 32
# Enable GPU acceleration for large-scale deployments
# deploy:
# resources:
# reservations:
# devices:
# - driver: nvidia
# count: 1
# capabilities: [gpu]
volumes:
qdrant_storage:
Step 2: n8n Qdrant Integration
// Create collection with optimized settings for RAG
const createCollection = {
url: 'http://localhost:6333',
collectionName: 'company-knowledge-base',
vectorSize: 3072, // Must match the embedding model: 3072 for text-embedding-3-large, 1536 for text-embedding-3-small
distance: 'Cosine', // Best for semantic similarity
optimizersConfig: {
defaultSegmentNumber: 2,
maxSegmentSize: 100000,
memmapThreshold: 20000
},
hnswConfig: {
m: 16, // Higher = better recall, more memory
efConstruct: 100, // Higher = better build quality
fullScanThreshold: 10000
},
quantizationConfig: {
scalar: {
type: 'int8',
quantile: 0.99,
alwaysRam: true
}
}
};
// Response: Collection created. int8 scalar quantization stores each vector component in
// 1 byte instead of 4 (float32), so quantized vectors take roughly 75% less memory
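The effect of int8 scalar quantization on raw vector storage is simple arithmetic. The sketch below assumes one million vectors (an illustrative figure) and counts only the vector components, leaving out HNSW graph overhead and payload storage; `vectorMemoryBytes` is a hypothetical helper.

```javascript
// Raw storage for a set of vectors: count x dimensions x bytes per component.
function vectorMemoryBytes(vectorCount, dimensions, bytesPerComponent) {
  return vectorCount * dimensions * bytesPerComponent;
}

const vectorCount = 1_000_000; // assumed collection size for illustration
const dimensions = 1536;

const float32Bytes = vectorMemoryBytes(vectorCount, dimensions, 4); // 4 bytes per float32
const int8Bytes = vectorMemoryBytes(vectorCount, dimensions, 1);    // 1 byte per int8
const reductionPercent = (1 - int8Bytes / float32Bytes) * 100;

console.log(float32Bytes);     // 6144000000
console.log(int8Bytes);        // 1536000000
console.log(reductionPercent); // 75
```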
Step 3: Document Ingestion Pipeline
// Complete n8n workflow for document ingestion
// Trigger: Manual or Scheduled (daily sync)
// Node 1: Fetch Documents (HTTP Request or File Read)
const fetchDocuments = {
url: 'https://company-docs.s3.amazonaws.com/knowledge-base/',
options: {
headers: {
'Authorization': 'Bearer {{ $env.S3_ACCESS_TOKEN }}'
}
}
};
// Node 2: Parse Documents (various formats)
const parseDocuments = {
// PDF
pdf: {
operation: 'Extract',
options: {
metadata: true,
pageNumbers: true
}
},
// Word documents
docx: {
operation: 'Extract Text',
includeHeaders: true,
includeFooters: true
},
// Web pages
html: {
operation: 'Extract Content',
selector: 'article, .content, main', // Focus on content
removeSelectors: 'nav, footer, .ads, .sidebar'
}
};
// Node 3: Intelligent Chunking
const chunkingStrategy = {
method: 'Recursive Character',
chunkSize: 512, // Optimal for GPT-5.5 context windows
chunkOverlap: 128, // 25% overlap maintains context
separators: [
'\n\n', // Paragraphs
'\n', // Lines
'. ', // Sentences
' ' // Words (fallback)
],
// Preserve semantic boundaries
preserveContext: true,
// Add metadata to each chunk
metadata: {
source: '{{ $json.sourceUrl }}',
title: '{{ $json.title }}',
category: '{{ $json.category }}',
created_at: '{{ $json.date }}',
author: '{{ $json.author }}',
file_type: '{{ $json.fileType }}'
}
};
// Node 4: Generate Embeddings with GPT-5.5
const embeddingConfig = {
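model: 'text-embedding-3-large', // 3072 dimensions, better quality
// or 'text-embedding-3-small' (1536 dimensions) for cost optimization
// The collection's vector size must match the dimension of the model chosen here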
// GPT-5.5 can also generate custom embeddings via API
customModel: 'gpt-5.5-embedding',
input: '{{ $json.chunkText }}',
// Batch processing for efficiency
batchSize: 100,
// Retry logic for rate limits
retry: {
maxRetries: 3,
backoffMultiplier: 2,
initialDelay: 1000
}
};
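The `retry` settings above describe exponential backoff. A minimal sketch of how those three values could drive a retry loop follows; `backoffDelays` and `withRetry` are hypothetical helpers, and `operation` stands for whatever call produces the embeddings.

```javascript
// Delay (ms) before each retry: initialDelay, then multiplied each time.
function backoffDelays(maxRetries, backoffMultiplier, initialDelay) {
  return Array.from(
    { length: maxRetries },
    (_, attempt) => initialDelay * backoffMultiplier ** attempt
  );
}

// Run an async operation, retrying after each failure until retries run out.
async function withRetry(operation, { maxRetries, backoffMultiplier, initialDelay }) {
  const delays = backoffDelays(maxRetries, backoffMultiplier, initialDelay);
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= delays.length) throw error; // retries exhausted, surface the error
      await new Promise(resolve => setTimeout(resolve, delays[attempt]));
    }
  }
}

console.log(backoffDelays(3, 2, 1000)); // [ 1000, 2000, 4000 ]
```

With these settings a failing batch is attempted four times in total (one initial call plus three retries) over roughly seven seconds. In practice it is worth retrying only on rate-limit or transient network errors and failing fast on everything else.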
// Node 5: Upsert to Qdrant
const upsertConfig = {
collection: 'company-knowledge-base',
points: {
id: '{{ $json.uuid }}', // Generate UUID per chunk
vector: '{{ $json.embedding }}',
payload: {
text: '{{ $json.chunkText }}',
metadata: '{{ $json.metadata }}',
// Add timestamp for versioning
indexed_at: '{{ new Date().toISOString() }}'
}
},
// Batch upsert for performance
batchSize: 50
};
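Two details of the upsert step are easy to get wrong: point IDs and batching. Qdrant accepts unsigned integers or UUIDs as point IDs, and a stable ID per chunk lets a re-index overwrite the old point rather than duplicate it. The sketch below derives a UUID-shaped ID from a hash, on the assumption that any well-formed UUID string is accepted; `chunkPointId` and `toBatches` are hypothetical helpers.

```javascript
const { createHash } = require('node:crypto');

// Derive a stable, UUID-shaped ID from the document ID and chunk index.
// The same inputs always produce the same ID, so re-indexing is idempotent.
function chunkPointId(docId, chunkIndex) {
  const hex = createHash('sha256').update(`${docId}:${chunkIndex}`).digest('hex');
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32)
  ].join('-');
}

// Split a list of points into batches of at most batchSize items.
function toBatches(items, batchSize) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const points = Array.from({ length: 120 }, (_, i) => ({ id: chunkPointId('handbook.pdf', i) }));
const batches = toBatches(points, 50);
console.log(batches.map(b => b.length)); // [ 50, 50, 20 ]
```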
Option 2: Pinecone (Cloud-Native)
Pinecone offers serverless vector search with excellent scaling characteristics.
// Pinecone setup for n8n RAG workflows
// Node: Pinecone Vector Store
const pineconeConfig = {
apiKey: '{{ $env.PINECONE_API_KEY }}',
environment: 'us-east-1',
// Serverless index configuration
serverless: {
cloud: 'aws',
region: 'us-east-1'
},
// Index specifications
indexName: 'rag-knowledge-base',
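dimension: 1536, // Must match the embedding model (1536 for text-embedding-3-small; use 3072 for text-embedding-3-large)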
metric: 'cosine',
// Pod-based for high-throughput (optional)
// pods: 2,
// podType: 'p1.x1',
// Metadata configuration
metadataConfig: {
indexed: [
'category',
'source',
'created_at',
'author'
]
}
};
// Query with metadata filtering
const queryConfig = {
index: 'rag-knowledge-base',
vector: '{{ $json.queryEmbedding }}',
topK: 10,
includeMetadata: true,
includeValues: false, // Reduce payload size
filter: {
category: { $eq: '{{ $json.category }}' },
created_at: { $gte: '{{ $json.sinceDate }}' }
},
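// Hybrid (sparse-dense) search. The two fields below are illustrative names:
// Pinecone's hybrid queries take a sparse vector alongside the dense one
// and require an index created with the dotproduct metric
query: '{{ $json.queryText }}', // For sparse-dense fusion
searchType: 'hybrid' // Combines semantic + keyword search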
};
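One common way to balance semantic and keyword matching in a sparse-dense query is to scale the two vectors by a weight `alpha` before sending them. The sketch below shows that convex weighting on plain arrays; `weightHybridQuery` is a hypothetical helper and the `indices`/`values` layout of the sparse vector is an assumption about the query shape.

```javascript
// alpha = 1 means purely semantic (dense), alpha = 0 means purely keyword (sparse).
function weightHybridQuery(denseVector, sparseVector, alpha) {
  if (alpha < 0 || alpha > 1) {
    throw new RangeError('alpha must be between 0 and 1');
  }
  return {
    vector: denseVector.map(v => v * alpha),
    sparseVector: {
      indices: sparseVector.indices,
      values: sparseVector.values.map(v => v * (1 - alpha))
    }
  };
}

const weighted = weightHybridQuery(
  [0.2, 0.4],
  { indices: [10, 42], values: [1, 3] },
  0.5
);
console.log(weighted.vector);              // [ 0.1, 0.2 ]
console.log(weighted.sparseVector.values); // [ 0.5, 1.5 ]
```

Starting near `alpha = 0.5` and tuning against a small set of known-good queries is a reasonable default when exact product names or codes matter as much as meaning.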
Option 3: Weaviate (Graph + Vector Hybrid)
Weaviate excels when you need graph relationships alongside vector search.
// Weaviate schema for RAG with relationships
const weaviateSchema = {
class: 'DocumentChunk',
description: 'Chunk of company knowledge with relationships',
vectorizer: 'text2vec-openai',
moduleConfig: {
'text2vec-openai': {
model: 'ada',
modelVersion: '002',
type: 'text'
}
},
properties: [
{
name: 'content',
dataType: ['text'],
moduleConfig: {
'text2vec-openai': { skip: false, vectorizePropertyName: false }
}
},
{
name: 'source',
dataType: ['text'],
tokenization: 'word'
},
{
name: 'category',
dataType: ['text'],
tokenization: 'field' // Exact match filtering
},
{
name: 'relatedChunks',
dataType: ['DocumentChunk'], // Graph relationships
description: 'Semantically related chunks'
},
{
name: 'parentDocument',
dataType: ['Document'], // Link to parent
}
],
// HNSW vector index settings
vectorIndexConfig: {
ef: 256,
efConstruction: 128,
maxConnections: 64,
dynamicEfFactor: 8
}
};
// Graph query for contextual retrieval
const graphQuery = `
{
Get {
DocumentChunk(
nearText: {
concepts: ["{{ $json.query }}"]
certainty: 0.7
}
limit: 5
) {
content
source
category
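# Traverse cross-references (Weaviate requires an inline fragment per target class)
relatedChunks {
... on DocumentChunk {
content
source
}
}
parentDocument {
... on Document {
title
author
publishDate
}
}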
}
}
}
`;
Advanced Chunking Strategies
The Science of Chunking
Chunking is the single most important factor in RAG quality. Poor chunking leads to lost context, while optimal chunking can improve retrieval accuracy by 40%+.
┌──────────────────────────────────────────────────────────────────────────────┐
│ Chunking Strategy Comparison │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ Document: "The company was founded in 2018. Revenue grew 300% in 2024." │
│ │
│ Fixed Size (Bad): │
│ ┌─────────────────┬─────────────────┬─────────────────┐ │
│ │The company was f│ounded in 2018. R │evenue grew 300% │ │
│ └─────────────────┴─────────────────┴─────────────────┘ │
│ └─ "What year was the company founded?" → Query matches middle chunk only │
│ But chunk splits "founded in 2018" → Lost information! │
│ │
│ Semantic (Better): │
│ ┌──────────────────────────────────┬──────────────────────────────────┐ │
│ │The company was founded in 2018. │Revenue grew 300% in 2024. │ │
│ └──────────────────────────────────┴──────────────────────────────────┘ │
│ │
│ Recursive with Overlap (Best): │
│ ┌──────────────────────────┐ │
│ │The company was founded in│ ← Chunk 1 │
│ │2018. Revenue grew 300% │ ← Chunk 2 (overlaps by 25%) │
│ │in 2024. │ ← Chunk 3 │
│ └──────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
Implementation in n8n:
// Advanced chunking with LangChain-style logic
const advancedChunking = {
// Strategy 1: Markdown-aware chunking
markdown: {
splitOn: ['# ', '## ', '### ', '\n\n', '\n'],
preserveCodeBlocks: true,
preserveTables: true,
chunkSize: 1000,
chunkOverlap: 200
},
// Strategy 2: Semantic chunking using embeddings
semantic: {
// Create embeddings for each sentence
sentenceEmbeddings: true,
// Group sentences with similar embeddings
similarityThreshold: 0.85,
minChunkSize: 100,
maxChunkSize: 500,
// Combine sentences until similarity drops
bufferSize: 3 // Look ahead/behind sentences
},
// Strategy 3: Agentic chunking (GPT-5.5)
agentic: {
// Use GPT-5.5 to determine optimal split points
model: 'gpt-5.5',
prompt: `
Analyze this document and identify the best places to split it into chunks.
Each chunk should:
- Be 300-500 tokens
- Contain a complete thought or topic
- Not split mid-sentence or mid-paragraph
- Maintain context within each chunk
Return the split positions as line numbers.
Document:
{{ $json.documentText }}
`,
// Parse response to get chunk boundaries
parseResponse: (response) => {
const lines = response.split('\n');
return lines.map(l => l.trim()).filter(l => /^\d+$/.test(l)).map(Number);
}
},
// Strategy 4: Parent-child chunking (for context preservation)
parentChild: {
// Large parent chunks (for retrieval)
parentChunkSize: 2000,
parentChunkOverlap: 400,
// Small child chunks (for precise matching)
childChunkSize: 200,
childChunkOverlap: 50,
// Store both and link them
indexingStrategy: 'dual',
retrieval: 'child', // Search on children
generation: 'parent' // Generate from parents
},
// Strategy 5: Sliding window for code/documentation
slidingWindow: {
windowSize: 10, // Lines
stride: 3, // Step between window starts (consecutive windows overlap by 7 lines)
contextLines: 2, // Lines before/after
// Result: Lines 1-10, 4-13, 7-16, etc.
}
};
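The sliding-window strategy (Strategy 5) is the easiest of these to implement directly. A minimal sketch, assuming the input is already split into lines; `slidingWindows` is a hypothetical helper and the `startLine`/`endLine` fields use 1-based line numbers.

```javascript
// Produce overlapping windows of windowSize lines, advancing stride lines each step.
function slidingWindows(lines, windowSize, stride) {
  const windows = [];
  for (let start = 0; start < lines.length; start += stride) {
    const end = Math.min(start + windowSize, lines.length);
    windows.push({
      startLine: start + 1, // 1-based, inclusive
      endLine: end,         // 1-based, inclusive
      text: lines.slice(start, end).join('\n')
    });
    if (end === lines.length) break; // the last window reached the end of the file
  }
  return windows;
}

const lines = Array.from({ length: 16 }, (_, i) => `line ${i + 1}`);
const windows = slidingWindows(lines, 10, 3);
console.log(windows.map(w => `${w.startLine}-${w.endLine}`)); // [ '1-10', '4-13', '7-16' ]
```

A small stride relative to the window size means each line is indexed several times, so storage grows by roughly `windowSize / stride`; that is the price for never cutting a function or paragraph at an unlucky boundary.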
Hybrid Chunking Implementation:
// n8n Function node for hybrid chunking
const hybridChunking = ($input) => {
const documents = $input.first().json.documents;
const chunks = [];
for (const doc of documents) {
const content = doc.content;
// Detect document type
const isMarkdown = doc.fileType === 'md' || doc.fileType === 'markdown';
const isCode = /\.(js|ts|py|java|cpp|go|rs)$/.test(doc.fileName);
const isStructured = doc.fileType === 'json' || doc.fileType === 'csv';
let docChunks;
if (isMarkdown) {
// Use header-aware splitting
docChunks = splitMarkdown(content, {
chunkSize: 1000,
overlap: 200
});
} else if (isCode) {
// Use AST-aware splitting (preserve functions/classes)
docChunks = splitCode(content, doc.fileType, {
preserveStructure: true
});
} else if (isStructured) {
// Row-based chunking for structured data
docChunks = splitStructured(content, {
rowsPerChunk: 100,
includeHeader: true
});
} else {
// Default: recursive character
docChunks = splitRecursive(content, {
chunkSize: 512,
overlap: 128,
separators: ['\n\n', '\n', '. ', ' ']
});
}
// Add metadata to each chunk
docChunks.forEach((chunk, index) => {
chunks.push({
text: chunk.text,
metadata: {
...doc.metadata,
chunkIndex: index,
totalChunks: docChunks.length,
chunkStrategy: isMarkdown ? 'markdown' : isCode ? 'code' : isStructured ? 'structured' : 'recursive',
charCount: chunk.text.length,
tokenEstimate: Math.ceil(chunk.text.length / 4) // Rough estimate: ~4 characters per token
}
});
});
}
return [{ json: { chunks } }];
};
// Markdown-aware splitter
function splitMarkdown(text, options) {
const { chunkSize, overlap } = options;
const chunks = [];
let currentChunk = '';
let currentSize = 0;
const lines = text.split('\n');
for (const line of lines) {
const isHeader = /^#{1,6}\s/.test(line);
const lineSize = line.length;
// Start new chunk on headers if current chunk is substantial
if (isHeader && currentSize > chunkSize * 0.5) {
chunks.push({ text: currentChunk.trim() });
currentChunk = line;
currentSize = lineSize;
} else if (currentChunk && currentSize + lineSize > chunkSize) {
chunks.push({ text: currentChunk.trim() });
// Keep overlap
const overlapText = currentChunk.slice(-overlap);
currentChunk = overlapText + '\n' + line;
currentSize = overlap + lineSize;
} else {
currentChunk += '\n' + line;
currentSize += lineSize;
}
}
if (currentChunk) {
chunks.push({ text: currentChunk.trim() });
}
return chunks;
}
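The default branch of `hybridChunking` calls `splitRecursive`, which is not defined in this article. The following is one possible sketch of it, not the LangChain implementation: it splits on the coarsest separator first, falls back to finer ones for oversized pieces, then merges pieces into chunks with a character overlap. Because the overlap is prepended after the size check, a chunk can reach `chunkSize + overlap` characters.

```javascript
function splitRecursive(text, { chunkSize, overlap, separators }) {
  // Step 1: break the text into pieces no longer than chunkSize, trying the
  // coarsest separator first and falling back to finer ones.
  function splitPieces(segment, seps) {
    if (segment.length <= chunkSize) return [segment];
    if (seps.length === 0) {
      // No separators left: hard-cut every chunkSize characters.
      const cuts = [];
      for (let i = 0; i < segment.length; i += chunkSize) {
        cuts.push(segment.slice(i, i + chunkSize));
      }
      return cuts;
    }
    const [sep, ...rest] = seps;
    return segment.split(sep).flatMap((part, i, parts) => {
      const withSep = i < parts.length - 1 ? part + sep : part; // keep the separator
      return splitPieces(withSep, rest);
    });
  }

  // Step 2: merge adjacent pieces into chunks, carrying the last `overlap`
  // characters of each finished chunk into the next one.
  const chunks = [];
  let current = '';
  for (const piece of splitPieces(text, separators)) {
    if (current && current.length + piece.length > chunkSize) {
      chunks.push({ text: current.trim() });
      current = (overlap > 0 ? current.slice(-overlap) : '') + piece;
    } else {
      current += piece;
    }
  }
  if (current.trim()) chunks.push({ text: current.trim() });
  return chunks;
}

const sample = 'Alpha beta. Gamma delta. Epsilon zeta.';
const sampleChunks = splitRecursive(sample, {
  chunkSize: 20,
  overlap: 5,
  separators: ['\n\n', '\n', '. ', ' ']
});
console.log(sampleChunks.map(c => c.text));
// [ 'Alpha beta.', 'eta. Gamma delta.', 'lta. Epsilon zeta.' ]
```

The sample output shows the cost of a pure character overlap: the carried-over text can start mid-word. Snapping the overlap to the nearest sentence or word boundary is a natural refinement.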
Building the Complete RAG Pipeline
Phase 1: Document Ingestion Workflow
// Complete n8n workflow: Document → Vector Database
// Filename: 44-rag-ingestion-workflow.json
{
"name": "RAG Document Ingestion Pipeline",
"nodes": [
{
"parameters": {
"rule": {
"interval": [
{
"field": "hours",
"hoursInterval": 24
}
]
}
},
"name": "Daily Sync Trigger",
"type": "n8n-nodes-base.scheduleTrigger",
"typeVersion": 1.1,
"position": [250, 300]
},
{
"parameters": {
"url": "https://api.company.com/documents",
"sendQuery": true,
"queryParameters": {
"parameters": [
{
"name": "modified_since",
"value": "={{ $getExecutionData('last_sync') || '1970-01-01' }}"
}
]
}
},
"name": "Fetch New Documents",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.1,
"position": [450, 300]
},
{
"parameters": {
"jsCode": "// Process documents and extract text\nconst documents = items[0].json.data || [];\nconst processed = [];\n\nfor (const doc of documents) {\n // Determine extraction method based on file type\n const extraction = {\n id: doc.id,\n title: doc.title,\n url: doc.url,\n type: doc.file_type,\n modified: doc.modified_at,\n category: doc.category\n };\n \n processed.push({ json: extraction });\n}\n\nreturn processed;"
},
"name": "Prepare Documents",
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [650, 300]
},
{
"parameters": {
"batchSize": 10
},
"name": "Split Batch",
"type": "n8n-nodes-base.splitInBatches",
"typeVersion": 3,
"position": [850, 300]
},
{
"parameters": {
"url": "={{ $json.url }}"
},
"name": "Download Document",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.1,
"position": [1050, 300]
},
{
"parameters": {
"operation": "pdf",
"binaryPropertyName": "data",
"options": {}
},
"name": "Extract PDF Text",
"type": "n8n-nodes-base.extractFromFile",
"typeVersion": 1,
"position": [1250, 200]
},
{
"parameters": {
"extraction": "text",
"options": {}
},
"name": "Extract DOCX Text",
"type": "n8n-nodes-base.extractFromFile",
"typeVersion": 1,
"position": [1250, 400]
},
{
"parameters": {
"jsCode": "// Advanced chunking logic\nconst content = $input.first().json.text;\nconst metadata = $input.first().json.metadata;\n\n// Recursive character chunking\nconst chunkSize = 512;\nconst overlap = 128;\nconst separators = ['\\n\\n', '\\n', '. ', ' '];\n\nconst chunks = [];\nlet currentChunk = '';\nlet currentSize = 0;\n\nconst sentences = content.split(/(?<=[.!?])\\s+/);\n\nfor (const sentence of sentences) {\n const sentenceSize = sentence.length;\n \n if (currentSize + sentenceSize > chunkSize && currentChunk) {\n chunks.push({\n text: currentChunk.trim(),\n metadata: {\n ...metadata,\n chunk_index: chunks.length,\n char_count: currentSize\n }\n });\n \n // Apply overlap\n const overlapStart = Math.max(0, currentChunk.length - overlap);\n currentChunk = currentChunk.slice(overlapStart) + ' ' + sentence;\n currentSize = currentChunk.length;\n } else {\n currentChunk += (currentChunk ? ' ' : '') + sentence;\n currentSize += sentenceSize;\n }\n}\n\nif (currentChunk) {\n chunks.push({\n text: currentChunk.trim(),\n metadata: {\n ...metadata,\n chunk_index: chunks.length,\n char_count: currentSize\n }\n });\n}\n\nreturn chunks.map(c => ({ json: c }));"
},
"name": "Chunk Documents",
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [1450, 300]
},
{
"parameters": {
"options": {},
"prompt": "={{ $json.text }}",
"model": "text-embedding-3-large"
},
"name": "Generate Embeddings",
"type": "@n8n/n8n-nodes-langchain.embeddingsOpenAi",
"typeVersion": 1,
"position": [1650, 300]
},
{
"parameters": {
"mode": "insert",
"options": {
"qdrantCollection": "company-knowledge-base"
},
"points": {
"id": "={{ $json.metadata.chunk_index + '-' + $json.metadata.doc_id }}",
"vector": "={{ $json.embedding }}",
"payload": {
"text": "={{ $json.text }}",
"metadata": "={{ $json.metadata }}"
}
}
},
"name": "Store in Qdrant",
"type": "@n8n/n8n-nodes-langchain.vectorStoreQdrant",
"typeVersion": 1,
"position": [1850, 300]
},
{
"parameters": {
"jsCode": "// Track successful ingestion\nconst indexedItems = $input.all();\nconst result = {\n document_id: indexedItems[0].json.metadata.doc_id,\n chunks_indexed: indexedItems.length,\n indexed_at: new Date().toISOString()\n};\n\n// Send to monitoring/alerting\nreturn [{ json: result }];"
},
"name": "Track Indexing",
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [2050, 300]
}
],
"connections": {
"Daily Sync Trigger": {
"main": [[{"node": "Fetch New Documents", "type": "main", "index": 0}]]
},
"Fetch New Documents": {
"main": [[{"node": "Prepare Documents", "type": "main", "index": 0}]]
},
"Prepare Documents": {
"main": [[{"node": "Split Batch", "type": "main", "index": 0}]]
},
"Split Batch": {
"main": [[{"node": "Download Document", "type": "main", "index": 0}]]
},
"Download Document": {
"main": [
[{"node": "Extract PDF Text", "type": "main", "index": 0}],
[{"node": "Extract DOCX Text", "type": "main", "index": 0}]
]
},
"Extract PDF Text": {
"main": [[{"node": "Chunk Documents", "type": "main", "index": 0}]]
},
"Extract DOCX Text": {
"main": [[{"node": "Chunk Documents", "type": "main", "index": 0}]]
},
"Chunk Documents": {
"main": [[{"node": "Generate Embeddings", "type": "main", "index": 0}]]
},
"Generate Embeddings": {
"main": [[{"node": "Store in Qdrant", "type": "main", "index": 0}]]
},
"Store in Qdrant": {
"main": [[{"node": "Track Indexing", "type": "main", "index": 0}]]
}
}
}
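Hand-edited workflow JSON like the export above breaks easily when a node is renamed, because `connections` refers to nodes by name. A small pre-import check can catch that. This is a sketch under the assumption that the workflow follows the `nodes`/`connections` layout shown here; `findBrokenConnections` is a hypothetical helper, not an n8n API.

```javascript
// Report connections whose source or target node name does not exist in workflow.nodes.
function findBrokenConnections(workflow) {
  const nodeNames = new Set(workflow.nodes.map(node => node.name));
  const problems = [];
  for (const [source, outputs] of Object.entries(workflow.connections)) {
    if (!nodeNames.has(source)) {
      problems.push(`unknown source node: ${source}`);
    }
    for (const branch of outputs.main || []) {
      for (const link of branch || []) {
        if (!nodeNames.has(link.node)) {
          problems.push(`${source} -> unknown node: ${link.node}`);
        }
      }
    }
  }
  return problems;
}

const exampleWorkflow = {
  nodes: [{ name: 'Daily Sync Trigger' }, { name: 'Fetch New Documents' }],
  connections: {
    'Daily Sync Trigger': {
      main: [[{ node: 'Fetch New Documents', type: 'main', index: 0 }]]
    },
    'Fetch New Documents': {
      main: [[{ node: 'Prepare Docs', type: 'main', index: 0 }]] // renamed node, now dangling
    }
  }
};

console.log(findBrokenConnections(exampleWorkflow));
// [ 'Fetch New Documents -> unknown node: Prepare Docs' ]
```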
Phase 2: Query and Retrieval Workflow
// Complete n8n workflow: User Query → RAG Response
// Filename: 44-rag-query-workflow.json
{
"name": "RAG Query and Response Pipeline",
"nodes": [
{
"parameters": {
"httpMethod": "POST",
"path": "rag-query",
"responseMode": "responseNode"
},
"name": "RAG API Endpoint",
"type": "n8n-nodes-base.webhook",
"typeVersion": 1,
"position": [250, 300]
},
{
"parameters": {
"options": {},
"prompt": "={{ $json.body.query }}",
"model": "text-embedding-3-large"
},
"name": "Embed Query",
"type": "@n8n/n8n-nodes-langchain.embeddingsOpenAi",
"typeVersion": 1,
"position": [450, 300]
},
{
"parameters": {
"mode": "retrieve",
"options": {
"qdrantCollection": "company-knowledge-base",
"topK": 10
},
"filter": {
"category": "={{ $('RAG API Endpoint').first().json.body.category || undefined }}"
}
},
"name": "Retrieve from Qdrant",
"type": "@n8n/n8n-nodes-langchain.vectorStoreQdrant",
"typeVersion": 1,
"position": [650, 300]
},
{
"parameters": {
"jsCode": "// Re-rank retrieved documents\nconst docs = $input.all().map(item => item.json);\nconst query = $('RAG API Endpoint').first().json.body.query;\n\n// Keyword-overlap boost on top of vector similarity (a simple heuristic, not true BM25)\nconst queryTerms = query.toLowerCase().split(/\\s+/).filter(Boolean);\nconst scored = docs.map(doc => {\n const text = doc.metadata.text.toLowerCase();\n let score = doc.score; // Original vector similarity\n \n // Boost for exact term matches (split avoids regex-escaping the term)\n for (const term of queryTerms) {\n const matches = text.split(term).length - 1;\n score += matches * 0.05;\n }\n \n // Boost for recency; skipped when the date is missing or invalid\n const docDate = new Date(doc.metadata.metadata.modified || doc.metadata.metadata.created);\n const daysOld = (Date.now() - docDate.getTime()) / (1000 * 60 * 60 * 24);\n if (!Number.isNaN(daysOld)) {\n score += Math.max(0, 0.1 - daysOld * 0.001);\n }\n \n return { ...doc, rerankedScore: score };\n});\n\n// Sort by re-ranked score and take top 5\nconst topDocs = scored\n .sort((a, b) => b.rerankedScore - a.rerankedScore)\n .slice(0, 5);\n\nreturn [{ json: { documents: topDocs, query } }];"
},
"name": "Re-rank Results",
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [850, 300]
},
{
"parameters": {
"jsCode": "// Build context for LLM\nconst docs = $input.first().json.documents;\nconst query = $input.first().json.query;\n\n// Format documents with source attribution\nconst context = docs.map((doc, i) => `[${i + 1}] ${doc.metadata.metadata.title}\nSource: ${doc.metadata.metadata.source}\nContent: ${doc.metadata.text}\n---`).join('\\n\\n');\n\nconst sources = docs.map(doc => ({\n title: doc.metadata.metadata.title,\n source: doc.metadata.metadata.source,\n score: Math.round(doc.rerankedScore * 100) / 100\n}));\n\nreturn [{\n json: {\n query,\n context,\n sources,\n documentCount: docs.length\n }\n}];"
},
"name": "Build Context",
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [1050, 300]
},
{
"parameters": {
"model": "gpt-5.5",
"options": {
"temperature": 0.3,
"maxTokens": 1500
},
"messages": {
"message": [
{
"role": "system",
"content": "You are a helpful assistant that answers questions based on the provided context.\n\nINSTRUCTIONS:\n1. Answer using ONLY the information in the provided context\n2. If the context doesn't contain the answer, say so clearly\n3. Always cite your sources using [1], [2], etc.\n4. Be concise but complete\n5. If you need to make assumptions, state them explicitly"
},
{
"role": "user",
"content": "=Context:\n{{ $json.context }}\n\nQuestion: {{ $json.query }}\n\nProvide a comprehensive answer with source citations."
}
]
}
},
"name": "Generate Response",
"type": "n8n-nodes-base.openAi",
"typeVersion": 1.8,
"position": [1250, 300]
},
{
"parameters": {
"jsCode": "// Final response assembly\nconst llmResponse = $input.first().json.content;\n// Read sources from the Build Context node; workflow static data is not a per-request store\nconst context = $('Build Context').first().json;\n\nreturn [{\n json: {\n answer: llmResponse,\n sources: context.sources,\n tokens_used: $input.first().json.usage?.total_tokens || null,\n retrieved_documents: context.documentCount\n }\n}];"
},
"name": "Format Response",
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [1450, 300]
},
{
"parameters": {
"respondWith": "json",
"responseBody": "={{ JSON.stringify($json) }}"
},
"name": "Return Response",
"type": "n8n-nodes-base.respondToWebhook",
"typeVersion": 1.1,
"position": [1650, 300]
}
],
"connections": {
"RAG API Endpoint": {
"main": [[{"node": "Embed Query", "type": "main", "index": 0}]]
},
"Embed Query": {
"main": [[{"node": "Retrieve from Qdrant", "type": "main", "index": 0}]]
},
"Retrieve from Qdrant": {
"main": [[{"node": "Re-rank Results", "type": "main", "index": 0}]]
},
"Re-rank Results": {
"main": [[{"node": "Build Context", "type": "main", "index": 0}]]
},
"Build Context": {
"main": [[{"node": "Generate Response", "type": "main", "index": 0}]]
},
"Generate Response": {
"main": [[{"node": "Format Response", "type": "main", "index": 0}]]
},
"Format Response": {
"main": [[{"node": "Return Response", "type": "main", "index": 0}]]
}
}
}
Advanced RAG Patterns
Pattern 1: Multi-Query Retrieval
When a single query might miss important context, generate multiple variations:
// Multi-query expansion for better retrieval
const multiQueryConfig = {
originalQuery: "What are our refund policies?",
// Generate 3-5 variations using GPT-5.5
expansionPrompt: `
Generate 4 different ways someone might ask about the following topic.
Each variation should approach the question from a different angle.
Original: {{ $json.query }}
Return as JSON array of strings.
`,
generatedQueries: [
"What are our refund policies?",
"How do I get my money back?",
"What is the return policy?",
"Can I request a refund?",
"What are the conditions for refunds?"
],
// Retrieve for each query
retrievalStrategy: 'parallel',
// Merge results, remove duplicates, re-rank
mergeStrategy: {
deduplication: 'semantic', // Remove near-duplicate chunks
ranking: 'reciprocal', // Reciprocal Rank Fusion
topK: 10
}
};
// Reciprocal Rank Fusion scoring
function reciprocalRankFusion(results) {
const k = 60; // RRF constant
const scores = new Map();
for (const queryResults of results) {
for (const [rank, doc] of queryResults.entries()) {
const docId = doc.metadata.chunk_id;
const currentScore = scores.get(docId) || 0;
// RRF score: 1 / (k + rank), with rank counted from 1 (hence rank + 1 for the 0-based index)
scores.set(docId, currentScore + 1 / (k + rank + 1));
}
}
// Sort by fused score
return Array.from(scores.entries())
.sort((a, b) => b[1] - a[1])
.map(([id, score]) => ({ id, score }));
}
Pattern 2: Hypothetical Document Embeddings (HyDE)
Generate an ideal answer first, then embed that for retrieval:
// HyDE pattern implementation
const hydeConfig = {
// Step 1: Generate hypothetical answer (without retrieval)
hypotheticalPrompt: `
Imagine you have access to complete company documentation.
Write a detailed, factual answer to this question:
Question: {{ $json.query }}
Write the answer as if you're directly citing from documents.
Include specific details, dates, and references.
`,
// Use GPT-5.5 for hypothetical document generation
generationModel: 'gpt-5.5',
temperature: 0.7, // Slightly creative
maxTokens: 500,
// Step 2: Embed the hypothetical answer
embeddingModel: 'text-embedding-3-large',
// Step 3: Retrieve using hypothetical embedding
// This often finds documents that wouldn't match the original query
};
// n8n workflow for HyDE
const hydeWorkflow = [
{
node: "Receive Query",
type: "webhook"
},
{
node: "Generate Hypothetical Answer",
type: "openAi",
config: {
model: "gpt-5.5",
prompt: hydeConfig.hypotheticalPrompt,
temperature: 0.7
}
},
{
node: "Embed Hypothetical",
type: "embeddingsOpenAi",
input: "={{ $json.hypotheticalAnswer }}"
},
{
node: "Retrieve Documents",
type: "vectorStoreQdrant",
vector: "={{ $json.embedding }}",
topK: 10
},
{
node: "Generate Final Answer",
type: "openAi",
// Now use ACTUAL retrieved documents
context: "={{ $json.retrievedDocuments }}"
}
];
Pattern 3: Self-Reflective RAG
Let the system verify its own answers and iterate:
// Self-reflective RAG with verification
const reflectiveRag = {
maxIterations: 3,
verificationPrompt: `
Verify if this answer is fully supported by the provided context.
Answer: {{ $json.answer }}
Context: {{ $json.context }}
Check for:
1. Hallucinations (information not in context)
2. Unsupported claims
3. Missing relevant information
Return JSON:
{
"isVerified": boolean,
"confidence": 0-1,
"issues": ["list of problems"],
"additionalQueries": ["queries to find missing info"]
}
`,
// If verification fails, perform additional retrieval
iterationLogic: async (state) => {
const verification = await verifyAnswer(state.answer, state.context);
if (verification.isVerified || state.iteration >= reflectiveRag.maxIterations) {
return {
finalAnswer: state.answer,
verified: verification.isVerified,
iterations: state.iteration
};
}
// Retrieve additional documents
const newDocs = await retrieveDocuments(verification.additionalQueries);
const newContext = mergeContexts(state.context, newDocs);
// Regenerate answer with expanded context
const newAnswer = await generateAnswer(state.query, newContext);
return reflectiveRag.iterationLogic({
...state,
answer: newAnswer,
context: newContext,
iteration: state.iteration + 1
});
}
};
Pattern 4: Hierarchical RAG
For large knowledge bases, use a two-tier retrieval system:
// Hierarchical retrieval for enterprise knowledge bases
const hierarchicalRag = {
// Level 1: Summaries index (smaller, faster)
summaryIndex: {
collection: 'document-summaries',
embeddingModel: 'text-embedding-3-small', // Cheaper
chunkSize: 'document-level',
content: 'summaries of entire documents'
},
// Level 2: Full content index (detailed)
detailIndex: {
collection: 'document-chunks',
embeddingModel: 'text-embedding-3-large', // Higher quality
chunkSize: 512,
content: 'full document chunks'
},
// Query flow
queryProcess: [
// Step 1: Query summary index to find relevant documents
{
action: 'retrieve',
index: 'summaryIndex',
topK: 5,
result: 'relevantDocuments'
},
// Step 2: Filter detail index to those documents
{
action: 'filter',
index: 'detailIndex',
filter: {
doc_id: { $in: '{{ $json.relevantDocuments.map(d => d.doc_id) }}' }
}
},
// Step 3: Retrieve detailed chunks from filtered set
{
action: 'retrieve',
index: 'detailIndex',
topK: 10,
result: 'detailedChunks'
}
],
// Performance: 5x faster on large KBs
// Cost: 60% reduction in embedding costs
};
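To make the two-tier flow concrete, here is a small in-memory sketch. It assumes cosine similarity over toy 2-D vectors that were made up for illustration; a real deployment would query the vector database instead. Because the summary and detail indexes use different embedding models in the configuration above, the function takes one query vector per tier.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k items most similar to the query vector, best first.
function topK(items, queryVector, k) {
  return items
    .map(item => ({ ...item, score: cosine(item.vector, queryVector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Two-tier retrieval: pick documents via summaries, then rank only their chunks.
function hierarchicalRetrieve(summaries, chunks, summaryQuery, chunkQuery, docK, chunkK) {
  const docIds = new Set(topK(summaries, summaryQuery, docK).map(s => s.doc_id));
  const candidates = chunks.filter(c => docIds.has(c.doc_id));
  return topK(candidates, chunkQuery, chunkK);
}

// Toy data (hypothetical).
const summaries = [
  { doc_id: 'refunds', vector: [1, 0] },
  { doc_id: 'shipping', vector: [0, 1] },
];
const chunks = [
  { doc_id: 'refunds', text: 'Refund window', vector: [0.9, 0.1] },
  { doc_id: 'refunds', text: 'Refund method', vector: [0.6, 0.4] },
  { doc_id: 'shipping', text: 'Carrier list', vector: [0.95, 0.05] },
];
const results = hierarchicalRetrieve(summaries, chunks, [1, 0.1], [1, 0.1], 1, 2);
```

Note that the "Carrier list" chunk is never scored even though its vector is close to the query: the summary tier already excluded its document. That pruning is where the speedup comes from, and it is also the main risk, since a poor summary can hide a relevant chunk.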
Production Optimization
Caching Strategies
// Multi-layer caching for RAG systems
const cachingLayers = {
// Layer 1: Query embedding cache
embeddingCache: {
store: 'redis',
key: 'embedding:{{ md5($json.query) }}',
ttl: 86400, // 24 hours
hitRate: '35%' // Common queries
},
// Layer 2: Retrieval results cache
retrievalCache: {
store: 'redis',
key: 'retrieve:{{ md5($json.queryEmbedding) }}:{{ $json.filterHash }}',
ttl: 3600, // 1 hour
hitRate: '28%',
// Invalidate on document updates
tags: ['knowledge-base']
},
// Layer 3: Generated response cache (for exact queries)
responseCache: {
store: 'redis',
key: 'response:{{ md5($json.query) }}:{{ md5($json.context) }}',
ttl: 1800, // 30 minutes
hitRate: '15%',
// Don't cache if context changed
conditional: '!$json.contextChanged'
},
// Cache warming for popular queries
warming: {
schedule: '0 2 * * *', // 2 AM daily
queries: [
'What are your services?',
'How do I contact support?',
'What are your business hours?'
]
}
};
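The cache keys above depend on an MD5 fingerprint of the query plus a time-to-live. As a rough sketch of those two pieces, the snippet below uses Node's built-in `crypto` module and an in-memory `Map` standing in for Redis. The `TtlCache` class, the `normalizeQuery` step, and the injectable clock are illustrative assumptions, not part of the configuration above.

```javascript
const crypto = require('crypto');

// MD5 serves only as a short cache-key fingerprint here, not as a security measure.
const md5 = text => crypto.createHash('md5').update(text).digest('hex');

// Normalize so trivially different phrasings share one cache entry.
const normalizeQuery = query => query.trim().toLowerCase().replace(/\s+/g, ' ');

const embeddingKey = query => `embedding:${md5(normalizeQuery(query))}`;

class TtlCache {
  constructor(now = () => Date.now()) {
    this.entries = new Map();
    this.now = now; // injectable clock, which makes expiry testable
  }
  set(key, value, ttlSeconds) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }
}

// Usage with a fake clock (milliseconds).
let clock = 0;
const cache = new TtlCache(() => clock);
cache.set(embeddingKey('What are your services?'), [0.1, 0.2], 86400);
```

Normalizing before hashing is a design choice that raises the hit rate for near-identical queries; whether it is safe depends on how case-sensitive your content is.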
Cost Optimization with GPT-5.5
// Cost-optimized RAG with GPT-5.5
const costOptimization = {
// GPT-5.5 is 40% more token-efficient
tokenEfficiency: 0.6, // 40% reduction
// Tiered model selection
// (an array of tiers: repeating `condition`/`model`/`cost` keys inside one
// object literal would silently keep only the last tier)
modelSelection: [
// Simple queries: GPT-5.5-mini (fastest, cheapest)
{
condition: '{{ $json.complexityScore < 0.3 }}',
model: 'gpt-5.5-mini',
cost: '$0.002 / 1K tokens'
},
// Standard queries: GPT-5.5 (balanced)
{
condition: '{{ $json.complexityScore >= 0.3 && $json.complexityScore < 0.8 }}',
model: 'gpt-5.5',
cost: '$0.015 / 1K tokens'
},
// Complex queries: GPT-5.5-reasoning (best quality)
{
condition: '{{ $json.complexityScore >= 0.8 }}',
model: 'gpt-5.5-reasoning',
cost: '$0.03 / 1K tokens'
}
],
// Complexity scoring
complexityAnalysis: {
factors: [
{ name: 'queryLength', weight: 0.2 },
{ name: 'numEntities', weight: 0.3 },
{ name: 'reasoningRequired', weight: 0.5 }
]
},
// Batch processing for indexing
batchConfig: {
embeddingBatchSize: 100, // Conservative batch size, not an OpenAI-imposed maximum
upsertBatchSize: 50, // Vector DB optimal
parallelBatches: 5 // Concurrent processing
},
// Monthly cost projection for 100K queries
costProjection: {
gpt4: '$4,500',
gpt5_4: '$3,200',
gpt5_5: '$1,920', // 40% below GPT-5.4
gpt5_5WithTiers: '$1,440' // 68% below the GPT-4 baseline
}
};
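The weighted factors and tier thresholds above combine into a simple routing function. The sketch below assumes each factor has already been normalized to the range 0 to 1; the article does not specify how that normalization is done, so treat the inputs as placeholders. The `savings` helper just reproduces the arithmetic behind the projection figures.

```javascript
// Weights mirror complexityAnalysis.factors; inputs are assumed to be in 0..1.
const weights = { queryLength: 0.2, numEntities: 0.3, reasoningRequired: 0.5 };

// Weighted sum of the factors; missing factors count as 0.
function complexityScore(factors) {
  return Object.entries(weights)
    .reduce((sum, [name, weight]) => sum + weight * (factors[name] ?? 0), 0);
}

// Tier boundaries taken from the model selection conditions.
function selectModel(score) {
  if (score < 0.3) return 'gpt-5.5-mini';
  if (score < 0.8) return 'gpt-5.5';
  return 'gpt-5.5-reasoning';
}

// Fractional savings of `cost` relative to `baseline`.
const savings = (baseline, cost) => 1 - cost / baseline;

const simple = complexityScore({ queryLength: 0.5, numEntities: 0, reasoningRequired: 0 });
const medium = complexityScore({ queryLength: 0.5, numEntities: 0.5, reasoningRequired: 0.5 });
const hard = complexityScore({ queryLength: 1, numEntities: 1, reasoningRequired: 1 });
```

With these inputs the three scores land in the mini, standard, and reasoning tiers respectively. `savings(3200, 1920)` gives the 40% figure relative to GPT-5.4, and `savings(4500, 1440)` gives the 68% figure, which is measured against the GPT-4 baseline.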
Monitoring and Observability
// Comprehensive RAG monitoring
const ragMonitoring = {
// Latency tracking
latencyMetrics: {
embedding: { p50: '<100ms', p99: '<500ms' },
retrieval: { p50: '<50ms', p99: '<200ms' },
generation: { p50: '<1s', p99: '<3s' },
e2e: { p50: '<1.5s', p99: '<4s' }
},
// Quality metrics
qualityMetrics: {
retrieval: {
precision: '0.85', // % of retrieved docs relevant
recall: '0.78', // % of relevant docs retrieved
mrr: '0.82' // Mean Reciprocal Rank
},
generation: {
relevance: '4.3/5', // Human evaluation
faithfulness: '0.91', // % supported by context
citationAccuracy: '0.88'
}
},
// Error tracking
errorTracking: {
categories: [
'retrieval_empty', // No documents found
'context_too_long', // Context exceeds token limit
'generation_error', // LLM API error
'hallucination_detected'
],
alerting: {
threshold: 5, // Alert after 5 errors in 5 minutes
channels: ['slack', 'pagerduty']
}
},
// User feedback tracking
feedback: {
thumbsUpDown: true,
commentCapture: true,
correctionTracking: true,
// Automatic model improvement from feedback
feedbackLoop: 'weekly-retrain'
}
};
// n8n monitoring workflow
const monitoringWorkflow = {
trigger: 'webhook',
nodes: [
{
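name: 'Parse RAG Request',
extract: ['query', 'type', 'model', 'responseTime', 'tokenUsage', 'cacheLayer', 'cacheHit']
},
{
// Prometheus scrapes its targets and does not accept samples posted to
// its own /metrics endpoint, so a workflow pushes through a Pushgateway.
// The host and job name below are placeholders.
name: 'Send to Prometheus Pushgateway',
type: 'httpRequest',
method: 'POST',
url: 'http://pushgateway:9091/metrics/job/rag',
// Exposition format uses single braces around labels; only the n8n
// expressions keep their double braces.
body: `
rag_query_latency{query_type="{{ $json.type }}"} {{ $json.responseTime }}
rag_token_usage{model="{{ $json.model }}"} {{ $json.tokenUsage }}
rag_cache_hit{layer="{{ $json.cacheLayer }}"} {{ $json.cacheHit ? 1 : 0 }}
`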
},
{
name: 'Alert if Thresholds Exceeded',
type: 'if',
condition: '{{ $json.responseTime > 4000 || $json.tokenUsage > 4000 }}'
},
{
name: 'Send Alert',
type: 'slack',
message: 'RAG performance alert: Query took {{ $json.responseTime }}ms'
}
]
};
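The metric lines sent by the monitoring workflow follow the Prometheus text exposition format: a metric name, labels in a single pair of braces, then the value. Building that body in a Code node instead of a template keeps label escaping in one place. The following is a sketch under that assumption; the record shape is inferred from the fields the workflow references and is not confirmed by the source.

```javascript
// Escape label values per the Prometheus text exposition format:
// backslash, double quote, and newline must be escaped.
function escapeLabel(value) {
  return String(value)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}

// Render one sample line: name{label="value",...} number
function sample(name, labels, value) {
  const rendered = Object.entries(labels)
    .map(([key, val]) => `${key}="${escapeLabel(val)}"`)
    .join(',');
  return `${name}{${rendered}} ${value}`;
}

// Build the request body from one RAG request record (assumed shape).
function toExposition(record) {
  return [
    sample('rag_query_latency', { query_type: record.type }, record.responseTime),
    sample('rag_token_usage', { model: record.model }, record.tokenUsage),
    sample('rag_cache_hit', { layer: record.cacheLayer }, record.cacheHit ? 1 : 0),
  ].join('\n') + '\n'; // the format requires a trailing newline
}
```

For example, `toExposition({ type: 'faq', model: 'gpt-5.5', responseTime: 850, tokenUsage: 1200, cacheLayer: 'embedding', cacheHit: true })` yields three lines, the first being `rag_query_latency{query_type="faq"} 850`.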
Real-World Use Cases
Use Case 1: Customer Support Knowledge Base
// Customer support RAG implementation
const supportRag = {
knowledgeSources: [
{ type: 'zendesk', collections: ['articles', 'tickets'] },
{ type: 'confluence', spaces: ['support', 'product'] },
{ type: 'pdf', path: '/kb/product-guides' }
],
// Conversation history integration
contextManagement: {
// Maintain conversation context across turns
sessionStore: 'redis',
ttl: 3600, // 1 hour
// Include previous context in retrieval
queryExpansion: `
Previous conversation:\n{{ $json.conversationHistory }}\n\nCurrent question: {{ $json.query }}
`,
// Track resolved/unresolved status
resolutionTracking: true
},
// Escalation rules
escalation: {
triggers: [
{ condition: 'confidence < 0.7', action: 'suggest_human' },
{ condition: 'sentiment < -0.5', action: 'escalate_immediately' },
{ condition: 'intent == "billing_dispute"', action: 'escalate_immediately' }
]
},
// Performance metrics
metrics: {
deflectionRate: '67%', // % resolved without human
avgResponseTime: '1.2s', // End-to-end
csat: '4.4/5', // Customer satisfaction
costPerQuery: '$0.08' // vs $4.50 for human agent
}
};
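The escalation triggers above are written as condition strings. One way to evaluate them without parsing strings is to express each rule as a predicate and take the first match. This sketch makes two assumptions that the configuration does not state: immediate escalation is checked before the softer `suggest_human` rule, and `'answer'` is the default action when nothing fires.

```javascript
// Escalation rules as predicates, checked in priority order.
// Thresholds come from the escalation triggers; the ordering is an assumption.
const escalationRules = [
  { action: 'escalate_immediately', when: s => s.sentiment < -0.5 },
  { action: 'escalate_immediately', when: s => s.intent === 'billing_dispute' },
  { action: 'suggest_human', when: s => s.confidence < 0.7 },
];

// Return the first matching action, or 'answer' (hypothetical default)
// when no rule fires.
function decideEscalation(signals) {
  const rule = escalationRules.find(r => r.when(signals));
  return rule ? rule.action : 'answer';
}
```

A query with `confidence: 0.5` and neutral sentiment would return `'suggest_human'`, while the same query with `sentiment: -0.8` would return `'escalate_immediately'`.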
Use Case 2: Sales Enablement with RAG
// Sales RAG for proposal generation
const salesRag = {
knowledgeSources: [
{ type: 'crm', data: 'opportunities,contacts,accounts' },
{ type: 'documents', path: '/sales/case-studies' },
{ type: 'documents', path: '/sales/proposal-templates' },
{ type: 'database', table: 'pricing_matrix' }
],
// Dynamic personalization
personalization: {
// Pull client context from CRM
clientData: '{{ $json.crmData }}',
// Customize retrieval based on client industry
industryBoost: '{{ $json.crmData.industry }}',
// Include relevant case studies
caseStudyFilter: 'industry == "{{ $json.crmData.industry }}"'
},
// Proposal generation workflow
proposalGeneration: {
steps: [
{ name: 'retrieveCompanyInfo', query: '{{ $json.clientName }} company overview' },
{ name: 'retrievePainPoints', query: '{{ $json.clientIndustry }} common challenges' },
{ name: 'retrieveSolutions', query: 'solutions for {{ $json.painPoints }}' },
{ name: 'retrieveCaseStudies', query: '{{ $json.clientIndustry }} case studies' },
{ name: 'generateProposal', model: 'gpt-5.5', template: 'formal_proposal' }
],
// Output formatting
output: {
format: 'docx',
sections: ['executive_summary', 'solution', 'pricing', 'timeline', 'case_studies'],
branding: 'auto_apply'
}
},
// Performance
metrics: {
proposalGenerationTime: '3 minutes', // vs 4 hours manual
winRateImprovement: '+23%',
repProductivity: '+40%'
}
};
Use Case 3: Legal Document Analysis
// Legal RAG for contract analysis
const legalRag = {
// Strict access controls
accessControl: {
authentication: 'sso',
authorization: 'role-based',
auditLogging: true,
dataRetention: '7_years'
},
knowledgeSources: [
{ type: 'documents', path: '/contracts/active', access: 'attorney_only' },
{ type: 'documents', path: '/legal-precedents', access: 'all_legal' },
{ type: 'documents', path: '/regulatory', access: 'compliance_team' }
],
// Citation requirements
citation: {
required: true,
format: 'legal_citation',
includePageNumbers: true,
includeClauseNumbers: true,
linkToDocument: true
},
// Risk analysis
riskAnalysis: {
enabled: true,
categories: ['liability', 'termination', 'indemnification', 'ip_rights'],
highlightRiskClauses: true,
suggestAlternatives: true
},
// Model selection
model: 'gpt-5.5', // Better reasoning for legal text
temperature: 0.1, // Conservative for legal
// Compliance
compliance: {
barAssociation: 'approved',
clientConfidentiality: 'encrypted_at_rest_and_in_transit',
aiDisclosure: 'included_in_output'
}
};
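The per-source `access` labels only help if they are enforced before retrieved chunks reach the prompt. The sketch below shows one possible enforcement step. The access labels come from the configuration above, but the role names and the role-to-label mapping are hypothetical, as is the shape of the audit record.

```javascript
// Which roles may read each access level (hypothetical mapping).
const accessPolicy = {
  attorney_only: ['attorney'],
  all_legal: ['attorney', 'paralegal', 'compliance'],
  compliance_team: ['compliance'],
};

// Drop chunks the user may not read before they reach the prompt.
// Unknown access labels are denied by default.
function filterByAccess(chunks, userRole) {
  return chunks.filter(chunk => (accessPolicy[chunk.access] ?? []).includes(userRole));
}

// Build one audit record per query, listing only the sources actually used.
function auditEntry(userRole, query, allowedChunks, timestamp = new Date().toISOString()) {
  return { timestamp, userRole, query, sources: allowedChunks.map(c => c.path) };
}
```

Filtering after retrieval is the simplest option to show here; pushing the same constraint into the vector store query as a metadata filter would avoid fetching restricted chunks at all.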
Integration Patterns
n8n + Directus for Content Management
// Directus CMS integration for RAG content
const directusIntegration = {
// Sync Directus content to vector database
syncConfig: {
trigger: 'directus.hook',
events: ['items.create', 'items.update', 'items.delete'],
collections: ['articles', 'documentation', 'faqs'],
// Transform Directus content
transform: {
// Combine multiple fields
text: '{{ $json.content }}\n\n{{ $json.excerpt }}',
// Extract metadata
metadata: {
title: '{{ $json.title }}',
slug: '{{ $json.slug }}',
category: '{{ $json.category.name }}',
tags: '{{ $json.tags.map(t => t.name) }}',
author: '{{ $json.user_created.first_name }}',
published: '{{ $json.date_published }}',
status: '{{ $json.status }}'
}
},
// Filter published content only
filter: 'status == "published"'
},
// Query Directus from RAG
queryIntegration: {
// When RAG finds a relevant chunk, fetch full content from Directus
enrichment: {
endpoint: 'https://directus.company.com/items/articles/{{ $json.metadata.slug }}',
fields: ['content', 'related_articles', 'attachments'],
includeRelations: true
}
},
// Update Directus with RAG analytics
feedbackLoop: {
// Track which content is most useful
queryLog: 'directus.rag_queries',
// Update article popularity
popularityMetric: {
collection: 'articles',
field: 'rag_retrieval_count',
increment: 1
}
}
};
n8n + Slack for Team Knowledge
// Slack integration for team knowledge
const slackIntegration = {
// Index Slack conversations
indexing: {
channels: ['#knowledge-base', '#product-discussions', '#engineering'],
excludeBots: true,
excludeCommands: true,
// Thread context preservation
threadContext: {
includeParent: true,
includeReplies: true,
maxThreadDepth: 5
}
},
// Slack bot for queries
bot: {
trigger: '@knowledgebot',
// Response in thread
responseMode: 'thread',
// Include source links
includeSources: true,
// Summarize for Slack
summarize: {
maxLength: 3000, // Fits the 3,000-character text limit of a Slack section block
includeHighlights: true
}
},
// Learn from reactions
feedback: {
thumbsUp: 'positive_feedback',
thumbsDown: 'negative_feedback',
// Auto-improve based on reactions
retraining: 'weekly'
}
};
Security and Privacy
Data Protection
// Security configuration for RAG systems
const securityConfig = {
// Encryption at rest
encryption: {
vectors: 'aes-256-gcm',
metadata: 'aes-256-gcm',
backups: 'aes-256-gcm'
},
// Encryption in transit
tls: {
version: '1.3',
certificates: 'letsencrypt',
hsts: true
},
// Access controls
rbac: {
roles: [
{ name: 'admin', permissions: ['read', 'write', 'delete', 'configure'] },
{ name: 'editor', permissions: ['read', 'write'] },
{ name: 'viewer', permissions: ['read'] }
],
// Row-level security on documents
documentLevel: true
},
// Audit logging
audit: {
events: ['query', 'ingest', 'update', 'delete', 'access_denied'],
retention: '2_years',
tamperProof: true
},
// PII handling
pii: {
detection: 'automatic',
redaction: 'mask', // or 'remove', 'hash'
entities: ['email', 'phone', 'ssn', 'credit_card', 'name'],
// Don't index PII in vectors
excludeFromEmbedding: true
},
// Data residency
residency: {
vectors: 'eu-west-1', // EU region, to support GDPR data-residency requirements
backups: 'eu-central-1'
}
};
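As a rough illustration of the `redaction: 'mask'` setting, the snippet below replaces matches with typed placeholders before text is embedded. The regular expressions are deliberately simple assumptions covering US-style formats only; production PII detection needs far broader coverage (names, international phone numbers, card checksums) and is usually delegated to a dedicated library or service.

```javascript
// Illustrative patterns only; not a complete PII detector.
const piiPatterns = [
  { entity: 'email', regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { entity: 'ssn', regex: /\b\d{3}-\d{2}-\d{4}\b/g },
  { entity: 'phone', regex: /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g },
];

// Replace each match with a typed placeholder such as [EMAIL].
function maskPii(text) {
  return piiPatterns.reduce(
    (masked, { entity, regex }) => masked.replace(regex, `[${entity.toUpperCase()}]`),
    text
  );
}
```

Running `maskPii` on chunk text before the embedding call keeps the raw values out of the vector store, which is the intent of `excludeFromEmbedding: true`.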
Testing and Validation
RAG Evaluation Framework
// Comprehensive RAG testing
const ragEvaluation = {
// Test datasets
datasets: {
// Questions with known answers
qaPairs: [
{
question: 'What is our refund policy?',
expectedAnswer: 'We offer full refunds within 30 days',
expectedSources: ['policies/refund.pdf']
}
],
// Edge cases
edgeCases: [
{ question: 'asdfghjkl', expectedBehavior: 'graceful_fallback' },
{ question: 'What is 2+2?', expectedBehavior: 'no_hallucination' }
]
},
// Metrics
metrics: {
// Retrieval metrics
hitRate: {
'@1': 0.75, // Top 1 is relevant
'@5': 0.90, // Relevant in top 5
'@10': 0.95 // Relevant in top 10
},
// Generation metrics
bleu: 0.45,
rouge: 0.52,
faithfulness: 0.88,
answerRelevance: 0.91,
// Latency
p95Latency: '< 3 seconds'
},
// Automated testing in n8n
testWorkflow: [
{
node: 'Load Test Dataset',
type: 'readBinaryFile',
path: '/tests/rag-test-cases.json'
},
{
node: 'Run Test Queries',
type: 'httpRequest',
url: 'https://api.company.com/rag-query',
batchSize: 10
},
{
node: 'Calculate Metrics',
type: 'code',
code: `
const results = $input.all();
const metrics = calculateMetrics(results);
return [{ json: metrics }];
`
},
{
node: 'Compare to Thresholds',
type: 'if',
condition: '{{ $json.hitRate["@5"] >= 0.90 }}'
},
{
node: 'Report Results',
type: 'slack',
message: 'RAG evaluation complete. Hit rate @5: {{ $json.hitRate["@5"] }}'
}
]
};
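The 'Calculate Metrics' node calls a `calculateMetrics` helper that is not defined in the workflow. A minimal sketch of what it could compute for the retrieval metrics is shown below. It assumes each result carries a ranked `retrieved` list of source paths and the `expectedSources` from the test dataset; that result shape is an assumption. In an n8n Code node the function would need to be defined inside the node, and the `$input.all()` items unwrapped via their `json` property first.

```javascript
// Rank (1-based) of the first expected source in the retrieved list, or 0 if absent.
function firstRelevantRank(retrieved, expectedSources) {
  const index = retrieved.findIndex(source => expectedSources.includes(source));
  return index + 1;
}

// Hit rate @k: share of test cases with a relevant source in the top k.
// MRR: mean of 1/rank of the first relevant source (0 when none is found).
function calculateMetrics(results, ks = [1, 5, 10]) {
  const ranks = results.map(r => firstRelevantRank(r.retrieved, r.expectedSources));
  const hitRate = {};
  for (const k of ks) {
    hitRate[`@${k}`] = ranks.filter(rank => rank > 0 && rank <= k).length / ranks.length;
  }
  const mrr = ranks.reduce((sum, rank) => sum + (rank > 0 ? 1 / rank : 0), 0) / ranks.length;
  return { hitRate, mrr };
}
```

Generation metrics such as faithfulness and answer relevance need an LLM or human judge and are out of scope for this helper.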
Conclusion
Retrieval-Augmented Generation is one of the most practical ways to ground large language models in enterprise knowledge. By combining GPT-5.5's enhanced reasoning capabilities with well-architected vector databases and intelligent retrieval strategies, organizations can build knowledge-intensive automation that is accurate, verifiable, and cost-effective.
The patterns and implementations covered in this guide—from basic document ingestion to advanced multi-query retrieval, from cost optimization to production monitoring—provide a comprehensive foundation for building RAG systems that scale. As GPT-5.5 continues to roll out across platforms and vector database technologies mature, we expect RAG to become the standard architecture for enterprise AI applications.
Key Takeaways:
- Chunking is Critical: The way you split documents is among the biggest drivers of RAG quality. Invest in intelligent, content-aware chunking strategies.
- Hybrid Search Wins: Combining vector similarity with keyword matching and re-ranking consistently outperforms pure semantic search by 15-30%.
- GPT-5.5 Changes the Economics: With 40% token efficiency improvements and enhanced reasoning, GPT-5.5 makes production RAG more affordable and effective than ever.
- Observability is Non-Negotiable: Production RAG systems require comprehensive monitoring of both retrieval quality and generation quality.
- Start Simple, Scale Smart: Begin with basic RAG patterns and add complexity (multi-query, HyDE, self-reflection) only when simple approaches prove insufficient.
What's Next?
- Implement the ingestion pipeline described earlier in this guide
- Set up monitoring using the patterns from the Monitoring and Observability section
- Experiment with different chunking strategies on your own documents
- Join the n8n community to share your RAG implementations
The future of enterprise AI is not about models knowing everything—it's about models knowing how to find and use the right information at the right time. That's what RAG delivers.
Need help implementing RAG for your business? Contact Tropical Media for expert consulting on AI automation, n8n workflows, and knowledge-intensive systems.