RAG-Powered AI Agents: Building Knowledge-Intensive Automation with n8n, Vector Databases, and GPT-5.5

Master Retrieval-Augmented Generation (RAG) for building knowledge-intensive AI agents with n8n. Learn to integrate Qdrant, Pinecone, and Weaviate vector databases, implement intelligent chunking strategies, and build production-ready RAG workflows with the new GPT-5.5 model. Complete with 25+ practical examples and architectural patterns.

The release of OpenAI's GPT-5.5 on May 5, 2026 marks a watershed moment for knowledge-intensive AI applications. With its dramatically improved reasoning capabilities and 40% reduction in token usage compared to GPT-5.4, GPT-5.5 is purpose-built for agentic enterprise work and complex document reasoning—the perfect foundation for Retrieval-Augmented Generation (RAG) systems.

Organizations implementing RAG-powered workflows are seeing transformative results: 73% reduction in hallucination rates, 4.2x improvement in document search accuracy, and 67% faster knowledge retrieval compared to traditional keyword search. A recent survey found that 57.3% of agent builders now have RAG-powered agents in production, up from just 23% in early 2025.

This comprehensive guide explores how to build production-grade RAG systems using n8n, vector databases, and GPT-5.5. From architecting ingestion pipelines to implementing hybrid search strategies, from optimizing chunking algorithms to building conversational agents that actually understand your business knowledge—we'll cover everything you need to build knowledge-intensive automation that delivers measurable ROI.

Understanding RAG Architecture: Beyond Simple Chatbots

The RAG Pattern Explained

Retrieval-Augmented Generation bridges the gap between large language models and private knowledge. Unlike fine-tuning, which permanently bakes knowledge into the model weights, RAG dynamically retrieves relevant context at query time—enabling up-to-date, verifiable, and hallucination-resistant AI responses.

┌─────────────────────────────────────────────────────────────────────────────┐
│                         RAG Architecture Flow                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────┐ │
│  │   Document   │───▶│   Chunk &    │───▶│   Vector     │───▶│  Vector  │ │
│  │   Sources    │    │   Embed      │    │   Database   │    │  Store   │ │
│  └──────────────┘    └──────────────┘    └──────────────┘    └──────────┘ │
│        │                    │                    │                  │       │
│        │                    │                    │                  │       │
│        ▼                    ▼                    ▼                  ▼       │
│   PDFs, URLs,        Text splitting,      Pinecone,            Real-time │
│   Databases,         OpenAI/GPT-5.5       Qdrant,              semantic   │
│   APIs               embeddings           Weaviate,            search    │
│                                           Chroma                           │
│                                                                              │
│                              QUERY TIME                                       │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────┐ │
│  │   User       │───▶│   Retrieve   │───▶│   Generate   │───▶│ Response │ │
│  │   Query      │    │   Context    │    │   Answer     │    │          │ │
│  └──────────────┘    └──────────────┘    └──────────────┘    └──────────┘ │
│                            │                                                      │
│                            ▼                                                      │
│                    Similarity search +                                          │
│                    Re-ranking                                                   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
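The query-time half of the diagram can be sketched in a few lines. This is a minimal in-memory illustration, not a replacement for a vector database: the function names (`cosineSimilarity`, `retrieveTopK`, `buildPrompt`) and the three-dimensional toy vectors are assumptions made for the example, and real embeddings would come from an embedding model.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // Avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query vector.
function retrieveTopK(queryVector, chunks, k) {
  return chunks
    .map(chunk => ({ ...chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Assemble the retrieved chunks into a numbered context block plus the question.
function buildPrompt(question, retrieved) {
  const context = retrieved.map((c, i) => `[${i + 1}] ${c.text}`).join('\n');
  return `Context:\n${context}\n\nQuestion: ${question}`;
}

// Toy 3-dimensional "embeddings" (hypothetical values for illustration only).
const toyChunks = [
  { text: 'Refunds are issued within 14 days.', vector: [0.9, 0.1, 0.0] },
  { text: 'The office is closed on Sundays.', vector: [0.0, 0.2, 0.9] },
  { text: 'Refund requests need an order number.', vector: [0.8, 0.3, 0.1] }
];

const hits = retrieveTopK([1, 0, 0], toyChunks, 2);
console.log(buildPrompt('How do refunds work?', hits));
```

The two refund chunks rank first because their vectors point in nearly the same direction as the query vector; the unrelated chunk scores zero and is dropped by the top-k cut.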

Key RAG Advantages Over Fine-Tuning:

Aspect               | Fine-Tuning         | RAG
Knowledge freshness  | Requires retraining | Always current
Source attribution   | Difficult           | Built-in
Hallucination rate   | Higher              | 73% lower
Implementation time  | Weeks to months     | Days to weeks
Cost per update      | High (retraining)   | Low (re-indexing)
Domain switching     | Requires new model  | Dynamic at query time

The GPT-5.5 Advantage for RAG

GPT-5.5 brings specific improvements that make it ideal for RAG workflows:

1. Enhanced Context Following

// GPT-5.5 better understands when to use retrieved context vs. general knowledge
const systemPrompt = `
You are a helpful assistant with access to the following company knowledge:

<retrieved_context>
{{ $json.retrievedDocuments }}
</retrieved_context>

CRITICAL INSTRUCTIONS:
- If the retrieved context contains the answer, use ONLY that information
- If the context is insufficient, clearly state what additional information is needed
- Never make up information not present in the context
- Cite specific document sources when providing answers
`;

// GPT-5.5 shows 89% accuracy in following these instructions vs. 67% with GPT-4

2. Reduced Token Usage

// Illustrative numbers: a 40% token reduction takes a 15,000-token RAG query down to 9,000 tokens.
// At the prices below: $0.12 per query vs. $0.20 previously (40% savings)
// Monthly savings for 10,000 queries: (0.20 - 0.12) * 10,000 = $800

const costComparison = {
  model: 'gpt-5.5',
  inputCostPerMillion: 5.00,   // $5 per million input tokens
  outputCostPerMillion: 30.00, // $30 per million output tokens
  averageInputTokens: 6000,
  averageOutputTokens: 3000,
  costPerQuery: (6000 * 5 + 3000 * 30) / 1000000, // $0.12
  gpt4CostPerQuery: 0.20, // Previous cost
  savings: '40%'
};

3. Improved Structured Output

// GPT-5.5 excels at returning structured data from RAG queries
const structuredOutputExample = {
  answer: "The company was founded in 2018 by Jane Smith and John Doe.",
  sources: [
    {
      document_id: "company-history-2024.pdf",
      page: 3,
      confidence: 0.94,
      excerpt: "Founded in 2018, Tropical Media began as a small automation consultancy..."
    },
    {
      document_id: "founder-bios.docx",
      page: 1,
      confidence: 0.91,
      excerpt: "Jane Smith (CEO) and John Doe (CTO) established the company with a vision..."
    }
  ],
  confidence_score: 0.92,
  additional_info_needed: null
};
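Structured output is only useful if the workflow checks it before passing it on. The sketch below validates a parsed response against the shape shown above; `validateRagAnswer` is a hypothetical helper, and the rules it enforces (non-empty answer, scores between 0 and 1, at least one source) are assumptions about what a downstream node would need.

```javascript
// Validate a parsed RAG answer against the shape shown above.
// Returns a list of problems; an empty list means the object looks usable.
function validateRagAnswer(obj) {
  if (typeof obj !== 'object' || obj === null) return ['response is not an object'];

  const problems = [];
  const isScore = v => typeof v === 'number' && v >= 0 && v <= 1;

  if (typeof obj.answer !== 'string' || obj.answer.trim() === '') {
    problems.push('answer must be a non-empty string');
  }
  if (!isScore(obj.confidence_score)) {
    problems.push('confidence_score must be a number between 0 and 1');
  }
  if (!Array.isArray(obj.sources) || obj.sources.length === 0) {
    problems.push('sources must be a non-empty array');
  } else {
    obj.sources.forEach((s, i) => {
      if (typeof s !== 'object' || s === null) {
        problems.push(`sources[${i}] must be an object`);
        return;
      }
      if (typeof s.document_id !== 'string') {
        problems.push(`sources[${i}].document_id must be a string`);
      }
      if (!isScore(s.confidence)) {
        problems.push(`sources[${i}].confidence must be a number between 0 and 1`);
      }
    });
  }
  return problems;
}

// Usage: parse the model output, then branch on the result.
const parsed = JSON.parse('{"answer":"Founded in 2018.","sources":[{"document_id":"a.pdf","confidence":0.9}],"confidence_score":0.92}');
console.log(validateRagAnswer(parsed)); // Empty array when the shape is valid
```

In an n8n Code node the same check could route invalid responses to a retry or fallback branch instead of returning them to the caller.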

Setting Up Your Vector Database in n8n

Option 1: Qdrant

Qdrant is a popular choice for production RAG systems thanks to its hybrid search capabilities, built-in re-ranking, and solid n8n integration.

Step 1: Deploy Qdrant

# docker-compose.yml for Qdrant
version: '3.8'
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - qdrant_storage:/qdrant/storage
    environment:
      QDRANT__LOG_LEVEL: INFO
      QDRANT__SERVICE__MAX_REQUEST_SIZE_MB: 32
    # Enable GPU acceleration for large-scale deployments
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]

volumes:
  qdrant_storage:

Step 2: n8n Qdrant Integration

// Create collection with optimized settings for RAG
const createCollection = {
  url: 'http://localhost:6333',
  collectionName: 'company-knowledge-base',
  vectorSize: 3072,  // Must match the embedding model: 3072 for text-embedding-3-large, 1536 for text-embedding-3-small
  distance: 'Cosine', // Best for semantic similarity
  optimizersConfig: {
    defaultSegmentNumber: 2,
    maxSegmentSize: 100000,
    memmapThreshold: 20000
  },
  hnswConfig: {
    m: 16,              // Higher = better recall, more memory
    efConstruct: 100,   // Higher = better build quality
    fullScanThreshold: 10000
  },
  quantizationConfig: {
    scalar: {
      type: 'int8',
      quantile: 0.99,
      alwaysRam: true
    }
  }
};

// Response: Collection created. int8 scalar quantization stores each vector component in 1 byte
// instead of 4, so quantized vectors take up to 75% less memory.
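A rough way to size the effect of int8 scalar quantization is to compare raw vector storage at 4 bytes per component (float32) with 1 byte per component (int8). The helper below is a back-of-the-envelope sketch: `estimateVectorMemory` is a hypothetical name, and the figures ignore the HNSW graph, payloads, and any original vectors kept on disk.

```javascript
// Estimate raw vector storage for float32 vs. int8-quantized vectors.
function estimateVectorMemory(vectorCount, dimensions) {
  const float32Bytes = vectorCount * dimensions * 4; // 4 bytes per float32 component
  const int8Bytes = vectorCount * dimensions * 1;    // 1 byte per int8 component
  const MiB = 1024 * 1024;
  return {
    float32MiB: float32Bytes / MiB,
    int8MiB: int8Bytes / MiB,
    reduction: 1 - int8Bytes / float32Bytes
  };
}

// One million 3072-dimensional vectors (the size produced by text-embedding-3-large).
const estimate = estimateVectorMemory(1_000_000, 3072);
console.log(estimate);
```

For this example the float32 vectors need about 11.4 GiB and the int8 copies about 2.9 GiB, which is the 4x ratio you would expect from the byte widths alone.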

Step 3: Document Ingestion Pipeline

// Complete n8n workflow for document ingestion
// Trigger: Manual or Scheduled (daily sync)

// Node 1: Fetch Documents (HTTP Request or File Read)
const fetchDocuments = {
  url: 'https://company-docs.s3.amazonaws.com/knowledge-base/',
  options: {
    headers: {
      'Authorization': 'Bearer {{ $env.S3_ACCESS_TOKEN }}'
    }
  }
};

// Node 2: Parse Documents (various formats)
const parseDocuments = {
  // PDF
  pdf: {
    operation: 'Extract',
    options: {
      metadata: true,
      pageNumbers: true
    }
  },
  // Word documents
  docx: {
    operation: 'Extract Text',
    includeHeaders: true,
    includeFooters: true
  },
  // Web pages
  html: {
    operation: 'Extract Content',
    selector: 'article, .content, main', // Focus on content
    removeSelectors: 'nav, footer, .ads, .sidebar'
  }
};

// Node 3: Intelligent Chunking
const chunkingStrategy = {
  method: 'Recursive Character',
  chunkSize: 512,        // Characters per chunk; a starting point to tune against your own retrieval tests
  chunkOverlap: 128,     // 25% overlap maintains context
  separators: [
    '\n\n',              // Paragraphs
    '\n',                 // Lines
    '. ',                 // Sentences
    ' '                   // Words (fallback)
  ],
  // Preserve semantic boundaries
  preserveContext: true,
  // Add metadata to each chunk
  metadata: {
    source: '{{ $json.sourceUrl }}',
    title: '{{ $json.title }}',
    category: '{{ $json.category }}',
    created_at: '{{ $json.date }}',
    author: '{{ $json.author }}',
    file_type: '{{ $json.fileType }}'
  }
};

// Node 4: Generate Embeddings (OpenAI embedding model)
const embeddingConfig = {
  model: 'text-embedding-3-large',  // 3072 dimensions, better quality
  // or 'text-embedding-3-small' (1536 dimensions) for cost optimization

  // Embeddings come from a dedicated embedding model, not from the chat model.
  // The identifier below is unverified; confirm it against your provider's model list before use.
  // customModel: 'gpt-5.5-embedding',
  input: '{{ $json.chunkText }}',
  
  // Batch processing for efficiency
  batchSize: 100,
  
  // Retry logic for rate limits
  retry: {
    maxRetries: 3,
    backoffMultiplier: 2,
    initialDelay: 1000
  }
};

// Node 5: Upsert to Qdrant
const upsertConfig = {
  collection: 'company-knowledge-base',
  points: {
    id: '{{ $json.uuid }}',  // Generate UUID per chunk
    vector: '{{ $json.embedding }}',
    payload: {
      text: '{{ $json.chunkText }}',
      metadata: '{{ $json.metadata }}',
      // Add timestamp for versioning
      indexed_at: '{{ new Date().toISOString() }}'
    }
  },
  // Batch upsert for performance
  batchSize: 50
};
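The `batchSize` and `retry` settings in the configuration above describe two small mechanisms: splitting items into fixed-size batches, and retrying a failed call with exponentially growing delays. A minimal sketch of both follows; `toBatches` and `withRetry` are hypothetical helper names, and the option names simply mirror the config fields.

```javascript
// Split an array into consecutive batches of at most `size` items.
function toBatches(items, size) {
  if (!Number.isInteger(size) || size <= 0) throw new RangeError('size must be a positive integer');
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Call `fn`; on failure wait, multiply the delay, and try again up to `maxRetries` times.
async function withRetry(fn, { maxRetries = 3, backoffMultiplier = 2, initialDelay = 1000 } = {}) {
  let delay = initialDelay;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // Out of retries: surface the last error
      await new Promise(resolve => setTimeout(resolve, delay));
      delay *= backoffMultiplier;
    }
  }
}

// Usage sketch: embed chunks 100 at a time, retrying each batch on rate-limit errors.
// `embedBatch` is a placeholder for whatever function calls your embedding API.
async function embedAll(chunks, embedBatch) {
  const vectors = [];
  for (const batch of toBatches(chunks, 100)) {
    vectors.push(...await withRetry(() => embedBatch(batch)));
  }
  return vectors;
}
```

With the defaults, a call that keeps failing waits 1 s, 2 s, and 4 s before giving up after the fourth attempt.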

Option 2: Pinecone (Cloud-Native)

Pinecone offers serverless vector search with excellent scaling characteristics.

// Pinecone setup for n8n RAG workflows

// Node: Pinecone Vector Store
const pineconeConfig = {
  apiKey: '{{ $env.PINECONE_API_KEY }}',
  environment: 'us-east-1',
  
  // Serverless index configuration
  serverless: {
    cloud: 'aws',
    region: 'us-east-1'
  },
  
  // Index specifications
  indexName: 'rag-knowledge-base',
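  dimension: 1536,  // Must match the embedding model (1536 for text-embedding-3-small, 3072 for text-embedding-3-large)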
  metric: 'cosine',
  
  // Pod-based for high-throughput (optional)
  // pods: 2,
  // podType: 'p1.x1',
  
  // Metadata configuration
  metadataConfig: {
    indexed: [
      'category',
      'source',
      'created_at',
      'author'
    ]
  }
};

// Query with metadata filtering
const queryConfig = {
  index: 'rag-knowledge-base',
  vector: '{{ $json.queryEmbedding }}',
  topK: 10,
  includeMetadata: true,
  includeValues: false,  // Reduce payload size
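  filter: {
    category: { $eq: '{{ $json.category }}' },
    // Range operators such as $gte compare numbers, so store created_at as a Unix timestamp
    created_at: { $gte: '{{ $json.sinceDate }}' }
  },
  // Hybrid search: Pinecone's query call takes a sparse vector alongside the dense one.
  // The two fields below are placeholders for that step, not Pinecone query parameters.
  query: '{{ $json.queryText }}',  // Text to encode into a sparse vector
  searchType: 'hybrid'  // Combines semantic + keyword search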
};
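When the vector store does not fuse semantic and keyword results for you, one common client-side approach is Reciprocal Rank Fusion (RRF): each result list contributes 1 / (k + rank) to a document's score, and the totals decide the final order. The sketch below is a generic implementation, not part of the Pinecone API; `reciprocalRankFusion`, the document IDs, and the conventional constant k = 60 are assumptions for illustration.

```javascript
// Reciprocal Rank Fusion: merge several ranked ID lists into one ranking.
// Each list adds 1 / (k + rank) to the score of every ID it contains (rank starts at 1).
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

// Hypothetical results from a dense (semantic) query and a keyword query.
const denseHits = ['doc-a', 'doc-b', 'doc-c'];
const keywordHits = ['doc-b', 'doc-d', 'doc-a'];

const fused = reciprocalRankFusion([denseHits, keywordHits]);
console.log(fused.map(r => r.id));
```

Documents that appear in both lists (`doc-b`, `doc-a`) rise to the top, while documents found by only one retriever are kept but ranked lower.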

Option 3: Weaviate (Graph + Vector Hybrid)

Weaviate excels when you need graph relationships alongside vector search.

// Weaviate schema for RAG with relationships
const weaviateSchema = {
  class: 'DocumentChunk',
  description: 'Chunk of company knowledge with relationships',
  vectorizer: 'text2vec-openai',
  moduleConfig: {
    'text2vec-openai': {
      model: 'ada',
      modelVersion: '002',
      type: 'text'
    }
  },
  properties: [
    {
      name: 'content',
      dataType: ['text'],
      moduleConfig: {
        'text2vec-openai': { skip: false, vectorizePropertyName: false }
      }
    },
    {
      name: 'source',
      dataType: ['text'],
      tokenization: 'word'
    },
    {
      name: 'category',
      dataType: ['text'],
      tokenization: 'field'  // Exact match filtering
    },
    {
      name: 'relatedChunks',
      dataType: ['DocumentChunk'],  // Graph relationships
      description: 'Semantically related chunks'
    },
    {
      name: 'parentDocument',
      dataType: ['Document'],  // Link to parent
    }
  ],
  // HNSW vector index settings (these tune vector recall and speed, not hybrid search)
  vectorIndexConfig: {
    ef: 256,
    efConstruction: 128,
    maxConnections: 64,
    dynamicEfFactor: 8
  }
};

// Graph query for contextual retrieval
const graphQuery = `
  {
    Get {
      DocumentChunk(
        nearText: {
          concepts: ["{{ $json.query }}"]
          certainty: 0.7
        }
        limit: 5
      ) {
        content
        source
        category
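        # Traverse relationships (cross-references need an inline fragment per target class)
        relatedChunks {
          ... on DocumentChunk {
            content
            source
          }
        }
        parentDocument {
          ... on Document {
            title
            author
            publishDate
          }
        }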
      }
    }
  }
`;

Advanced Chunking Strategies

The Science of Chunking

Chunking is one of the most important factors in RAG quality. Poor chunking loses context, while well-tuned chunking can improve retrieval accuracy by 40% or more.

┌──────────────────────────────────────────────────────────────────────────────┐
│                        Chunking Strategy Comparison                          │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                               │
│  Document: "The company was founded in 2018. Revenue grew 300% in 2024."       │
│                                                                               │
│  Fixed Size (Bad):                                                             │
│  ┌─────────────────┬─────────────────┬─────────────────┐                       │
│  │The company was f│ounded in 2018. R │evenue grew 300% │                       │
│  └─────────────────┴─────────────────┴─────────────────┘                       │
│  └─ "What year was the company founded?" → Query matches middle chunk only     │
│     But chunk splits "founded in 2018" → Lost information!                   │
│                                                                               │
│  Semantic (Better):                                                            │
│  ┌──────────────────────────────────┬──────────────────────────────────┐          │
│  │The company was founded in 2018.  │Revenue grew 300% in 2024.        │          │
│  └──────────────────────────────────┴──────────────────────────────────┘          │
│                                                                               │
│  Recursive with Overlap (Best):                                               │
│  ┌──────────────────────────┐                                                 │
│  │The company was founded in│ ← Chunk 1                                        │
│  │2018. Revenue grew 300%   │ ← Chunk 2 (overlaps by 25%)                     │
│  │in 2024.                  │ ← Chunk 3                                        │
│  └──────────────────────────┘                                                 │
│                                                                               │
└──────────────────────────────────────────────────────────────────────────────┘

Implementation in n8n:

// Advanced chunking with LangChain-style logic
const advancedChunking = {
  // Strategy 1: Markdown-aware chunking
  markdown: {
    splitOn: ['# ', '## ', '### ', '\n\n', '\n'],
    preserveCodeBlocks: true,
    preserveTables: true,
    chunkSize: 1000,
    chunkOverlap: 200
  },
  
  // Strategy 2: Semantic chunking using embeddings
  semantic: {
    // Create embeddings for each sentence
    sentenceEmbeddings: true,
    // Group sentences with similar embeddings
    similarityThreshold: 0.85,
    minChunkSize: 100,
    maxChunkSize: 500,
    // Combine sentences until similarity drops
    bufferSize: 3  // Look ahead/behind sentences
  },
  
  // Strategy 3: Agentic chunking (GPT-5.5)
  agentic: {
    // Use GPT-5.5 to determine optimal split points
    model: 'gpt-5.5',
    prompt: `
      Analyze this document and identify the best places to split it into chunks.
      Each chunk should:
      - Be 300-500 tokens
      - Contain a complete thought or topic
      - Not split mid-sentence or mid-paragraph
      - Maintain context within each chunk
      
      Return the split positions as line numbers.
      
      Document:
      {{ $json.documentText }}
    `,
    // Parse response to get chunk boundaries
    parseResponse: (response) => {
      const lines = response.split('\n');
      return lines.filter(l => l.match(/^\d+$/)).map(Number);
    }
  },
  
  // Strategy 4: Parent-child chunking (for context preservation)
  parentChild: {
    // Large parent chunks (passed to the LLM as generation context)
    parentChunkSize: 2000,
    parentChunkOverlap: 400,
    
    // Small child chunks (for precise matching)
    childChunkSize: 200,
    childChunkOverlap: 50,
    
    // Store both and link them
    indexingStrategy: 'dual',
    retrieval: 'child',  // Search on children
    generation: 'parent'  // Generate from parents
  },
  
  // Strategy 5: Sliding window for code/documentation
  slidingWindow: {
    windowSize: 10,      // Lines per window
    stride: 3,          // Lines to advance per step (overlap = windowSize - stride = 7 lines)
    contextLines: 2,    // Lines before/after
    // Result: Lines 1-10, 4-13, 7-16, etc.
  }
};
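The sliding-window strategy is the easiest of the five to make concrete. The sketch below produces windows of `windowSize` lines, advancing by `stride` lines each step; `slidingWindows` is a hypothetical helper, and it leaves out the `contextLines` option for brevity.

```javascript
// Produce overlapping line windows: each window holds up to `windowSize` lines,
// and each new window starts `stride` lines after the previous one.
function slidingWindows(lines, windowSize, stride) {
  if (windowSize <= 0 || stride <= 0) {
    throw new RangeError('windowSize and stride must be positive');
  }
  const result = [];
  for (let start = 0; start < lines.length; start += stride) {
    const end = Math.min(start + windowSize, lines.length);
    result.push({
      startLine: start + 1, // 1-based, inclusive
      endLine: end,         // 1-based, inclusive
      text: lines.slice(start, end).join('\n')
    });
    if (end === lines.length) break; // This window already reaches the last line
  }
  return result;
}

// A 16-line file with windowSize 10 and stride 3.
const sampleLines = Array.from({ length: 16 }, (_, i) => `line ${i + 1}`);
const lineWindows = slidingWindows(sampleLines, 10, 3);
console.log(lineWindows.map(w => `${w.startLine}-${w.endLine}`));
```

For the 16-line sample this yields the ranges 1-10, 4-13, and 7-16, so consecutive windows share seven lines.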

Hybrid Chunking Implementation:

// n8n Function node for hybrid chunking
const hybridChunking = ($input) => {
  const documents = $input.first().json.documents;
  const chunks = [];
  
  for (const doc of documents) {
    const content = doc.content;
    
    // Detect document type
    const isMarkdown = doc.fileType === 'md' || doc.fileType === 'markdown';
    const isCode = /\.(js|ts|py|java|cpp|go|rs)$/.test(doc.fileName);
    const isStructured = doc.fileType === 'json' || doc.fileType === 'csv';
    
    let docChunks;
    
    if (isMarkdown) {
      // Use header-aware splitting
      docChunks = splitMarkdown(content, {
        chunkSize: 1000,
        overlap: 200
      });
    } else if (isCode) {
      // Use AST-aware splitting (preserve functions/classes)
      docChunks = splitCode(content, doc.fileType, {
        preserveStructure: true
      });
    } else if (isStructured) {
      // Row-based chunking for structured data
      docChunks = splitStructured(content, {
        rowsPerChunk: 100,
        includeHeader: true
      });
    } else {
      // Default: recursive character
      docChunks = splitRecursive(content, {
        chunkSize: 512,
        overlap: 128,
        separators: ['\n\n', '\n', '. ', ' ']
      });
    }
    
    // Add metadata to each chunk
    docChunks.forEach((chunk, index) => {
      chunks.push({
        text: chunk.text,
        metadata: {
          ...doc.metadata,
          chunkIndex: index,
          totalChunks: docChunks.length,
          chunkStrategy: isMarkdown ? 'markdown' : isCode ? 'code' : isStructured ? 'structured' : 'recursive',
          charCount: chunk.text.length,
          tokenEstimate: Math.ceil(chunk.text.length / 4)  // Rough estimate: about 4 characters per token
        }
      });
    });
  }
  
  return [{ json: { chunks } }];
};

// Markdown-aware splitter
function splitMarkdown(text, options) {
  const { chunkSize, overlap } = options;
  const chunks = [];
  let currentChunk = '';

  // Push the current chunk unless it is empty or whitespace-only
  const flush = () => {
    const trimmed = currentChunk.trim();
    if (trimmed) chunks.push({ text: trimmed });
  };

  for (const line of text.split('\n')) {
    const isHeader = /^#{1,6}\s/.test(line);

    // Start new chunk on headers if current chunk is substantial
    if (isHeader && currentChunk.length > chunkSize * 0.5) {
      flush();
      currentChunk = line;
    } else if (currentChunk.length + line.length + 1 > chunkSize) {
      flush();
      // Keep overlap: carry the tail of the previous chunk forward.
      // slice(-0) would return the whole string, so guard the zero case.
      const overlapText = overlap > 0 ? currentChunk.slice(-overlap) : '';
      currentChunk = overlapText + '\n' + line;
    } else {
      currentChunk += (currentChunk ? '\n' : '') + line;
    }
  }

  flush();
  return chunks;
}
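The hybrid chunker above calls `splitRecursive` for the default case without defining it. The following is one possible sketch of recursive character splitting, written under stated assumptions: it drops the separator it splits on and re-joins pieces with a single space, it trims the overlap so no chunk exceeds `chunkSize`, and `splitToPieces` is a hypothetical helper name.

```javascript
// Break text into pieces no longer than chunkSize, trying coarse separators first
// and falling back to finer ones, then to a hard character cut.
function splitToPieces(text, chunkSize, separators) {
  if (text.length <= chunkSize) return [text];
  if (separators.length === 0) {
    const pieces = [];
    for (let i = 0; i < text.length; i += chunkSize) {
      pieces.push(text.slice(i, i + chunkSize));
    }
    return pieces;
  }
  const [sep, ...rest] = separators;
  return text
    .split(sep)
    .filter(part => part.length > 0)
    .flatMap(part => splitToPieces(part, chunkSize, rest));
}

// Greedily merge pieces into chunks of at most chunkSize characters. Each new chunk
// is seeded with up to `overlap` trailing characters of the previous chunk.
function splitRecursive(text, { chunkSize = 512, overlap = 128, separators = ['\n\n', '\n', '. ', ' '] } = {}) {
  const chunks = [];
  let current = '';
  for (const piece of splitToPieces(text, chunkSize, separators)) {
    const candidate = current ? current + ' ' + piece : piece;
    if (candidate.length <= chunkSize) {
      current = candidate;
      continue;
    }
    chunks.push({ text: current });
    // Only carry as much overlap as still fits next to the incoming piece.
    const room = chunkSize - piece.length - 1;
    const tail = overlap > 0 && room > 0 ? current.slice(-Math.min(overlap, room)) : '';
    current = tail ? tail + ' ' + piece : piece;
  }
  if (current) chunks.push({ text: current });
  return chunks;
}

const demoChunks = splitRecursive(
  'Alpha beta gamma. Delta epsilon zeta. Eta theta iota.',
  { chunkSize: 20, overlap: 5 }
);
console.log(demoChunks.map(c => c.text));
```

A production splitter would usually keep the separators and align overlap to word or sentence boundaries; this version trades that fidelity for brevity.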

Building the Complete RAG Pipeline

Phase 1: Document Ingestion Workflow

// Complete n8n workflow: Document → Vector Database
// Filename: 44-rag-ingestion-workflow.json

{
  "name": "RAG Document Ingestion Pipeline",
  "nodes": [
    {
      "parameters": {
        "rule": {
          "interval": [
            {
              "field": "hours",
              "hoursInterval": 24
            }
          ]
        }
      },
      "name": "Daily Sync Trigger",
      "type": "n8n-nodes-base.scheduleTrigger",
      "typeVersion": 1.1,
      "position": [250, 300]
    },
    {
      "parameters": {
        "url": "https://api.company.com/documents",
        "sendQuery": true,
        "queryParameters": {
          "parameters": [
            {
              "name": "modified_since",
              "value": "={{ $now.minus({ days: 1 }).toISO() }}"
            }
          ]
        }
      },
      "name": "Fetch New Documents",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.1,
      "position": [450, 300]
    },
    {
      "parameters": {
        "jsCode": "// Process documents and extract text\nconst documents = items[0].json.data || [];\nconst processed = [];\n\nfor (const doc of documents) {\n  // Determine extraction method based on file type\n  const extraction = {\n    id: doc.id,\n    title: doc.title,\n    url: doc.url,\n    type: doc.file_type,\n    modified: doc.modified_at,\n    category: doc.category\n  };\n  \n  processed.push({ json: extraction });\n}\n\nreturn processed;"
      },
      "name": "Prepare Documents",
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [650, 300]
    },
    {
      "parameters": {
        "batchSize": 1,
        "options": {}
      },
      "name": "Split Batch",
      "type": "n8n-nodes-base.splitInBatches",
      "typeVersion": 3,
      "position": [850, 300]
    },
    {
      "parameters": {
        "url": "={{ $json.url }}"
      },
      "name": "Download Document",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.1,
      "position": [1050, 300]
    },
    {
      "parameters": {
        "operation": "pdf",
        "binaryPropertyName": "data",
        "options": {}
      },
      "name": "Extract PDF Text",
      "type": "n8n-nodes-base.extractFromFile",
      "typeVersion": 1,
      "position": [1250, 200]
    },
    {
      "parameters": {
        "extraction": "text",
        "options": {}
      },
      "name": "Extract DOCX Text",
      "type": "n8n-nodes-base.extractFromFile",
      "typeVersion": 1,
      "position": [1250, 400]
    },
    {
      "parameters": {
        "jsCode": "// Advanced chunking logic\nconst content = $input.first().json.text;\nconst metadata = $input.first().json.metadata;\n\n// Recursive character chunking\nconst chunkSize = 512;\nconst overlap = 128;\nconst separators = ['\\n\\n', '\\n', '. ', ' '];\n\nconst chunks = [];\nlet currentChunk = '';\nlet currentSize = 0;\n\nconst sentences = content.split(/(?<=[.!?])\\s+/);\n\nfor (const sentence of sentences) {\n  const sentenceSize = sentence.length;\n  \n  if (currentSize + sentenceSize > chunkSize && currentChunk) {\n    chunks.push({\n      text: currentChunk.trim(),\n      metadata: {\n        ...metadata,\n        chunk_index: chunks.length,\n        char_count: currentSize\n      }\n    });\n    \n    // Apply overlap\n    const overlapStart = Math.max(0, currentChunk.length - overlap);\n    currentChunk = currentChunk.slice(overlapStart) + ' ' + sentence;\n    currentSize = currentChunk.length;\n  } else {\n    currentChunk += (currentChunk ? ' ' : '') + sentence;\n    currentSize += sentenceSize;\n  }\n}\n\nif (currentChunk) {\n  chunks.push({\n    text: currentChunk.trim(),\n    metadata: {\n      ...metadata,\n      chunk_index: chunks.length,\n      char_count: currentSize\n    }\n  });\n}\n\nreturn chunks.map(c => ({ json: c }));"
      },
      "name": "Chunk Documents",
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [1450, 300]
    },
    {
      "parameters": {
        "options": {},
        "prompt": "={{ $json.text }}",
        "model": "text-embedding-3-large"
      },
      "name": "Generate Embeddings",
      "type": "@n8n/n8n-nodes-langchain.embeddingsOpenAi",
      "typeVersion": 1,
      "position": [1650, 300]
    },
    {
      "parameters": {
        "mode": "insert",
        "options": {
          "qdrantCollection": "company-knowledge-base"
        },
        "points": {
          "id": "={{ $json.metadata.chunk_index + '-' + $json.metadata.doc_id }}",
          "vector": "={{ $json.embedding }}",
          "payload": {
            "text": "={{ $json.text }}",
            "metadata": "={{ $json.metadata }}"
          }
        }
      },
      "name": "Store in Qdrant",
      "type": "@n8n/n8n-nodes-langchain.vectorStoreQdrant",
      "typeVersion": 1,
      "position": [1850, 300]
    },
    {
      "parameters": {
        "jsCode": "// Track successful ingestion\nconst result = {\n  document_id: $input.first().json.metadata.doc_id,\n  chunks_indexed: $input.first().json.metadata.chunk_index + 1,\n  indexed_at: new Date().toISOString()\n};\n\n// Send to monitoring/alerting\nreturn [{ json: result }];"
      },
      "name": "Track Indexing",
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [2050, 300]
    }
  ],
  "connections": {
    "Daily Sync Trigger": {
      "main": [[{"node": "Fetch New Documents", "type": "main", "index": 0}]]
    },
    "Fetch New Documents": {
      "main": [[{"node": "Prepare Documents", "type": "main", "index": 0}]]
    },
    "Prepare Documents": {
      "main": [[{"node": "Split Batch", "type": "main", "index": 0}]]
    },
    "Split Batch": {
      "main": [[{"node": "Download Document", "type": "main", "index": 0}]]
    },
    "Download Document": {
      "main": [
        [{"node": "Extract PDF Text", "type": "main", "index": 0}],
        [{"node": "Extract DOCX Text", "type": "main", "index": 0}]
      ]
    },
    "Extract PDF Text": {
      "main": [[{"node": "Chunk Documents", "type": "main", "index": 0}]]
    },
    "Extract DOCX Text": {
      "main": [[{"node": "Chunk Documents", "type": "main", "index": 0}]]
    },
    "Chunk Documents": {
      "main": [[{"node": "Generate Embeddings", "type": "main", "index": 0}]]
    },
    "Generate Embeddings": {
      "main": [[{"node": "Store in Qdrant", "type": "main", "index": 0}]]
    },
    "Store in Qdrant": {
      "main": [[{"node": "Track Indexing", "type": "main", "index": 0}]]
    }
  }
}
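Qdrant point IDs must be unsigned integers or UUIDs, so an ID built by concatenating a chunk index and a document ID would need converting before an upsert. One option is to hash the pair into a stable, UUID-shaped string so that re-ingesting a document overwrites its old points instead of duplicating them. The sketch below is an assumption-laden illustration: `chunkPointId` is a hypothetical helper, the result is a custom hash-based identifier (version nibble 8) rather than a standard namespace UUID, and using `require('crypto')` inside a self-hosted n8n Code node typically requires allowing the built-in module via `NODE_FUNCTION_ALLOW_BUILTIN`.

```javascript
const { createHash } = require('crypto');

// Derive a stable, UUID-shaped point ID from a document ID and chunk index.
// The same inputs always give the same ID, which makes upserts idempotent.
function chunkPointId(docId, chunkIndex) {
  const hex = createHash('sha256').update(`${docId}:${chunkIndex}`).digest('hex');
  // Use the first 128 bits of the hash, then stamp the version nibble (8 = custom)
  // and the RFC variant bits so the string is a well-formed UUID.
  const variant = ((parseInt(hex[16], 16) & 0x3) | 0x8).toString(16);
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    '8' + hex.slice(13, 16),
    variant + hex.slice(17, 20),
    hex.slice(20, 32)
  ].join('-');
}

console.log(chunkPointId('company-history-2024.pdf', 0));
console.log(chunkPointId('company-history-2024.pdf', 1));
```

Because the ID depends only on the document ID and chunk index, changing the chunking parameters can leave stale points behind; deleting a document's existing points before re-indexing avoids that.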

Phase 2: Query and Retrieval Workflow

// Complete n8n workflow: User Query → RAG Response
// Filename: 44-rag-query-workflow.json

{
  "name": "RAG Query and Response Pipeline",
  "nodes": [
    {
      "parameters": {
        "httpMethod": "POST",
        "path": "rag-query",
        "responseMode": "responseNode"
      },
      "name": "RAG API Endpoint",
      "type": "n8n-nodes-base.webhook",
      "typeVersion": 1,
      "position": [250, 300]
    },
    {
      "parameters": {
        "options": {},
        "prompt": "={{ $json.body.query }}",
        "model": "text-embedding-3-large"
      },
      "name": "Embed Query",
      "type": "@n8n/n8n-nodes-langchain.embeddingsOpenAi",
      "typeVersion": 1,
      "position": [450, 300]
    },
    {
      "parameters": {
        "mode": "retrieve",
        "options": {
          "qdrantCollection": "company-knowledge-base",
          "topK": 10
        },
        "filter": {
          "category": "={{ $('RAG API Endpoint').first().json.body.category || undefined }}"
        }
      },
      "name": "Retrieve from Qdrant",
      "type": "@n8n/n8n-nodes-langchain.vectorStoreQdrant",
      "typeVersion": 1,
      "position": [650, 300]
    },
    {
      "parameters": {
        "jsCode": "// Re-rank retrieved documents\nconst docs = $input.all().map(item => item.json);\nconst query = $('RAG API Endpoint').first().json.body.query;\n\n// Simple term-frequency boost (a rough stand-in for BM25) for re-ranking\nconst queryTerms = query.toLowerCase().split(/\\s+/).filter(Boolean);\nconst scored = docs.map(doc => {\n  const text = doc.metadata.text.toLowerCase();\n  let score = doc.score; // Original vector similarity\n  \n  // Boost for exact term matches (plain substring count, so query text is never parsed as a regex)\n  for (const term of queryTerms) {\n    const matches = text.split(term).length - 1;\n    score += matches * 0.05;\n  }\n  \n  // Boost for recency (skipped when the document has no valid date)\n  const docDate = new Date(doc.metadata.metadata.modified || doc.metadata.metadata.created);\n  const daysOld = (Date.now() - docDate) / (1000 * 60 * 60 * 24);\n  if (Number.isFinite(daysOld)) {\n    score += Math.max(0, 0.1 - daysOld * 0.001);\n  }\n  \n  return { ...doc, rerankedScore: score };\n});\n\n// Sort by re-ranked score and take top 5\nconst topDocs = scored\n  .sort((a, b) => b.rerankedScore - a.rerankedScore)\n  .slice(0, 5);\n\nreturn [{ json: { documents: topDocs, query } }];"
      },
      "name": "Re-rank Results",
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [850, 300]
    },
    {
      "parameters": {
        "jsCode": "// Build context for LLM\nconst docs = $input.first().json.documents;\nconst query = $input.first().json.query;\n\n// Format documents with source attribution\nconst context = docs.map((doc, i) => `[${i + 1}] ${doc.metadata.metadata.title}\nSource: ${doc.metadata.metadata.source}\nContent: ${doc.metadata.text}\n---`).join('\\n\\n');\n\nconst sources = docs.map(doc => ({\n  title: doc.metadata.metadata.title,\n  source: doc.metadata.metadata.source,\n  score: Math.round(doc.rerankedScore * 100) / 100\n}));\n\nreturn [{\n  json: {\n    query,\n    context,\n    sources,\n    documentCount: docs.length\n  }\n}];"
      },
      "name": "Build Context",
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [1050, 300]
    },
    {
      "parameters": {
        "model": "gpt-5.5",
        "options": {
          "temperature": 0.3,
          "maxTokens": 1500
        },
        "messages": {
          "message": [
            {
              "role": "system",
              "content": "You are a helpful assistant that answers questions based on the provided context.\n\nINSTRUCTIONS:\n1. Answer using ONLY the information in the provided context\n2. If the context doesn't contain the answer, say so clearly\n3. Always cite your sources using [1], [2], etc.\n4. Be concise but complete\n5. If you need to make assumptions, state them explicitly"
            },
            {
              "role": "user",
              "content": "=Context:\n{{ $json.context }}\n\nQuestion: {{ $json.query }}\n\nProvide a comprehensive answer with source citations."
            }
          ]
        }
      },
      "name": "Generate Response",
      "type": "@n8n/n8n-nodes-langchain.openAi",
      "typeVersion": 1.8,
      "position": [1250, 300]
    },
    {
      "parameters": {
        "jsCode": "// Final response assembly\nconst llmResponse = $input.first().json.content;\n// Read sources and counts from the Build Context node's output\nconst context = $('Build Context').first().json;\n\nreturn [{\n  json: {\n    answer: llmResponse,\n    sources: context.sources,\n    tokens_used: $input.first().json.usage?.total_tokens || null,\n    // Latency needs a start timestamp captured by an earlier node; none is recorded in this workflow\n    query_time_ms: null,\n    retrieved_documents: context.documentCount\n  }\n}];"
      },
      "name": "Format Response",
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [1450, 300]
    },
    {
      "parameters": {
        "respondWith": "json",
        "responseBody": "={{ JSON.stringify($json) }}"
      },
      "name": "Return Response",
      "type": "n8n-nodes-base.respondToWebhook",
      "typeVersion": 1.1,
      "position": [1650, 300]
    }
  ],
  "connections": {
    "RAG API Endpoint": {
      "main": [[{"node": "Embed Query", "type": "main", "index": 0}]]
    },
    "Embed Query": {
      "main": [[{"node": "Retrieve from Qdrant", "type": "main", "index": 0}]]
    },
    "Retrieve from Qdrant": {
      "main": [[{"node": "Re-rank Results", "type": "main", "index": 0}]]
    },
    "Re-rank Results": {
      "main": [[{"node": "Build Context", "type": "main", "index": 0}]]
    },
    "Build Context": {
      "main": [[{"node": "Generate Response", "type": "main", "index": 0}]]
    },
    "Generate Response": {
      "main": [[{"node": "Format Response", "type": "main", "index": 0}]]
    },
    "Format Response": {
      "main": [[{"node": "Return Response", "type": "main", "index": 0}]]
    }
  }
}

Advanced RAG Patterns

Pattern 1: Multi-Query Retrieval

When a single query might miss important context, generate multiple variations:

// Multi-query expansion for better retrieval
const multiQueryConfig = {
  originalQuery: "What are our refund policies?",
  
  // Generate 3-5 variations using GPT-5.5
  expansionPrompt: `
    Generate 3 to 5 different ways someone might ask about the following topic.
    Each variation should approach the question from a different angle.
    
    Original: {{ $json.query }}
    
    Return as JSON array of strings.
  `,
  
  // Example result: the original query plus four generated variations
  generatedQueries: [
    "What are our refund policies?",
    "How do I get my money back?",
    "What is the return policy?",
    "Can I request a refund?",
    "What are the conditions for refunds?"
  ],
  
  // Retrieve for each query
  retrievalStrategy: 'parallel',
  
  // Merge results, remove duplicates, re-rank
  mergeStrategy: {
    deduplication: 'semantic',  // Remove near-duplicate chunks
    ranking: 'reciprocal',      // Reciprocal Rank Fusion
    topK: 10
  }
};

// Reciprocal Rank Fusion scoring
function reciprocalRankFusion(results) {
  const k = 60;  // RRF constant
  const scores = new Map();
  
  // `results` holds one ranked result list per query variation
  for (const queryResults of results) {
    for (const [rank, doc] of queryResults.entries()) {
      const docId = doc.metadata.chunk_id;
      const currentScore = scores.get(docId) || 0;
      // RRF score: 1 / (k + rank) with a 1-based rank; `rank` is 0-based here, so add 1
      scores.set(docId, currentScore + 1 / (k + rank + 1));
    }
  }
  
  // Sort by fused score
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}
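
The parallel retrieval and merge steps described in the config can be sketched end to end as follows. This is a minimal illustration, not the article's production workflow: `retrieve` is a hypothetical function you would back with your own vector-store query, and hits are assumed to carry an `id` field. Summing RRF contributions per `id` also removes exact duplicate chunks.

```javascript
// Hypothetical helper: `retrieve(query)` resolves to a ranked array of
// hits shaped like { id, ... }. Swap in your own vector-store call.
async function multiQueryRetrieve(queries, retrieve, { k = 60, topK = 10 } = {}) {
  // Run one retrieval per query variation in parallel
  const resultLists = await Promise.all(queries.map((q) => retrieve(q)));

  const fused = new Map();
  for (const hits of resultLists) {
    hits.forEach((hit, index) => {
      const entry = fused.get(hit.id) || { ...hit, score: 0 };
      entry.score += 1 / (k + index + 1); // index is 0-based, RRF rank is 1-based
      fused.set(hit.id, entry);
    });
  }

  // Highest fused score first, trimmed to topK
  return [...fused.values()].sort((a, b) => b.score - a.score).slice(0, topK);
}
```

A chunk that appears in several result lists accumulates score from each, so it tends to outrank a chunk that only one query variation surfaced.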

Pattern 2: Hypothetical Document Embeddings (HyDE)

Generate an ideal answer first, then embed that for retrieval:

// HyDE pattern implementation
const hydeConfig = {
  // Step 1: Generate hypothetical answer (without retrieval)
  hypotheticalPrompt: `
    Imagine you have access to complete company documentation.
    Write a detailed, factual answer to this question:
    
    Question: {{ $json.query }}
    
    Write the answer as if you're directly citing from documents.
    Include specific details, dates, and references.
  `,
  
  // Use GPT-5.5 for hypothetical document generation
  generationModel: 'gpt-5.5',
  temperature: 0.7,  // Slightly creative
  maxTokens: 500,
  
  // Step 2: Embed the hypothetical answer
  embeddingModel: 'text-embedding-3-large',
  
  // Step 3: Retrieve using hypothetical embedding
  // This often finds documents that wouldn't match the original query
};

// Outline of the n8n workflow for HyDE (simplified node list, not importable workflow JSON)
const hydeWorkflow = [
  {
    node: "Receive Query",
    type: "webhook"
  },
  {
    node: "Generate Hypothetical Answer",
    type: "openAi",
    config: {
      model: "gpt-5.5",
      prompt: hydeConfig.hypotheticalPrompt,
      temperature: 0.7
    }
  },
  {
    node: "Embed Hypothetical",
    type: "embeddingsOpenAi",
    input: "={{ $json.hypotheticalAnswer }}"
  },
  {
    node: "Retrieve Documents",
    type: "vectorStoreQdrant",
    vector: "={{ $json.embedding }}",
    topK: 10
  },
  {
    node: "Generate Final Answer",
    type: "openAi",
    // Now use ACTUAL retrieved documents
    context: "={{ $json.retrievedDocuments }}"
  }
];
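
Outside n8n, the same three steps can be expressed as one small function. The sketch below assumes three injected dependencies, `generate`, `embed`, and `search`, which are hypothetical stand-ins for your LLM client, embedding client, and vector store. None of them is a real library API.

```javascript
// All three dependencies are hypothetical stand-ins for your own clients:
//   generate(prompt)      -> Promise<string>    (LLM completion)
//   embed(text)           -> Promise<number[]>  (embedding model)
//   search(vector, topK)  -> Promise<object[]>  (vector store query)
async function hydeRetrieve(query, { generate, embed, search }, topK = 10) {
  const prompt =
    'Write a detailed, factual answer to this question as if citing ' +
    `company documentation.\n\nQuestion: ${query}`;

  // Step 1: draft a hypothetical answer without any retrieval
  const hypotheticalAnswer = await generate(prompt);

  // Step 2: embed the draft instead of the raw query
  const vector = await embed(hypotheticalAnswer);

  // Step 3: search with the draft's embedding; only the real documents
  // returned here should feed the final answer generation
  const documents = await search(vector, topK);
  return { hypotheticalAnswer, documents };
}
```

The hypothetical answer is returned only for logging and debugging. It may contain invented details, so it should never be shown to users or passed to the final prompt as if it were retrieved context.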

Pattern 3: Self-Reflective RAG

Let the system verify its own answers and iterate:

// Self-reflective RAG with verification
const reflectiveRag = {
  maxIterations: 3,
  
  verificationPrompt: `
    Verify if this answer is fully supported by the provided context.
    
    Answer: {{ $json.answer }}
    Context: {{ $json.context }}
    
    Check for:
    1. Hallucinations (information not in context)
    2. Unsupported claims
    3. Missing relevant information
    
    Return JSON:
    {
      "isVerified": boolean,
      "confidence": 0-1,
      "issues": ["list of problems"],
      "additionalQueries": ["queries to find missing info"]
    }
  `,
  
  // If verification fails, perform additional retrieval.
  // verifyAnswer, retrieveDocuments, mergeContexts, and generateAnswer are
  // application-specific helpers that are not defined in this snippet.
  iterationLogic: async (state) => {
    const iteration = state.iteration ?? 0;  // The first call may omit the counter
    const verification = await verifyAnswer(state.answer, state.context);
    
    if (verification.isVerified || iteration >= reflectiveRag.maxIterations) {
      return { 
        finalAnswer: state.answer, 
        verified: verification.isVerified,
        iterations: iteration 
      };
    }
    
    // Retrieve additional documents
    const newDocs = await retrieveDocuments(verification.additionalQueries);
    const newContext = mergeContexts(state.context, newDocs);
    
    // Regenerate answer with expanded context
    const newAnswer = await generateAnswer(state.query, newContext);
    
    return reflectiveRag.iterationLogic({
      ...state,
      answer: newAnswer,
      context: newContext,
      iteration: iteration + 1
    });
  }
};
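
The verification prompt asks the model for JSON, but model output is not guaranteed to parse. A defensive parser keeps the loop from crashing on a malformed reply. The sketch below uses the field names from the prompt. The fallback policy (treating unparseable output as unverified) is an assumption you may want to change.

```javascript
// Parse the verifier's reply into a safe, fully populated object.
// Field names follow the verification prompt; the fallback policy is an assumption.
function parseVerification(raw) {
  const fallback = {
    isVerified: false,
    confidence: 0,
    issues: ['unparseable_verifier_output'],
    additionalQueries: [],
  };

  // Models often wrap JSON in prose or code fences, so take the outermost braces
  const text = String(raw ?? '');
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end <= start) return fallback;

  let parsed;
  try {
    parsed = JSON.parse(text.slice(start, end + 1));
  } catch {
    return fallback;
  }

  const confidence = Number(parsed.confidence);
  return {
    isVerified: parsed.isVerified === true,
    // Clamp to the 0-1 range requested in the prompt
    confidence: Number.isFinite(confidence) ? Math.min(1, Math.max(0, confidence)) : 0,
    issues: Array.isArray(parsed.issues) ? parsed.issues : [],
    additionalQueries: Array.isArray(parsed.additionalQueries) ? parsed.additionalQueries : [],
  };
}
```

Requiring `isVerified` to be strictly `true` means a reply such as `"isVerified": "yes"` counts as unverified, which errs on the side of another retrieval pass.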

Pattern 4: Hierarchical RAG

For large knowledge bases, use a two-tier retrieval system:

// Hierarchical retrieval for enterprise knowledge bases
const hierarchicalRag = {
  // Level 1: Summaries index (smaller, faster)
  summaryIndex: {
    collection: 'document-summaries',
    embeddingModel: 'text-embedding-3-small',  // Cheaper
    chunkSize: 'document-level',
    content: 'summaries of entire documents'
  },
  
  // Level 2: Full content index (detailed)
  detailIndex: {
    collection: 'document-chunks',
    embeddingModel: 'text-embedding-3-large',  // Higher quality
    chunkSize: 512,
    content: 'full document chunks'
  },
  
  // Query flow
  queryProcess: [
    // Step 1: Query summary index to find relevant documents
    {
      action: 'retrieve',
      index: 'summaryIndex',
      topK: 5,
      result: 'relevantDocuments'
    },
    
    // Step 2: Filter detail index to those documents
    {
      action: 'filter',
      index: 'detailIndex',
      filter: {
        doc_id: { $in: '{{ $json.relevantDocuments.map(d => d.doc_id) }}' }
      }
    },
    
    // Step 3: Retrieve detailed chunks from filtered set
    {
      action: 'retrieve',
      index: 'detailIndex',
      topK: 10,
      result: 'detailedChunks'
    }
  ],
  
  // Performance: 5x faster on large KBs
  // Cost: 60% reduction in embedding costs
};
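
The query flow can be sketched as a single function. Because the two indexes use different embedding models, the query has to be embedded once per model, so the sketch takes one vector for each index. `searchSummaries` and `searchChunks` are hypothetical wrappers around your vector store, and `searchChunks` is assumed to apply the document IDs as a metadata filter.

```javascript
// Hypothetical search functions, each wrapping one collection:
//   searchSummaries(vector, topK)       -> Promise<[{ doc_id }]>
//   searchChunks(vector, topK, docIds)  -> Promise<[{ doc_id, text }]>
async function hierarchicalRetrieve(vectors, { searchSummaries, searchChunks }, opts = {}) {
  const { summaryTopK = 5, chunkTopK = 10 } = opts;

  // Step 1: find candidate documents via the small summary index
  const summaries = await searchSummaries(vectors.summary, summaryTopK);
  const docIds = [...new Set(summaries.map((s) => s.doc_id))];
  if (docIds.length === 0) return [];

  // Steps 2-3: search the chunk index, restricted to those documents
  return searchChunks(vectors.detail, chunkTopK, docIds);
}
```

Returning early on an empty summary result avoids an unfiltered search over the whole chunk index, which would defeat the point of the first tier.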

Production Optimization

Caching Strategies

// Multi-layer caching for RAG systems
const cachingLayers = {
  // Layer 1: Query embedding cache
  embeddingCache: {
    store: 'redis',
    key: 'embedding:{{ md5($json.query) }}',
    ttl: 86400,  // 24 hours
    hitRate: '35%'  // Common queries
  },
  
  // Layer 2: Retrieval results cache
  retrievalCache: {
    store: 'redis',
    key: 'retrieve:{{ md5($json.queryEmbedding) }}:{{ $json.filterHash }}',
    ttl: 3600,   // 1 hour
    hitRate: '28%',
    // Invalidate on document updates
    tags: ['knowledge-base']
  },
  
  // Layer 3: Generated response cache (for exact queries)
  responseCache: {
    store: 'redis',
    key: 'response:{{ md5($json.query) }}:{{ md5($json.context) }}',
    ttl: 1800,   // 30 minutes
    hitRate: '15%',
    // Don't cache if context changed
    conditional: '!$json.contextChanged'
  },
  
  // Cache warming for popular queries
  warming: {
    schedule: '0 2 * * *',  // 2 AM daily
    queries: [
      'What are your services?',
      'How do I contact support?',
      'What are your business hours?'
    ]
  }
};
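
A get-or-compute helper covers the common path for all three layers. The sketch below uses Node's built-in `crypto` module for the MD5 key and an in-memory class as a stand-in for Redis. `cacheKey`, `TtlCache`, and `getOrCompute` are illustrative names, and the query normalisation step is an assumption added to improve hit rates.

```javascript
const crypto = require('node:crypto');

// Normalise before hashing so trivially different spellings share a cache entry
function cacheKey(prefix, text) {
  const normalized = text.trim().toLowerCase().replace(/\s+/g, ' ');
  return `${prefix}:${crypto.createHash('md5').update(normalized).digest('hex')}`;
}

// Minimal in-memory stand-in for Redis GET / SET with expiry
class TtlCache {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map();
  }
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
  set(key, value, ttlSeconds) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
}

// Return the cached value, or compute and store it on a miss
async function getOrCompute(cache, key, ttlSeconds, compute) {
  const hit = cache.get(key);
  if (hit !== undefined) return { value: hit, cacheHit: true };
  const value = await compute();
  cache.set(key, value, ttlSeconds);
  return { value, cacheHit: false };
}
```

For the embedding layer, a call might look like `getOrCompute(cache, cacheKey('embedding', query), 86400, () => embed(query))`, where `embed` is your own embedding function.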

Cost Optimization with GPT-5.5

// Cost-optimized RAG with GPT-5.5
const costOptimization = {
  // GPT-5.5 is 40% more token-efficient
  tokenEfficiency: 0.6,  // 40% reduction
  
  // Tiered model selection (an array, so that no tier overwrites another)
  modelSelection: [
    {
      // Simple queries: GPT-5.5-mini (fastest, cheapest)
      condition: '{{ $json.complexityScore < 0.3 }}',
      model: 'gpt-5.5-mini',
      cost: '$0.002 / 1K tokens'
    },
    {
      // Standard queries: GPT-5.5 (balanced)
      condition: '{{ $json.complexityScore >= 0.3 && $json.complexityScore < 0.8 }}',
      model: 'gpt-5.5',
      cost: '$0.015 / 1K tokens'
    },
    {
      // Complex queries: GPT-5.5-reasoning (best quality)
      condition: '{{ $json.complexityScore >= 0.8 }}',
      model: 'gpt-5.5-reasoning',
      cost: '$0.03 / 1K tokens'
    }
  ],
  
  // Complexity scoring
  complexityAnalysis: {
    factors: [
      { name: 'queryLength', weight: 0.2 },
      { name: 'numEntities', weight: 0.3 },
      { name: 'reasoningRequired', weight: 0.5 }
    ]
  },
  
  // Batch processing for indexing
  batchConfig: {
    embeddingBatchSize: 100,  // Inputs per embeddings request (conservative; OpenAI accepts larger batches)
    upsertBatchSize: 50,      // Vector DB optimal
    parallelBatches: 5        // Concurrent processing
  },
  
  // Monthly cost projection for 100K queries
  costProjection: {
    gpt4: '$4,500',
    gpt5_4: '$3,200',
    gpt5_5: '$1,920',  // 40% savings vs GPT-5.4
    gpt5_5WithTiers: '$1,440'  // 68% savings vs GPT-4 (55% vs GPT-5.4)
  }
};
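
The complexity weights and tier thresholds translate into two small functions. This sketch takes the weights, thresholds, and model names from the config. How each factor is normalised to the 0-1 range is left to the caller and is an assumption here, as are the constant and function names.

```javascript
// Weights and thresholds mirror the config; factor values are expected in [0, 1]
const FACTOR_WEIGHTS = { queryLength: 0.2, numEntities: 0.3, reasoningRequired: 0.5 };

const MODEL_TIERS = [
  { maxScore: 0.3, model: 'gpt-5.5-mini' },
  { maxScore: 0.8, model: 'gpt-5.5' },
  { maxScore: Infinity, model: 'gpt-5.5-reasoning' },
];

// Weighted sum of factor values, each clamped to [0, 1]; missing factors count as 0
function complexityScore(factors) {
  return Object.entries(FACTOR_WEIGHTS).reduce((sum, [name, weight]) => {
    const value = Math.min(1, Math.max(0, factors[name] ?? 0));
    return sum + weight * value;
  }, 0);
}

// First tier whose upper bound is strictly above the score
function selectModel(score) {
  return MODEL_TIERS.find((tier) => score < tier.maxScore).model;
}
```

Because the weights sum to 1, the score stays in the 0-1 range, and a query that only needs heavy reasoning (`reasoningRequired: 1`, everything else 0) scores 0.5 and lands in the standard tier.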

Monitoring and Observability

// Comprehensive RAG monitoring
const ragMonitoring = {
  // Latency tracking
  latencyMetrics: {
    embedding: { p50: '<100ms', p99: '<500ms' },
    retrieval: { p50: '<50ms', p99: '<200ms' },
    generation: { p50: '<1s', p99: '<3s' },
    e2e: { p50: '<1.5s', p99: '<4s' }
  },
  
  // Quality metrics
  qualityMetrics: {
    retrieval: {
      precision: '0.85',  // % of retrieved docs relevant
      recall: '0.78',     // % of relevant docs retrieved
      mrr: '0.82'         // Mean Reciprocal Rank
    },
    generation: {
      relevance: '4.3/5',     // Human evaluation
      faithfulness: '0.91',   // % supported by context
      citationAccuracy: '0.88'
    }
  },
  
  // Error tracking
  errorTracking: {
    categories: [
      'retrieval_empty',      // No documents found
      'context_too_long',     // Context exceeds token limit
      'generation_error',     // LLM API error
      'hallucination_detected'
    ],
    alerting: {
      threshold: 5,  // Alert after 5 errors in 5 minutes
      channels: ['slack', 'pagerduty']
    }
  },
  
  // User feedback tracking
  feedback: {
    thumbsUpDown: true,
    commentCapture: true,
    correctionTracking: true,
    // Automatic model improvement from feedback
    feedbackLoop: 'weekly-retrain'
  }
};
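
The alerting rule (5 errors in 5 minutes) needs a sliding window rather than a plain counter. A minimal sketch, assuming errors are recorded in chronological order and that the class name and `record` method are illustrative:

```javascript
// Sliding-window counter: fires once `threshold` errors fall inside the window
class ErrorRateAlert {
  constructor({ threshold = 5, windowMs = 5 * 60 * 1000 } = {}) {
    this.threshold = threshold;
    this.windowMs = windowMs;
    this.timestamps = [];
  }

  // Record one error at time `now` (ms); returns true when an alert should fire
  record(now = Date.now()) {
    this.timestamps.push(now);
    // Drop errors that have aged out of the window
    while (this.timestamps.length && this.timestamps[0] <= now - this.windowMs) {
      this.timestamps.shift();
    }
    return this.timestamps.length >= this.threshold;
  }
}
```

In practice you would keep one instance per error category, so that a burst of `retrieval_empty` events does not mask a slower trickle of `generation_error` events.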

// n8n monitoring workflow
const monitoringWorkflow = {
  trigger: 'webhook',
  nodes: [
    {
      name: 'Parse RAG Request',
      extract: ['query', 'responseTime', 'tokenUsage', 'cacheHit']
    },
    {
      // Prometheus scrapes metrics rather than accepting pushes, so send them
      // to a Pushgateway (the hostname and job name here are placeholders)
      name: 'Send to Pushgateway',
      type: 'httpRequest',
      method: 'POST',
      url: 'http://pushgateway:9091/metrics/job/rag',
      body: `
        rag_query_latency{query_type="{{ $json.type }}"} {{ $json.responseTime }}
        rag_token_usage{model="{{ $json.model }}"} {{ $json.tokenUsage }}
        rag_cache_hit{layer="{{ $json.cacheLayer }}"} {{ $json.cacheHit ? 1 : 0 }}
      `
    },
    {
      name: 'Alert if Thresholds Exceeded',
      type: 'if',
      condition: '{{ $json.responseTime > 4000 || $json.tokenUsage > 4000 }}'
    },
    {
      name: 'Send Alert',
      type: 'slack',
      message: 'RAG performance alert: Query took {{ $json.responseTime }}ms'
    }
  ]
};
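
If you build the metrics body in a Code node instead of a template, label values should be escaped so that a stray quote in a query type cannot corrupt the payload. The sketch below renders the three metrics in the Prometheus text exposition format. The function names are illustrative.

```javascript
// Escape label values per the Prometheus text format: backslash, quote, newline
function escapeLabelValue(value) {
  return String(value).replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

// Render one sample line: name{label="value",...} value
function formatSample(name, labels, value) {
  const labelText = Object.entries(labels)
    .map(([key, val]) => `${key}="${escapeLabelValue(val)}"`)
    .join(',');
  return labelText ? `${name}{${labelText}} ${value}` : `${name} ${value}`;
}

// Build the request body; the format expects a trailing newline
function formatRagMetrics({ type, model, cacheLayer, responseTime, tokenUsage, cacheHit }) {
  return [
    formatSample('rag_query_latency', { query_type: type }, responseTime),
    formatSample('rag_token_usage', { model }, tokenUsage),
    formatSample('rag_cache_hit', { layer: cacheLayer }, cacheHit ? 1 : 0),
  ].join('\n') + '\n';
}
```

Avoid putting the raw user query into a label. Each distinct label value creates a new time series, so labels should stay low-cardinality (query type, model, cache layer).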

Real-World Use Cases

Use Case 1: Customer Support Knowledge Base

// Customer support RAG implementation
const supportRag = {
  knowledgeSources: [
    { type: 'zendesk', collections: ['articles', 'tickets'] },
    { type: 'confluence', spaces: ['support', 'product'] },
    { type: 'pdf', path: '/kb/product-guides' }
  ],
  
  // Conversation history integration
  contextManagement: {
    // Maintain conversation context across turns
    sessionStore: 'redis',
    ttl: 3600,  // 1 hour
    
    // Include previous context in retrieval
    queryExpansion: `
      Previous conversation:\n{{ $json.conversationHistory }}\n\nCurrent question: {{ $json.query }}
    `,
    
    // Track resolved/unresolved status
    resolutionTracking: true
  },
  
  // Escalation rules
  escalation: {
    triggers: [
      { condition: 'confidence < 0.7', action: 'suggest_human' },
      { condition: 'sentiment < -0.5', action: 'escalate_immediately' },
      { condition: 'intent == "billing_dispute"', action: 'escalate_immediately' }
    ]
  },
  
  // Performance metrics
  metrics: {
    deflectionRate: '67%',      // % resolved without human
    avgResponseTime: '1.2s',    // End-to-end
    csat: '4.4/5',              // Customer satisfaction
    costPerQuery: '$0.08'       // vs $4.50 for human agent
  }
};
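
The escalation triggers are written as strings in the config, and in code they are easier to test as predicates. A minimal sketch, assuming the `confidence`, `sentiment`, and `intent` signals are computed upstream. The `auto_respond` default and the ordering (immediate escalation checked first) are assumptions.

```javascript
// Rules from the escalation config, expressed as predicate functions.
// Immediate escalations come first so the stronger action wins on overlap.
const ESCALATION_RULES = [
  { action: 'escalate_immediately', when: (s) => s.sentiment < -0.5 },
  { action: 'escalate_immediately', when: (s) => s.intent === 'billing_dispute' },
  { action: 'suggest_human', when: (s) => s.confidence < 0.7 },
];

// Return the first matching action, or 'auto_respond' (a hypothetical default)
function decideEscalation(signals) {
  const rule = ESCALATION_RULES.find((r) => r.when(signals));
  return rule ? rule.action : 'auto_respond';
}
```

For example, `decideEscalation({ confidence: 0.5, sentiment: -0.8, intent: 'faq' })` matches both the sentiment and the confidence rule, and the ordering ensures the customer is escalated rather than merely offered a human.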

Use Case 2: Sales Enablement with RAG

// Sales RAG for proposal generation
const salesRag = {
  knowledgeSources: [
    { type: 'crm', data: 'opportunities,contacts,accounts' },
    { type: 'documents', path: '/sales/case-studies' },
    { type: 'documents', path: '/sales/proposal-templates' },
    { type: 'database', table: 'pricing_matrix' }
  ],
  
  // Dynamic personalization
  personalization: {
    // Pull client context from CRM
    clientData: '{{ $json.crmData }}',
    
    // Customize retrieval based on client industry
    industryBoost: '{{ $json.crmData.industry }}',
    
    // Include relevant case studies
    caseStudyFilter: 'industry == "{{ $json.crmData.industry }}"'
  },
  
  // Proposal generation workflow
  proposalGeneration: {
    steps: [
      { name: 'retrieveCompanyInfo', query: '{{ $json.clientName }} company overview' },
      { name: 'retrievePainPoints', query: '{{ $json.clientIndustry }} common challenges' },
      { name: 'retrieveSolutions', query: 'solutions for {{ $json.painPoints }}' },
      { name: 'retrieveCaseStudies', query: '{{ $json.clientIndustry }} case studies' },
      { name: 'generateProposal', model: 'gpt-5.5', template: 'formal_proposal' }
    ],
    
    // Output formatting
    output: {
      format: 'docx',
      sections: ['executive_summary', 'solution', 'pricing', 'timeline', 'case_studies'],
      branding: 'auto_apply'
    }
  },
  
  // Performance
  metrics: {
    proposalGenerationTime: '3 minutes',  // vs 4 hours manual
    winRateImprovement: '+23%',
    repProductivity: '+40%'
  }
};

Use Case 3: Legal Contract Analysis

// Legal RAG for contract analysis
const legalRag = {
  // Strict access controls
  accessControl: {
    authentication: 'sso',
    authorization: 'role-based',
    auditLogging: true,
    dataRetention: '7_years'
  },
  
  knowledgeSources: [
    { type: 'documents', path: '/contracts/active', access: 'attorney_only' },
    { type: 'documents', path: '/legal-precedents', access: 'all_legal' },
    { type: 'documents', path: '/regulatory', access: 'compliance_team' }
  ],
  
  // Citation requirements
  citation: {
    required: true,
    format: 'legal_citation',
    includePageNumbers: true,
    includeClauseNumbers: true,
    linkToDocument: true
  },
  
  // Risk analysis
  riskAnalysis: {
    enabled: true,
    categories: ['liability', 'termination', 'indemnification', 'ip_rights'],
    highlightRiskClauses: true,
    suggestAlternatives: true
  },
  
  // Model selection
  model: 'gpt-5.5',  // Better reasoning for legal text
  temperature: 0.1,  // Conservative for legal
  
  // Compliance
  compliance: {
    barAssociation: 'approved',
    clientConfidentiality: 'encrypted_at_rest_and_in_transit',
    aiDisclosure: 'included_in_output'
  }
};
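
The per-source `access` labels only help if they are enforced at retrieval time. The sketch below filters retrieved chunks by the caller's role. The access labels come from the config above, while the role names, the role-to-label mapping, and the `chunk.metadata.access` field are hypothetical.

```javascript
// Hypothetical mapping from user role to the access labels it may read
const ROLE_ACCESS = new Map([
  ['attorney', ['attorney_only', 'all_legal']],
  ['paralegal', ['all_legal']],
  ['compliance', ['compliance_team', 'all_legal']],
]);

// Keep only chunks whose access label the role is allowed to read.
// Unknown roles get nothing (deny by default).
function filterByAccess(chunks, role) {
  const allowed = new Set(ROLE_ACCESS.get(role) || []);
  return chunks.filter((chunk) => allowed.has(chunk.metadata.access));
}
```

Where the vector store supports metadata filters, it is safer to pass the allowed labels into the query itself and keep a post-filter like this one as a second check, so restricted text never leaves the database for an unauthorised caller.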

Integration Patterns

n8n + Directus for Content Management

// Directus CMS integration for RAG content
const directusIntegration = {
  // Sync Directus content to vector database
  syncConfig: {
    trigger: 'directus.hook',
    events: ['items.create', 'items.update', 'items.delete'],
    collections: ['articles', 'documentation', 'faqs'],
    
    // Transform Directus content
    transform: {
      // Combine multiple fields
      text: '{{ $json.content }}\n\n{{ $json.excerpt }}',
      
      // Extract metadata
      metadata: {
        title: '{{ $json.title }}',
        slug: '{{ $json.slug }}',
        category: '{{ $json.category.name }}',
        tags: '{{ $json.tags.map(t => t.name) }}',
        author: '{{ $json.user_created.first_name }}',
        published: '{{ $json.date_published }}',
        status: '{{ $json.status }}'
      }
    },
    
    // Filter published content only
    filter: 'status == "published"'
  },
  
  // Query Directus from RAG
  queryIntegration: {
    // When RAG finds a relevant chunk, fetch full content from Directus
    enrichment: {
      endpoint: 'https://directus.company.com/items/articles/{{ $json.metadata.slug }}',
      fields: ['content', 'related_articles', 'attachments'],
      includeRelations: true
    }
  },
  
  // Update Directus with RAG analytics
  feedbackLoop: {
    // Track which content is most useful
    queryLog: 'directus.rag_queries',
    
    // Update article popularity
    popularityMetric: {
      collection: 'articles',
      field: 'rag_retrieval_count',
      increment: 1
    }
  }
};
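
One detail the sync config implies but does not spell out: an update that unpublishes an article has to remove it from the vector index, not just skip it. The sketch below maps a hook payload to a vector-store operation. The `{ event, item }` payload shape is an assumption for illustration and may not match what your Directus hook actually delivers.

```javascript
// Map a content event to a vector-store operation.
// The payload shape ({ event, item }) is an assumption for this sketch.
function toVectorOperation({ event, item }) {
  if (event === 'items.delete') {
    return { op: 'delete', id: item.id };
  }
  // Unpublished content must not stay searchable, so remove it from the index
  if (item.status !== 'published') {
    return { op: 'delete', id: item.id };
  }
  return {
    op: 'upsert',
    id: item.id,
    // Combine content and excerpt, skipping whichever is missing
    text: [item.content, item.excerpt].filter(Boolean).join('\n\n'),
    metadata: {
      title: item.title,
      slug: item.slug,
      category: item.category?.name ?? null,
      tags: (item.tags || []).map((t) => t.name),
      status: item.status,
    },
  };
}
```

Using the item ID as the vector ID makes the upsert idempotent, so a replayed webhook does not create duplicate entries.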

n8n + Slack for Team Knowledge

// Slack integration for team knowledge
const slackIntegration = {
  // Index Slack conversations
  indexing: {
    channels: ['#knowledge-base', '#product-discussions', '#engineering'],
    excludeBots: true,
    excludeCommands: true,
    
    // Thread context preservation
    threadContext: {
      includeParent: true,
      includeReplies: true,
      maxThreadDepth: 5
    }
  },
  
  // Slack bot for queries
  bot: {
    trigger: '@knowledgebot',
    
    // Response in thread
    responseMode: 'thread',
    
    // Include source links
    includeSources: true,
    
    // Summarize for Slack
    summarize: {
      maxLength: 3000,  // Slack section block text limit (3,000 characters)
      includeHighlights: true
    }
  },
  
  // Learn from reactions
  feedback: {
    thumbsUp: 'positive_feedback',
    thumbsDown: 'negative_feedback',
    
    // Auto-improve based on reactions
    retraining: 'weekly'
  }
};
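
Fitting an answer plus its source links into the length budget takes a little care, because the sources should survive truncation. A minimal sketch, assuming sources are plain strings and that cutting at a word boundary is acceptable. The function name is illustrative.

```javascript
// Trim an answer to a character budget, leaving room for the source list.
// Cuts at the last space before the limit and appends an ellipsis.
function formatForSlack(answer, sources = [], maxLength = 3000) {
  const footer = sources.length
    ? '\n\nSources:\n' + sources.map((s, i) => `${i + 1}. ${s}`).join('\n')
    : '';
  const budget = maxLength - footer.length;
  if (budget <= 0) return answer.slice(0, maxLength); // footer alone is too long
  if (answer.length <= budget) return answer + footer;

  const cut = answer.slice(0, budget - 1); // reserve one character for the ellipsis
  const lastSpace = cut.lastIndexOf(' ');
  const trimmed = lastSpace > 0 ? cut.slice(0, lastSpace) : cut;
  return trimmed + '…' + footer;
}
```

The footer is built first and subtracted from the budget, so the citations are never the part that gets cut.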

Security and Privacy

Data Protection

// Security configuration for RAG systems
const securityConfig = {
  // Encryption at rest
  encryption: {
    vectors: 'aes-256-gcm',
    metadata: 'aes-256-gcm',
    backups: 'aes-256-gcm'
  },
  
  // Encryption in transit
  tls: {
    version: '1.3',
    certificates: 'letsencrypt',
    hsts: true
  },
  
  // Access controls
  rbac: {
    roles: [
      { name: 'admin', permissions: ['read', 'write', 'delete', 'configure'] },
      { name: 'editor', permissions: ['read', 'write'] },
      { name: 'viewer', permissions: ['read'] }
    ],
    
    // Row-level security on documents
    documentLevel: true
  },
  
  // Audit logging
  audit: {
    events: ['query', 'ingest', 'update', 'delete', 'access_denied'],
    retention: '2_years',
    tamperProof: true
  },
  
  // PII handling
  pii: {
    detection: 'automatic',
    redaction: 'mask',  // or 'remove', 'hash'
    entities: ['email', 'phone', 'ssn', 'credit_card', 'name'],
    
    // Don't index PII in vectors
    excludeFromEmbedding: true
  },
  
  // Data residency
  residency: {
    vectors: 'eu-west-1',  // EU region, to support GDPR data-residency requirements
    backups: 'eu-central-1'
  }
};
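
The `mask` redaction mode can be illustrated with a few regular expressions applied before chunking and embedding. This is a sketch only: the patterns are simplified, US-centric assumptions, and names in particular cannot be caught this way. Production systems typically rely on a dedicated PII detection service.

```javascript
// Illustrative patterns only; real PII detection needs far broader coverage
const PII_PATTERNS = [
  { label: 'EMAIL', regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: 'SSN', regex: /\b\d{3}-\d{2}-\d{4}\b/g },
  // 13 to 16 digits, optionally separated by single spaces or hyphens
  { label: 'CREDIT_CARD', regex: /\b\d(?:[ -]?\d){12,15}\b/g },
  { label: 'PHONE', regex: /\b\d{3}[-. ]\d{3}[-. ]\d{4}\b/g },
];

// Replace each match with a [LABEL] placeholder
function maskPii(text) {
  return PII_PATTERNS.reduce(
    (masked, { label, regex }) => masked.replace(regex, `[${label}]`),
    text
  );
}
```

Masking with a typed placeholder such as `[EMAIL]` keeps the sentence structure intact for embedding, whereas removing the value outright can change what the chunk appears to say.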

Testing and Validation

RAG Evaluation Framework

// Comprehensive RAG testing
const ragEvaluation = {
  // Test datasets
  datasets: {
    // Questions with known answers
    qaPairs: [
      {
        question: 'What is our refund policy?',
        expectedAnswer: 'We offer full refunds within 30 days',
        expectedSources: ['policies/refund.pdf']
      }
    ],
    
    // Edge cases
    edgeCases: [
      { question: 'asdfghjkl', expectedBehavior: 'graceful_fallback' },
      { question: 'What is 2+2?', expectedBehavior: 'no_hallucination' }
    ]
  },
  
  // Metrics
  metrics: {
    // Retrieval metrics
    hitRate: {
      '@1': 0.75,   // Top 1 is relevant
      '@5': 0.90,   // Relevant in top 5
      '@10': 0.95   // Relevant in top 10
    },
    
    // Generation metrics
    bleu: 0.45,
    rouge: 0.52,
    faithfulness: 0.88,
    answerRelevance: 0.91,
    
    // Latency
    p95Latency: '< 3 seconds'
  },
  
  // Automated testing in n8n
  testWorkflow: [
    {
      node: 'Load Test Dataset',
      type: 'readBinaryFile',
      path: '/tests/rag-test-cases.json'
    },
    {
      node: 'Run Test Queries',
      type: 'httpRequest',
      url: 'https://api.company.com/rag-query',
      batchSize: 10
    },
    {
      node: 'Calculate Metrics',
      type: 'code',
      code: `
        const results = $input.all();
        const metrics = calculateMetrics(results);
        return [{ json: metrics }];
      `
    },
    {
      node: 'Compare to Thresholds',
      type: 'if',
      condition: '{{ $json.hitRate["@5"] >= 0.90 }}'
    },
    {
      node: 'Report Results',
      type: 'slack',
      message: 'RAG evaluation complete. Hit rate @5: {{ $json.hitRate["@5"] }}'
    }
  ]
};
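
The retrieval metrics can be computed from test results with a few lines of code. The sketch below is one possible implementation of the `calculateMetrics` helper the test workflow calls. It assumes each result carries a ranked `retrievedSources` array (a hypothetical field name) alongside the `expectedSources` from the dataset.

```javascript
// Rank (1-based) of the first retrieved source that is in the expected set
function firstRelevantRank(result) {
  const expected = new Set(result.expectedSources);
  const index = result.retrievedSources.findIndex((src) => expected.has(src));
  return index === -1 ? null : index + 1;
}

// Fraction of queries with at least one relevant source in the top k
function hitRateAtK(results, k) {
  const hits = results.filter((r) => {
    const rank = firstRelevantRank(r);
    return rank !== null && rank <= k;
  });
  return results.length ? hits.length / results.length : 0;
}

// Mean Reciprocal Rank: average of 1 / rank of the first relevant source
function meanReciprocalRank(results) {
  const total = results.reduce((sum, r) => {
    const rank = firstRelevantRank(r);
    return sum + (rank === null ? 0 : 1 / rank);
  }, 0);
  return results.length ? total / results.length : 0;
}

function calculateMetrics(results) {
  return {
    hitRate: {
      '@1': hitRateAtK(results, 1),
      '@5': hitRateAtK(results, 5),
      '@10': hitRateAtK(results, 10),
    },
    mrr: meanReciprocalRank(results),
  };
}
```

Queries with no relevant source retrieved contribute 0 to both metrics, so a regression in retrieval shows up even when generation quality looks unchanged.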

Conclusion

Retrieval-Augmented Generation represents the most significant advancement in enterprise AI since the introduction of large language models themselves. By combining GPT-5.5's enhanced reasoning capabilities with well-architected vector databases and intelligent retrieval strategies, organizations can build knowledge-intensive automation that is accurate, verifiable, and cost-effective.

The patterns and implementations covered in this guide—from basic document ingestion to advanced multi-query retrieval, from cost optimization to production monitoring—provide a comprehensive foundation for building RAG systems that scale. As GPT-5.5 continues to roll out across platforms and vector database technologies mature, we expect RAG to become the standard architecture for enterprise AI applications.

Key Takeaways:

  1. Chunking is Critical: The way you split documents has more impact on RAG quality than any other factor. Invest in intelligent, content-aware chunking strategies.
  2. Hybrid Search Wins: Combining vector similarity with keyword matching and re-ranking consistently outperforms pure semantic search by 15-30%.
  3. GPT-5.5 Changes the Economics: With 40% token efficiency improvements and enhanced reasoning, GPT-5.5 makes production RAG more affordable and effective than ever.
  4. Observability is Non-Negotiable: Production RAG systems require comprehensive monitoring of both retrieval quality and generation quality.
  5. Start Simple, Scale Smart: Begin with basic RAG patterns and add complexity (multi-query, HyDE, self-reflection) only when simple approaches prove insufficient.

What's Next?

  • Implement the ingestion pipeline from Section 2
  • Set up monitoring using the patterns from Section 5
  • Experiment with different chunking strategies on your own documents
  • Join the n8n community to share your RAG implementations

The future of enterprise AI is not about models knowing everything—it's about models knowing how to find and use the right information at the right time. That's what RAG delivers.


Need help implementing RAG for your business? Contact Tropical Media for expert consulting on AI automation, n8n workflows, and knowledge-intensive systems.
