Internal Architecture

Custom RAG Pipelines & AI Model Integration Suite

Retrieval-augmented generation pipelines for SaaS platforms, running open models privately to cut API cost.

The challenge

Deliver accurate, context-aware AI features on top of SaaS data without sending that data to third-party APIs or paying per-token API fees at scale.

Architecture & solution

Built custom RAG pipelines with strict routing rules that govern which data each query may retrieve. Open-source models (DeepSeek / Qwen) are served locally via Ollama, and responses are grounded in passages retrieved from pgvector vector stores. The Model Context Protocol (MCP) extends the models with live, up-to-date system context.
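
To make the grounding step concrete, here is a minimal sketch of how retrieved passages might be folded into a prompt and sent to a locally running Ollama server. It is an illustration under stated assumptions, not the project's actual code: the prompt wording, the function names, and the example model tag `qwen2.5` are hypothetical. The endpoint is Ollama's default local `/api/generate` route.

```python
import json
import urllib.request

# Default address of a local Ollama server; adjust if yours differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(question: str, passages: list[str]) -> str:
    """Ground the model by placing numbered passages ahead of the question."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def build_request(model: str, prompt: str) -> dict:
    """Payload for /api/generate; stream=False asks for a single JSON reply."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, question: str, passages: list[str]) -> str:
    """Send the grounded prompt to Ollama and return the generated text."""
    body = build_request(model, build_prompt(question, passages))
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With a model pulled locally, a call such as `generate("qwen2.5", "How many seats does Plan B allow?", passages)` would return the model's answer as a string. Because the request never leaves the host, no tenant data reaches a third-party API.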

Technical highlights

  • Local open-model serving with Ollama (DeepSeek / Qwen) for cost & privacy
  • pgvector vector database for semantic retrieval
  • Model Context Protocol (MCP) for live, extensible context
  • Strict data-governance rules for retrieval routing
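
The retrieval and governance points above could be combined roughly as follows. This is a sketch under assumptions: the `documents` table, its `tenant_id`, `collection`, `content`, and `embedding` columns, and the role-to-collection map are all hypothetical. The `<=>` operator is pgvector's cosine-distance operator, and the `%s` placeholders follow the psycopg parameter style.

```python
# Hypothetical governance map: which collections each caller role may search.
ALLOWED_COLLECTIONS = {
    "support": {"help_articles", "tickets"},
    "billing": {"invoices"},
}


def route(role: str, requested: set[str]) -> list[str]:
    """Keep only the requested collections that the role is allowed to read."""
    allowed = ALLOWED_COLLECTIONS.get(role, set())
    return sorted(requested & allowed)


def to_vector_literal(embedding: list[float]) -> str:
    """Format a list of floats as pgvector's text input, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_query(tenant_id: str, collections: list[str],
                embedding: list[float], k: int = 5) -> tuple[str, tuple]:
    """Return SQL and parameters for a tenant-scoped nearest-neighbour search."""
    sql = (
        "SELECT id, content FROM documents "
        "WHERE tenant_id = %s AND collection = ANY(%s) "
        "ORDER BY embedding <=> %s::vector "  # smaller cosine distance first
        "LIMIT %s"
    )
    return sql, (tenant_id, collections, to_vector_literal(embedding), k)
```

The routing decision is applied before the query is built, so a disallowed collection never appears in the SQL at all. The returned pair can be passed to a database cursor, for example `cursor.execute(sql, params)` with psycopg.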

Tech stack

  • Ollama
  • RAG
  • Model Context Protocol
  • pgvector
  • SaaS Integration