Internal Architecture
Custom RAG Pipelines & AI Model Integration Suite
Retrieval-augmented generation pipelines for SaaS platforms, running open models privately to cut per-token API costs and keep data in-house.
The challenge
Deliver accurate, context-aware AI features on top of SaaS data without sending that data to third-party APIs or paying per-token fees at scale.
Architecture & solution
Engineered custom RAG pipelines with strict routing rules, served open models (DeepSeek / Qwen) locally via Ollama, and grounded responses in context retrieved from pgvector-backed PostgreSQL stores. Applied the Model Context Protocol (MCP) to give models access to live, up-to-date system context.
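To make the retrieve-then-generate flow concrete, here is a minimal sketch of its two request shapes: a pgvector nearest-neighbour query and an Ollama generation request. It is an illustration under assumptions, not the production code. The `documents` table, its `tenant_id`, `content`, and `embedding` columns, the prompt wording, and the model tag in the usage note are all hypothetical. The `<=>` operator is pgvector's cosine-distance operator, and `/api/generate` on port 11434 is Ollama's default local endpoint.

```python
# Hypothetical schema: documents(tenant_id, content, embedding vector).
# `<=>` is pgvector's cosine-distance operator; smaller means more similar.
PGVECTOR_QUERY = """
SELECT content
FROM documents
WHERE tenant_id = %s
ORDER BY embedding <=> %s::vector
LIMIT %s
"""

# Ollama's default local generation endpoint.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def to_vector_literal(embedding):
    """Format a sequence of numbers as a pgvector text literal like '[0.5,1.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_prompt(question, chunks):
    """Assemble a grounded prompt from retrieved chunks (wording is illustrative)."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def build_ollama_request(model, prompt):
    """JSON body for a non-streaming POST to OLLAMA_GENERATE_URL."""
    return {"model": model, "prompt": prompt, "stream": False}
```

In use, the query would run through a PostgreSQL driver with parameters `(tenant_id, to_vector_literal(query_embedding), k)`, and the returned rows would feed `build_prompt`. The resulting body, for example `build_ollama_request("qwen2.5:7b", prompt)`, is then posted to the local Ollama server, so neither the retrieved data nor the prompt leaves the host.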
Technical highlights
- Local open-model serving with Ollama (DeepSeek / Qwen) for cost control and data privacy
- pgvector (PostgreSQL vector extension) for semantic retrieval
- Model Context Protocol (MCP) for live, extensible context
- Strict data-governance rules for retrieval routing
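The governance rule for retrieval routing can be pictured as a deny-by-default filter that decides which collections a request may search before any vector query runs. The sketch below is hypothetical: the `Collection` type, the three sensitivity tiers, and the `allowed_collections` function are illustrative names, not the project's actual rule set.

```python
from dataclasses import dataclass

# Hypothetical sensitivity tiers, ordered from least to most restricted.
SENSITIVITY_RANK = {"public": 0, "internal": 1, "restricted": 2}


@dataclass(frozen=True)
class Collection:
    name: str
    tenant_id: str
    sensitivity: str  # expected to be a key of SENSITIVITY_RANK


def allowed_collections(collections, tenant_id, clearance):
    """Return names of collections this request may retrieve from.

    Deny by default: an unknown clearance yields no collections, and a
    collection with an unknown sensitivity label is never returned.
    """
    max_rank = SENSITIVITY_RANK.get(clearance)
    if max_rank is None:
        return []
    return [
        c.name
        for c in collections
        if c.tenant_id == tenant_id
        and c.sensitivity in SENSITIVITY_RANK
        and SENSITIVITY_RANK[c.sensitivity] <= max_rank
    ]
```

Running the filter before retrieval, rather than on the results afterwards, means chunks from another tenant or a higher sensitivity tier never reach the model's prompt at all.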
Tech stack
- Ollama
- RAG
- Model Context Protocol
- pgvector
- SaaS Integration