Production-ready Retrieval-Augmented Generation system with hybrid search, multi-cloud LLM orchestration, and 100% on-premise deployment capability for sensitive data processing.
Complete demonstration of the RAG system with ingestion, hybrid retrieval, and Local/Cloud comparison | User interface demo starts @1min37
In the era of Large Language Models (LLMs), organizations face a critical challenge: leveraging AI capabilities while maintaining data sovereignty and confidentiality. This project addresses this need by developing an advanced Retrieval-Augmented Generation (RAG) system that can operate entirely on-premise or integrate with cloud providers based on security requirements.
Figure 1 - High-level architecture of the RAG system
The main challenge was to design a system that balances performance, cost, and privacy while maintaining production-grade reliability and scalability.
The retrieval system combines dense vector search (mxbai-embed-large), sparse BM25, and Reciprocal Rank Fusion (RRF): BM25 catches exact keyword matches, dense vectors capture semantic paraphrases, and RRF merges the two rankings so accuracy holds up across diverse query types.
Figure 2 - Hybrid retrieval pipeline combining dense, sparse, and fusion techniques
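The fusion step is simple enough to spell out: RRF scores each candidate document as the sum of 1/(k + rank) over every ranked list it appears in, with k commonly set to 60 to damp the influence of top ranks. A minimal Python sketch, with illustrative names rather than the project's actual API:

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several best-first ranked lists with Reciprocal Rank Fusion.

    A document's fused score is the sum of 1 / (k + rank) over every
    list in which it appears.
    """
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse dense (vector) and sparse (BM25) result lists.
dense = ["doc3", "doc1", "doc7"]
sparse = ["doc1", "doc9", "doc3"]
print(rrf_fuse([dense, sparse]))  # doc1 and doc3 rise to the top
```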
The system provides flexible LLM integration with real-time benchmarking: the same query can be routed to a fully local model or to a cloud provider, and the two responses are timed and compared side by side.
Figure 3 - Real-time comparison between local and cloud LLM responses
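A minimal sketch of how such a side-by-side benchmark can be wired up, assuming a local Ollama server on its default port and any cloud SDK wrapped as a callable; the model name and the `ask_cloud` helper are placeholders, not the project's actual code:

```python
import time
import requests

def ask_local(prompt: str, model: str = "llama3") -> str:
    """Query a local Ollama server (assumed to run on the default port)."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

def benchmark(prompt: str, backends: dict) -> dict:
    """Run the same prompt through each backend, timing every call."""
    results = {}
    for name, ask in backends.items():
        start = time.perf_counter()
        answer = ask(prompt)
        results[name] = {"latency_s": time.perf_counter() - start, "answer": answer}
    return results

# `ask_cloud` would wrap whichever provider SDK is configured, e.g.:
# report = benchmark("Summarize the contract.", {"local": ask_local, "cloud": ask_cloud})
```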
A production-grade ETL pipeline processes documents through parsing, chunking, quality filtering, and GPU-accelerated embedding generation.
Figure 4 - Automated document ingestion pipeline architecture
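The chunking and quality-filtering stages can be condensed into a short sketch; the window size, overlap, and filter thresholds below are illustrative defaults, not the pipeline's tuned values:

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split parsed text into overlapping fixed-size character windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def passes_quality_filter(chunk: str, min_chars: int = 50,
                          min_alpha_ratio: float = 0.5) -> bool:
    """Reject near-empty chunks and chunks that are mostly non-letter noise."""
    if len(chunk.strip()) < min_chars:
        return False
    letters = sum(c.isalpha() for c in chunk)
    return letters / len(chunk) >= min_alpha_ratio

# Parser output feeds in upstream; surviving chunks go on to embedding.
text = "Retrieval-Augmented Generation grounds answers in documents. " * 50
chunks = [c for c in chunk_text(text) if passes_quality_filter(c)]
```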
The infrastructure stack comprises PostgreSQL 15+ with the PGVector extension for vector storage, Docker containerization for reproducible deployment, and a Streamlit front end.
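A plausible PGVector schema for this stack; table and column names are assumed, and the 1024 dimension matches mxbai-embed-large's output:

```python
# Executed once at startup through any PostgreSQL driver (pgvector 0.5+ for HNSW).
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS chunks (
    id        BIGSERIAL PRIMARY KEY,
    doc_id    TEXT NOT NULL,
    content   TEXT NOT NULL,
    embedding vector(1024)  -- mxbai-embed-large dimension
);

-- Approximate-nearest-neighbour index for cosine similarity search.
CREATE INDEX IF NOT EXISTS chunks_embedding_idx
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""
```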
The Streamlit-based interface provides chat interaction, source citations, side-by-side local/cloud comparison, and document upload.
Figure 7 - Interactive Streamlit interface
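A stripped-down sketch of the comparison view, with stub answer functions standing in for the actual RAG backends:

```python
import streamlit as st

def ask_local(q: str) -> str:   # stub; a real app would call the local pipeline
    return "local answer"

def ask_cloud(q: str) -> str:   # stub standing in for a cloud provider call
    return "cloud answer"

st.title("RAG Assistant")
st.sidebar.file_uploader("Add a document", type=["pdf", "txt"])

question = st.chat_input("Ask a question about your documents")
if question:
    with st.chat_message("user"):
        st.write(question)
    local_col, cloud_col = st.columns(2)  # side-by-side local/cloud comparison
    with local_col:
        st.subheader("Local LLM")
        st.write(ask_local(question))
    with cloud_col:
        st.subheader("Cloud LLM")
        st.write(ask_cloud(question))
```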
This project demonstrates a production-ready RAG system combining 100% on-premise capability with multi-cloud flexibility. Hybrid retrieval (dense + sparse + RRF) delivers higher accuracy than either retriever alone, while SQL-native fusion provides 10x performance gains over the equivalent Python implementation.
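The idea behind SQL-native fusion can be sketched as a single query that ranks dense and sparse candidates inside PostgreSQL and sums their reciprocal-rank scores there, so only the final top-k rows ever leave the database. This is a hedged illustration, not the project's actual query: names follow the schema sketch above, and PostgreSQL full-text ranking stands in here for the BM25 scorer:

```python
# Parameters use psycopg-style placeholders: %(qvec)s is the query embedding
# (as a pgvector literal), %(q)s the raw query text; k = 60 as in the RRF sketch.
HYBRID_RRF_SQL = """
WITH dense AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> %(qvec)s::vector) AS r
    FROM chunks
    ORDER BY embedding <=> %(qvec)s::vector
    LIMIT 50
),
sparse AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY score DESC) AS r
    FROM (
        SELECT id,
               ts_rank_cd(to_tsvector('english', content),
                          plainto_tsquery('english', %(q)s)) AS score
        FROM chunks
        WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %(q)s)
        ORDER BY score DESC
        LIMIT 50
    ) AS top_sparse
)
SELECT id, SUM(1.0 / (60 + r)) AS rrf_score
FROM (SELECT id, r FROM dense UNION ALL SELECT id, r FROM sparse) AS fused
GROUP BY id
ORDER BY rrf_score DESC
LIMIT 10;
"""
```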
Future enhancements include multi-agent integration, graph-based retrieval, multimodal support, and enterprise scalability for millions of chunks.
Status: Production-ready, fully deployable
Contact: For repository access or technical inquiries, contact Martin LE CORRE
Documentation: 📄 View detailed README