Technical Deep Dive

How InfoLens Works

A modern RAG platform built for enterprise knowledge intelligence

The Challenge

Organizations have vast amounts of knowledge locked in documents, wikis, and databases. Traditional search fails because it can't understand context or intent.

LLMs hallucinate when they don't have the right information. Vector-only search misses exact technical terms. Keyword search doesn't understand semantics.

InfoLens solves this with Hybrid Search + Agentic RAG + Multi-LLM flexibility.

Core Technology #1

Hybrid Search

The best of both worlds: keyword precision meets semantic understanding

  • BM25 (Keyword): finds exact matches via PostgreSQL full-text search (50-100ms)
  • Vector (Semantic): understands meaning via pgvector similarity (150-250ms)
  • Reranking: refines the ordering with a cross-encoder (100-200ms)

Why This Matters

  • Keyword search alone: Finds "quantum computing" but misses "QC" or "qubit systems"
  • Vector search alone: Understands "QC" but might miss exact model numbers like "GPT-4"
  • Hybrid: Gets both. RRF (Reciprocal Rank Fusion) merges the two rankings, and the cross-encoder reranker surfaces the truly relevant results

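The RRF step is simple to sketch. Here is a minimal version of Reciprocal Rank Fusion (the constants InfoLens actually uses are not specified here; k=60 is the value from the original RRF paper):

```python
def rrf_fuse(bm25_ids, vector_ids, k=60):
    """Reciprocal Rank Fusion: each document scores sum(1 / (k + rank))
    across every ranking it appears in; higher is better."""
    scores = {}
    for ranking in (bm25_ids, vector_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, best first; the cross-encoder then reranks this list.
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked highly by both searches outranks one ranked highly by only one, which is exactly the behavior described above.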
Core Technology #2

3-Mode Intelligence System

Choose the right balance of speed, cost, and quality for each query

Mode 1: Simple RAG
3-6 seconds
For when you know the source and need a fast answer

User selects a collection. System searches once, retrieves context, sends to LLM. Predictable, fast, low token usage.

SELECT collection → Hybrid Search → Format Context → LLM → Answer
~1k tokens | 75% accuracy | Best for: Simple Q&A
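The single-pass pipeline can be sketched as one function; `search` and `llm` stand in for the real hybrid-search and provider clients (the names are illustrative, not InfoLens's actual API):

```python
def simple_rag(question, collection, search, llm):
    """Mode 1 sketch: one retrieval pass, one LLM call.
    search(collection, query) -> list[str]; llm(prompt) -> str."""
    chunks = search(collection, question)    # single hybrid-search pass
    context = "\n\n".join(chunks)            # format retrieved context
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)                       # one call, predictable token usage
```
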
Mode 2: MCP Tools
8-15 seconds
When you don't know where the information is

LLM has access to tools (list collections, search). It autonomously decides which collections to search, can search multiple sources, and synthesizes a comprehensive answer.

LLM → list_collections() → search_in_collection(A) → search_in_collection(B) → Synthesize
~3-5k tokens | 82% accuracy | Best for: Exploration
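The autonomy in Mode 2 is a tool-calling loop. A stripped-down sketch (the tool names match the MCP tools above; the step function standing in for the LLM is hypothetical):

```python
def run_tool_loop(llm_step, tools, question, max_steps=5):
    """Mode 2 sketch: llm_step(question, history) returns either
    ("call", tool_name, args) or ("answer", text). Tool results
    accumulate in history until the model can answer."""
    history = []
    for _ in range(max_steps):
        action = llm_step(question, history)
        if action[0] == "answer":
            return action[1]
        _, name, args = action
        history.append((name, tools[name](*args)))  # execute the tool call
    return "Step budget exhausted."
```
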
Mode 3: Agentic RAG
12-25 seconds
For complex questions that need iterative refinement

A LangGraph workflow: the agent searches, grades the results for relevance, and if they are poor, rewrites the query and searches again. It iterates up to 3 times until the quality threshold is met.

Search → Grade(relevant?) → NO: Rewrite → Search again (max 3x) → YES: Generate
~5-8k tokens | 90% accuracy | Best for: Complex reasoning
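Stripped of LangGraph's graph machinery, the search → grade → rewrite loop reduces to a few lines; `grade`, `rewrite`, and `generate` stand in for the real LLM-backed nodes:

```python
def agentic_rag(question, search, grade, rewrite, generate, max_iters=3):
    """Mode 3 sketch: retry retrieval with a rewritten query until the
    grader accepts the results or the iteration budget runs out."""
    query, docs = question, []
    for _ in range(max_iters):
        docs = search(query)
        if grade(docs, question):   # relevant enough? stop iterating
            break
        query = rewrite(query)      # refine the query and search again
    return generate(question, docs)
```
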

Real-World Example

Question: "How does our authentication work across projects?"

Simple RAG: Requires you to select "Backend Docs" collection. Fast but limited to one source.

MCP Tools: Automatically finds "Project A", "Project B", "Project C" collections, searches all three, compares results.

Agentic RAG: Same as MCP but if results are vague, rewrites to "JWT authentication implementation comparison" and searches again for better results.

Core Technology #3

Multi-LLM Architecture

Use any LLM provider. Switch instantly. No vendor lock-in.

Supported Providers

OpenAI
GPT-4, GPT-3.5-turbo
Anthropic
Claude 3 (Opus, Sonnet, Haiku)
Azure OpenAI
Enterprise deployments
Ollama / Custom
Self-hosted, Vast.ai, vLLM

How It Works

Provider configurations stored in PostgreSQL. Admins can add, test, and activate providers through the UI.

Each provider specifies its chat model and embedding model. The system automatically uses the active provider.

llm_providers table → get_active_provider() → chat_model + embeddings
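An in-memory sketch of that lookup (the field names mirror the flow above but are assumptions about the actual schema):

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    chat_model: str
    embedding_model: str
    is_active: bool = False

def get_active_provider(providers):
    """Return the single active row, as the llm_providers lookup would."""
    active = [p for p in providers if p.is_active]
    if len(active) != 1:
        raise RuntimeError("exactly one provider must be active")
    return active[0]
```

Switching providers is then a one-row flag flip in the database, not a code change.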

Why This Matters

  • Cost optimization: Use GPT-3.5 for simple queries, GPT-4 for complex reasoning
  • Privacy: Use self-hosted Ollama for sensitive data, cloud APIs for general knowledge
  • No lock-in: Switch providers in seconds without code changes or data migration

Core Technology #4

Model Context Protocol (MCP)

Your knowledge base becomes a universal tool for any AI

What is MCP?

Anthropic's open protocol for connecting AI models to external tools and data. InfoLens exposes 10 tools via Server-Sent Events (SSE).

Claude Desktop, Cursor IDE, or any MCP-compatible client can search your knowledge base, create collections, and manage documents.

10 Exposed Tools

search_documents
list_collections
get_collection
create_collection
add_documents
delete_document
list_documents
delete_collection
multi_query
health_status

Use Case

Scenario: Using Claude Desktop

You: "Search my company docs for authentication best practices"

Claude: *calls list_collections()* → sees "Backend Docs", "Security Policies"

Claude: *calls search_documents("backend-docs", "authentication")* → gets 5 results

Claude: "Based on your backend documentation, you use JWT with RS256 signing..."

Your knowledge base is now accessible to any MCP-compatible AI tool.

Built With Modern, Proven Technologies

Open-source stack, production-ready

Frontend
Next.js 15
React 19
TypeScript
Tailwind CSS
Backend
FastAPI
LangChain 0.3+
LangGraph 0.2+
Python 3.11+
Database
PostgreSQL 16
pgvector
HNSW indexes
Full-text search
Deployment
Docker Compose
Auto migrations
Health checks
SSL ready

Security & Data Sovereignty

Your data stays on your infrastructure. Period.

Self-Hosted

Deploy on your own servers. On-premise, private cloud, or VPS. You control the infrastructure.

Standard PostgreSQL

No proprietary vector database. Standard PostgreSQL with pgvector extension. Export anytime.

Authentication

JWT tokens, bcrypt password hashing, role-based access control (user/admin).
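The shape of that flow, sketched with the standard library only (this HMAC toy is illustrative; the real stack uses proper JWTs and bcrypt password hashing):

```python
import base64
import hashlib
import hmac
import json

SECRET = b"change-me"  # illustrative; load from config in practice

def issue_token(username, role):
    """Sign a claims payload; analogous to issuing a JWT."""
    payload = base64.urlsafe_b64encode(
        json.dumps({"sub": username, "role": role}).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig

def verify_token(token):
    """Check the signature, then return the claims."""
    payload, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(payload))

def require_admin(claims):
    """Role-based access control: user vs admin."""
    if claims["role"] != "admin":
        raise PermissionError("admin only")
```
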

Performance Characteristics

Search Latency
  • BM25 search: 50-100ms
  • Vector search: 150-250ms
  • Reranking: 100-200ms
  • Total (hybrid): 300-500ms

Mode Comparison (latency, accuracy)

  • Simple RAG: 3-6s (75%)
  • MCP Tools: 8-15s (82%)
  • Agentic RAG: 12-25s (90%)

Document Processing

  • Chunk size: 1000 characters
  • Chunk overlap: 200 characters
  • Embedding dimensions: 768 or 1536
  • Text splitter: RecursiveCharacterTextSplitter
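The size/overlap arithmetic behind those numbers (the real RecursiveCharacterTextSplitter also prefers paragraph and sentence boundaries; this shows only the sliding window):

```python
def chunk(text, size=1000, overlap=200):
    """Fixed-window chunking with the parameters above: each chunk is
    up to `size` characters and shares `overlap` characters with the
    previous chunk, so context at chunk edges is never lost."""
    step = size - overlap  # advance 800 characters per chunk
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```
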

Ready to Deploy?

Self-host InfoLens on your infrastructure with Docker Compose