Technical Deep Dive

How InfoLens Works

A modern RAG platform built for enterprise knowledge intelligence

The Challenge

Organizations have vast amounts of knowledge locked in documents, wikis, and databases. Traditional search fails because it can't understand context or intent.

LLMs hallucinate when they don't have the right information. Vector-only search misses exact technical terms. Keyword search doesn't understand semantics.

InfoLens solves this with Hybrid Search + Agentic RAG + Multi-LLM flexibility.

Core Technology #1

Hybrid Search

The best of both worlds: keyword precision meets semantic understanding

  • BM25 (Keyword): finds exact matches via PostgreSQL full-text search (50-100ms)
  • Vector (Semantic): understands meaning via pgvector similarity (150-250ms)
  • Reranking: refines the ordering with a cross-encoder (100-200ms)

Why This Matters

  • Keyword search alone: Finds "quantum computing" but misses "QC" or "qubit systems"
  • Vector search alone: Understands "QC" but might miss exact model numbers like "GPT-4"
  • Hybrid: Gets both. RRF (Reciprocal Rank Fusion) merges the two rankings, and the cross-encoder reranker surfaces the truly relevant results

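The RRF step is simple to sketch. Here is a minimal version of Reciprocal Rank Fusion (the constants InfoLens actually uses are not specified here; k=60 is the value from the original RRF paper):

```python
def rrf_fuse(bm25_ids, vector_ids, k=60):
    """Reciprocal Rank Fusion: each document scores sum(1 / (k + rank))
    across every ranking it appears in; higher is better."""
    scores = {}
    for ranking in (bm25_ids, vector_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, best first; the cross-encoder then reranks this list.
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked highly by both searches outranks one ranked highly by only one, which is exactly the behavior described above.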
Core Technology #2

3-Mode Intelligence System

Choose the right balance of speed, cost, and quality for each query

Mode 1: Simple RAG
3-6 seconds
For when you know the source and need a fast answer

User selects a collection. System searches once, retrieves context, sends to LLM. Predictable, fast, low token usage.

SELECT collection → Hybrid Search → Format Context → LLM → Answer
~1k tokens | 75% accuracy | Best for: Simple Q&A
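The single-pass pipeline can be sketched as one function; `search` and `llm` stand in for the real hybrid-search and provider clients (the names are illustrative, not InfoLens's actual API):

```python
def simple_rag(question, collection, search, llm):
    """Mode 1 sketch: one retrieval pass, one LLM call.
    search(collection, query) -> list[str]; llm(prompt) -> str."""
    chunks = search(collection, question)    # single hybrid-search pass
    context = "\n\n".join(chunks)            # format retrieved context
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)                       # one call, predictable token usage
```
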
Mode 2: MCP Tools
8-15 seconds
When you don't know where the information is

LLM has access to tools (list collections, search). It autonomously decides which collections to search, can search multiple sources, and synthesizes a comprehensive answer.

LLM → list_collections() → search_in_collection(A) → search_in_collection(B) → Synthesize
~3-5k tokens | 82% accuracy | Best for: Exploration
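The autonomy in Mode 2 is a tool-calling loop. A stripped-down sketch (the tool names match the MCP tools above; the step function standing in for the LLM is hypothetical):

```python
def run_tool_loop(llm_step, tools, question, max_steps=5):
    """Mode 2 sketch: llm_step(question, history) returns either
    ("call", tool_name, args) or ("answer", text). Tool results
    accumulate in history until the model can answer."""
    history = []
    for _ in range(max_steps):
        action = llm_step(question, history)
        if action[0] == "answer":
            return action[1]
        _, name, args = action
        history.append((name, tools[name](*args)))  # execute the tool call
    return "Step budget exhausted."
```
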
Mode 3: Agentic RAG
12-25 seconds
For complex questions that need iterative refinement

A LangGraph workflow: the agent searches, grades the results for relevance, and if they are poor, rewrites the query and searches again. It iterates up to 3 times until the quality threshold is met.

Search → Grade(relevant?) → NO: Rewrite → Search again (max 3x) → YES: Generate
~5-8k tokens | 90% accuracy | Best for: Complex reasoning
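Stripped of LangGraph's graph machinery, the search → grade → rewrite loop reduces to a few lines; `grade`, `rewrite`, and `generate` stand in for the real LLM-backed nodes:

```python
def agentic_rag(question, search, grade, rewrite, generate, max_iters=3):
    """Mode 3 sketch: retry retrieval with a rewritten query until the
    grader accepts the results or the iteration budget runs out."""
    query, docs = question, []
    for _ in range(max_iters):
        docs = search(query)
        if grade(docs, question):   # relevant enough? stop iterating
            break
        query = rewrite(query)      # refine the query and search again
    return generate(question, docs)
```
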

Real-World Example

Question: "How does our authentication work across projects?"

Simple RAG: Requires you to select "Backend Docs" collection. Fast but limited to one source.

MCP Tools: Automatically finds "Project A", "Project B", "Project C" collections, searches all three, compares results.

Agentic RAG: Same as MCP but if results are vague, rewrites to "JWT authentication implementation comparison" and searches again for better results.

Core Technology #3

Multi-LLM Architecture

Use any LLM provider. Switch instantly. No vendor lock-in.

Supported Providers

OpenAI
GPT-4, GPT-3.5-turbo
Anthropic
Claude 3 (Opus, Sonnet, Haiku)
Azure OpenAI
Enterprise deployments
Ollama / Custom
Self-hosted, Vast.ai, vLLM

How It Works

Provider configurations stored in PostgreSQL. Admins can add, test, and activate providers through the UI.

Each provider specifies its chat model and embedding model. The system automatically uses the active provider.

llm_providers table → get_active_provider() → chat_model + embeddings
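An in-memory sketch of that lookup (the field names mirror the flow above but are assumptions about the actual schema):

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    chat_model: str
    embedding_model: str
    is_active: bool = False

def get_active_provider(providers):
    """Return the single active row, as the llm_providers lookup would."""
    active = [p for p in providers if p.is_active]
    if len(active) != 1:
        raise RuntimeError("exactly one provider must be active")
    return active[0]
```

Switching providers is then a one-row flag flip in the database, not a code change.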

Why This Matters

  • Cost optimization: Use GPT-3.5 for simple queries, GPT-4 for complex reasoning
  • Privacy: Use self-hosted Ollama for sensitive data, cloud APIs for general knowledge
  • No lock-in: Switch providers in seconds without code changes or data migration

Core Technology #4

Model Context Protocol (MCP)

Your knowledge base becomes a universal tool for any AI

What is MCP?

Anthropic's open protocol for connecting AI models to external tools and data. InfoLens exposes 10 tools via Server-Sent Events (SSE).

Claude Desktop, Cursor IDE, or any MCP-compatible client can search your knowledge base, create collections, and manage documents.

10 Exposed Tools

search_documents
list_collections
get_collection
create_collection
add_documents
delete_document
list_documents
delete_collection
multi_query
health_status

Use Case

Scenario: Using Claude Desktop

You: "Search my company docs for authentication best practices"

Claude: *calls list_collections()* → sees "Backend Docs", "Security Policies"

Claude: *calls search_documents("backend-docs", "authentication")* → gets 5 results

Claude: "Based on your backend documentation, you use JWT with RS256 signing..."

Your knowledge base is now accessible to any MCP-compatible AI tool.

Built With Modern, Proven Technologies

Open-source stack, production-ready

Frontend
Next.js 15
React 19
TypeScript
Tailwind CSS
Backend
FastAPI
LangChain 0.3+
LangGraph 0.2+
Python 3.11+
Database
PostgreSQL 16
pgvector
HNSW indexes
Full-text search
Deployment
Docker Compose
Auto migrations
Health checks
SSL ready

Security & Data Sovereignty

Your data stays on your infrastructure. Period.

Self-Hosted

Deploy on your own servers. On-premise, private cloud, or VPS. You control the infrastructure.

Standard PostgreSQL

No proprietary vector database. Standard PostgreSQL with pgvector extension. Export anytime.

Authentication

JWT tokens, bcrypt password hashing, role-based access control (user/admin).
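The shape of that flow, sketched with the standard library only (this HMAC toy is illustrative; the real stack uses proper JWTs and bcrypt password hashing):

```python
import base64
import hashlib
import hmac
import json

SECRET = b"change-me"  # illustrative; load from config in practice

def issue_token(username, role):
    """Sign a claims payload; analogous to issuing a JWT."""
    payload = base64.urlsafe_b64encode(
        json.dumps({"sub": username, "role": role}).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig

def verify_token(token):
    """Check the signature, then return the claims."""
    payload, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(payload))

def require_admin(claims):
    """Role-based access control: user vs admin."""
    if claims["role"] != "admin":
        raise PermissionError("admin only")
```
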

Performance Characteristics

Search Latency
  • BM25 search: 50-100ms
  • Vector search: 150-250ms
  • Reranking: 100-200ms
  • Total (hybrid): 300-500ms

Mode Comparison (latency, accuracy)

  • Simple RAG: 3-6s (75%)
  • MCP Tools: 8-15s (82%)
  • Agentic RAG: 12-25s (90%)

Document Processing

  • Chunk size: 1000 characters
  • Chunk overlap: 200 characters
  • Embedding dimensions: 768 or 1536
  • Text splitter: RecursiveCharacterTextSplitter
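The size/overlap arithmetic behind those numbers (the real RecursiveCharacterTextSplitter also prefers paragraph and sentence boundaries; this shows only the sliding window):

```python
def chunk(text, size=1000, overlap=200):
    """Fixed-window chunking with the parameters above: each chunk is
    up to `size` characters and shares `overlap` characters with the
    previous chunk, so context at chunk edges is never lost."""
    step = size - overlap  # advance 800 characters per chunk
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```
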

Ready to Deploy?

Self-host InfoLens on your infrastructure with Docker Compose