Open Source · Self-Hosted · Private by Design

Turn Your File Storage into a Private AI Knowledge Base

ZettaBrain brings AI-powered document search to your own infrastructure — no cloud, no API keys, no data leaving your server. Start with a single user or deploy for your whole organisation.

No API keys required

No data leaves your server

Works with existing NFS / SMB storage

Runs on CPU or GPU

Two Products

Private AI for Individuals and Organisations

ZettaBrain RAG for individuals and small teams. ZettaBrain Teams for organisations that need per-team isolation, Active Directory SSO, and audit logging.

Open Source · Self-Hosted

ZettaBrain RAG

Ask questions in plain English, get sourced answers drawn from your own files — running entirely on your own server. Every response shows the exact document chunks it was drawn from so you can verify the source.

Ingest PDF, DOCX, TXT, and Markdown files
Five-stage hybrid retrieval (keyword + semantic + re-ranking)
Secure HTTPS web interface and interactive CLI
Works with local disk, NFS, SMB, and S3-compatible storage
Auto-detects NVIDIA, AMD, and Apple Silicon GPUs
Powered by Ollama — llama3, mistral, qwen run locally
No API keys, no cloud accounts, no egress costs
One-line install — up and running the same day

curl -fsSL https://zettabrain.app/install.sh | sudo bash

Read the Docs →

Commercial · Self-Hosted · Multi-Tenant

ZettaBrain Teams

Multi-tenant RAG server for organisations. Each team gets its own isolated document library with strict data boundaries — combined with Model Governance, Hybrid Model Support, Active Directory SSO, cryptographically verifiable answers, and a full admin interface.

Per-team document isolation (vector store + BM25)
Model Governance — manager-request, admin-approve workflow for model stacks
Hybrid Models — mix local Ollama with OpenAI & Claude per team
Active Directory / LDAP single sign-on
Hybrid retrieval — semantic + keyword + FlashRank re-ranking
ZettaBrain Verified — Ed25519 cryptographic answer provenance
Tamper-evident audit log with one-click signature verification
Offline Ed25519 licensing — Starter, Business, Enterprise plans
90-day free trial — no credit card required

curl -fsSL https://zettabrain.app/install-teams.sh | sudo bash

Read the Docs →

Retrieval Architecture

How ZettaBrain Answers Your Questions

Every query passes through a five-stage pipeline that combines keyword and semantic search before sending only the most relevant context to the local LLM.

01 📂 Document Ingestion: PDF, DOCX, TXT, MD from local disk, NFS, SMB, or S3
02 ✂️ Adaptive Chunking: Chunk size tuned per document type and text density
03 🔢 Local Embedding: nomic-embed-text runs via Ollama, with no cloud calls
04 🔍 Hybrid Search: BM25 keyword + MMR semantic, merged and deduplicated
05 🏅 Re-Ranking + LLM: FlashRank cross-encoder picks best chunks before the LLM responds
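
A minimal sketch of stages 04 and 05, assuming candidate chunks are plain strings. The toy word-overlap scorer stands in for the FlashRank cross-encoder, and the names `merge_and_rerank` and `overlap_score` are illustrative, not part of ZettaBrain:

```python
from typing import Callable, Iterable


def merge_and_rerank(
    keyword_hits: Iterable[str],
    semantic_hits: Iterable[str],
    query: str,
    score: Callable[[str, str], float],
    top_k: int = 3,
) -> list[str]:
    """Merge two candidate lists, drop duplicates, keep the top_k by score."""
    seen: set[str] = set()
    merged: list[str] = []
    for chunk in list(keyword_hits) + list(semantic_hits):
        if chunk not in seen:  # deduplicate, keeping first-seen order
            seen.add(chunk)
            merged.append(chunk)
    # Stand-in for the cross-encoder: score every candidate against the query.
    ranked = sorted(merged, key=lambda c: score(query, c), reverse=True)
    return ranked[:top_k]


def overlap_score(query: str, chunk: str) -> float:
    """Toy scorer (assumption): fraction of query words present in the chunk."""
    q = set(query.lower().split())
    return len(q & set(chunk.lower().split())) / len(q) if q else 0.0
```

Only the chunks returned by the final step would be passed to the LLM as context; a real re-ranker would replace `overlap_score` with a model-based relevance score.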

BM25 KEYWORD SEARCH

Exact Term Matching

Catches precise phrases, codes, and names that pure vector search can miss — especially useful in legal and compliance documents.
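
To illustrate why exact terms score well, here is a small, self-contained sketch of Okapi BM25 scoring. It assumes whitespace tokenization and the commonly used defaults `k1=1.5` and `b=0.75`; ZettaBrain's own tokenizer and parameters are not documented here and may differ:

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one BM25 score per document for the given query."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0

    # Document frequency: in how many documents each term appears.
    df: Counter = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        total = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization dampens long documents.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            total += idf * tf[term] * (k1 + 1) / norm
        scores.append(total)
    return scores
```

A query such as "clause 14.2" scores only the documents that literally contain those tokens, which is the behaviour that complements embedding-based search.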

MMR SEMANTIC SEARCH

Diversity + Relevance

Maximum Marginal Relevance via ChromaDB retrieves chunks that are topically relevant without being repetitive.
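
The selection rule behind MMR can be sketched in a few lines. This version assumes embeddings are plain lists of floats and uses a hypothetical `lambda_mult` weight of 0.5 between relevance and diversity; it illustrates the technique and is not ChromaDB's or ZettaBrain's implementation:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(query_vec: list[float], doc_vecs: list[list[float]],
               k: int = 3, lambda_mult: float = 0.5) -> list[int]:
    """Greedily pick k document indices, trading relevance against redundancy."""
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def mmr(i: int) -> float:
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest chunk already chosen.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lambda_mult=1.0` the function degenerates to plain similarity ranking; lower values penalize near-duplicate chunks more heavily.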

CROSS-ENCODER RE-RANKING

FlashRank Scoring

ms-marco-MiniLM-L-12-v2 scores every candidate chunk against the actual query. Only the best context reaches the model.

TRANSPARENT SOURCING

See Where Answers Come From

Every answer shows the document chunks it was drawn from. Type sources in the CLI at any time to inspect them.

Who It's For

Built for Teams That Can't Send Files to the Cloud

ZettaBrain is useful for any organization that manages sensitive documents and needs a way to search and query them — without routing data through a third-party API.

⚖️

Legal & Compliance

Law firms and in-house legal teams work with confidential contracts, case files, and regulatory documents. ZettaBrain lets staff ask plain-language questions across all of it — without sending a single file to a cloud AI service.

Contract review · Case research · Regulatory Q&A

🏥

Healthcare

Clinical teams and administrators manage sensitive patient documentation, clinical guidelines, and compliance records that cannot be processed by external AI services. ZettaBrain runs entirely inside your own infrastructure.

Clinical protocols · Policy manuals · Staff knowledge base

🏦

Financial Services

Banks, asset managers, and insurers need fast access to regulatory filings, internal policies, and client documentation — and they need that access to stay inside their security perimeter.

Regulatory filings · Risk documentation · Audit trails

🏛️

Government & Defense

Government agencies and defense organizations often operate in air-gapped or restricted environments where cloud AI tools are simply not an option. ZettaBrain can run with no outbound internet connectivity at all.

Air-gapped deploy · Internal policy search · Offline operation

🔬

Research & Education

Research institutions accumulate large bodies of papers, lab reports, and internal documentation. ZettaBrain helps researchers find answers across their own corpus without depending on generic public AI tools.

Literature search · Lab documentation · Grant documents

🏢

Enterprise IT & Operations

IT and operations teams maintain large volumes of runbooks, architecture docs, and vendor manuals on shared drives. ZettaBrain turns that NFS or SMB share into a knowledge base your team can query in plain English.

Runbooks · Architecture docs · Incident response

What Makes It Different

Private by Design, Practical by Default

Most document AI tools require cloud APIs and managed services. ZettaBrain is built around the opposite assumption.

🔒

No Cloud Dependency

The embedding model, the LLM, and the vector store all run on your machine. No OpenAI keys, no Anthropic keys, no egress charges — and no risk of your documents appearing in someone else's training data.

🗄️

Works With Storage You Already Have

ZettaBrain connects directly to your NFS mounts or SMB shares. Your files stay exactly where they are — there's no migration, no upload step, no new storage platform to manage.

📋

Answers Come With Sources

Every response includes the specific document chunks it was based on. Users can check the original file — important for legal, compliance, and any context where accuracy needs to be verifiable.

⚙️

Straightforward to Install

One-line installer handles OS detection, Python, Ollama, and model download on Linux, macOS, and Windows. Most teams are up and running the same day.

💻

Flexible Hardware Requirements

Runs on CPU-only hardware (a smaller model is recommended), and accelerates automatically when NVIDIA, AMD, or Apple Silicon GPUs are detected. The wizard recommends the right model for what you have.

Platform Compatibility

Runs Where Your Infrastructure Lives

One installer — detects your OS and package manager automatically. No manual steps for each platform.

Ubuntu

Ubuntu 22.04+ — full GPU support, systemd service, apt-based install

Red Hat Linux

RHEL 8 / 9 / 10 — DNF / YUM package manager, systemd service

macOS

Apple Silicon (M1/M2/M3) and Intel — Metal GPU acceleration

Windows

Windows 10/11 and Server 2016+ — PowerShell + winget

🖥️

GPU Acceleration

NVIDIA CUDA, AMD ROCm, Apple Silicon Metal — auto-detected

Storage: Local disk · NFS · SMB/Samba · AWS S3 · MinIO · Ceph

Our Team

The People Behind ZettaBrain

Built by engineers with deep roots in enterprise storage and AI infrastructure at Amazon Web Services, NetApp, and Hewlett-Packard.

Olajide Shobowale

Founder & CEO

A globally recognized storage systems engineer whose career spans Amazon Web Services, NetApp, and Hewlett-Packard. Olajide has built breakthrough storage technologies serving millions of users worldwide — from intelligent data tagging solutions to hybrid storage optimizations that delivered significant cost reductions and efficiency gains for enterprise customers.

Author of "Intelligent Data Management And Security" and a recognized thought leader in major cloud storage technical communities. Multiple industry awards for technical excellence and leadership. Advanced degrees in Computing and Forensics Information Technology.

Enterprise Storage · Cloud Infrastructure · AWS · NetApp · Fortune 500

Tife Obideyi

Co-Founder & CTO

Bridges the gap between complex storage infrastructure and enterprise usability, translating ZettaBrain's sophisticated hybrid storage technology into intuitive interfaces that empower teams to maximize their AI and ML workloads. With extensive experience building scalable frontend platforms across high-traffic enterprise environments, Tife specializes in creating performance-driven dashboards, monitoring consoles, and developer tools that make complex backend systems accessible and actionable.

At ZettaBrain, Tife architects the user-facing layer of our storage solutions — designing real-time performance monitoring dashboards, resource allocation interfaces, and management consoles that help enterprises visualize storage efficiency and track AI workload optimization. AWS Solutions Architecture and Cybersecurity certified, with hands-on experience in secure system design and PCI DSS compliance, bringing a security-conscious approach to building interfaces for enterprise-grade infrastructure.

React / TypeScript · Frontend Architecture · GraphQL / API Integration · AWS Solutions Architect · Cybersecurity

James Wong

Founding Developer

Full-stack software engineer with 10+ years of experience building scalable, secure, and innovative solutions across fintech, gaming, edtech, and enterprise. At ZettaBrain, James leverages his expertise in cloud architecture and AI/ML integration to build robust, high-performance systems that power our next-generation storage platform.

AWS and GCP Certified Professional Cloud Architect with deep expertise in modern web technologies, LLM integration, and blockchain systems. Proven track record delivering 20+ projects across diverse tech stacks, specializing in React, Next.js, Node.js, NestJS, and cloud-native architectures.

React / Next.js / TypeScript · Node.js / NestJS · AWS & GCP Architect · AI/ML Integration · Blockchain

Bolarinwa Shobowale

Co-Founder & Head of Business Development

Brings over a decade of commercial experience across enterprise technology sales, brand marketing, and procurement spanning the UK and Nigeria. Currently a Product Advisor at Microsoft UK, where he leads needs-led demonstrations of Microsoft devices and cloud services and guides customers through digital adoption. Earlier roles at Samsung Electronics UK and Russell & Bromley sharpened his ability to translate complex products into clear, customer-led value — directly relevant to taking ZettaBrain to enterprise teams.

At Brandlife Nigeria he planned and executed go-to-market campaigns for HP, Intel, and Microsoft across in-store and digital channels. As Procurement Manager at DesignHQ he ran end-to-end vendor relationships and negotiated cost savings of up to 10% on supplier contracts. Holds an MSc in International Business from the University of Chester and a BSc in Business Administration from Redeemer's University.

Enterprise Sales · Go-to-Market · Microsoft · Brand Marketing · MSc International Business

Backed By

Supported by Industry Leaders

ZettaBrain is recognized by leading technology startup programs, giving us access to world-class infrastructure, expertise, and go-to-market support.

NVIDIA Inception

Member of NVIDIA's Inception Program for AI startups, providing deep learning resources and ecosystem access.

AWS Activate

AWS Activate member — cloud credits, technical support, training, and go-to-market collaboration from Amazon Web Services.

Google for Startups

Google Cloud for Startups member with cloud infrastructure credits and access to Google's startup support network.

GitHub Startups

GitHub Startups Program member with access to GitHub's developer tooling, collaboration infrastructure, and community.

Datadog Startups

Datadog Startup Program member with monitoring, observability, and infrastructure analytics support for our platform.

Ready to Get Started?

Install ZettaBrain on your own server and start querying your documents today, or reach out and we'll walk you through it.

ZettaBrain RAG

Private Document AI

ZettaBrain lets you have natural language conversations with your own documents — running entirely on your own infrastructure, with no API keys and no data sent to any cloud service.

What It Does

It connects to your existing file storage (local disk, NFS, SMB, or S3-compatible), indexes your documents locally, and lets you query them in plain English through a web interface or CLI. Every answer includes the source chunks it came from.

Supported Platforms

Platform	Installer	Notes
Ubuntu	`install.sh`	systemd service, full GPU support
Red Hat Linux	`install.sh`	DNF / YUM, RHEL 8 / 9 / 10
macOS (Apple Silicon / Intel)	`install.sh`	Homebrew, Metal GPU
Windows 10/11 / Server 2016+	`install.ps1` / `install.cmd`	PowerShell, winget

Quick Links

Quick Install (per-OS commands)
Sample test data — try ZettaBrain without your own documents
Retrieval pipeline
Storage sources
CLI commands
GitHub repository

Quick Install

Select your operating system — the installer handles Python, Ollama, and model download automatically.

Ubuntu / Red Hat Linux

curl -fsSL https://zettabrain.app/install.sh | sudo bash

macOS (do not use sudo; Homebrew refuses to run as root)

curl -fsSL https://zettabrain.app/install.sh | bash

Windows

Option 1 — download install.cmd and run as Administrator

# Download: https://zettabrain.app/install.cmd
# Right-click → Run as administrator

Option 2 — PowerShell (run as Administrator)

[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
irm https://zettabrain.app/install.ps1 | iex

What the installer does

Detects your OS and package manager (apt / dnf / yum / brew / winget)
Installs Python 3.9+ and system dependencies including zstd
Installs zettabrain-rag via pipx (isolated environment)
Installs and starts Ollama as a background service
Downloads the nomic-embed-text embedding model (~275 MB)
On NVIDIA hardware: installs CUDA runtime and drivers

Install via pip / pipx

pip install zettabrain-rag
# or isolated:
pipx install zettabrain-rag
zettabrain --version

After install

sudo zettabrain-setup       # configure storage + model
zettabrain-server           # launch web UI at :7860
zettabrain-chat             # or use CLI chat

First-Time Setup

After installation, the setup wizard configures storage, selects a model based on your hardware, and enables HTTPS.

1. Run the Wizard

sudo zettabrain-setup

2. Launch the Web Interface

zettabrain-server

Open https://local.zettabrain.app:7860 — trusted HTTPS, fully private.

3. Or Use the CLI

zettabrain-chat

Type any question to query your documents. Type sources to see which chunks were used. Type quit to exit.

Document Formats

Extension	Format	Notes
`.pdf`	PDF	Text-layer PDFs. Scanned PDFs need OCR pre-processing.
`.txt`	Plain Text	UTF-8 encoding. Default chunk size 800.
`.md`	Markdown	Headers preserved as chunk boundaries.
`.docx`	Word Document	Paragraph structure preserved. Tables extracted as text.
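
The Markdown row says headers are preserved as chunk boundaries. A minimal sketch of that idea, assuming ATX-style headers (`#` through `######`) and ignoring edge cases such as `#` lines inside fenced code; `split_markdown_on_headers` is a hypothetical helper, not a ZettaBrain function:

```python
import re

# An ATX header: one to six '#' characters, whitespace, then some text.
HEADER = re.compile(r"^#{1,6}\s+\S")


def split_markdown_on_headers(text: str) -> list[str]:
    """Split Markdown so that every header line starts a new chunk."""
    chunks: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if HEADER.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]  # drop empty chunks
```

Each resulting chunk keeps its header as the first line, so the section title travels with the text it describes.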

CLI Commands

Command	Description
`sudo zettabrain-setup`	Storage wizard, model selection, TLS cert
`zettabrain-server`	Launch web GUI at port 7860
`zettabrain-chat`	Interactive CLI chat
`zettabrain-chat --rebuild`	Rebuild vector store then start
`zettabrain-chat --debug`	Show retrieved chunks with each response
`zettabrain-ingest`	Ingest documents
`zettabrain-ingest --folder /path`	Ingest a specific folder
`zettabrain-ingest --file /path/doc.pdf`	Ingest a single file
`zettabrain-ingest --stats`	Show vector store contents
`zettabrain-ingest --clear`	Wipe the vector store
`zettabrain-status`	Version, paths, certs, store stats
`sudo zettabrain-storage add`	Add a storage source
`zettabrain-storage list`	List configured sources

Type	Action
Any question	Query your documents
`sources`	Show which chunks were used
`timing`	Show retrieve/generate time per query
`debug on / off`	Toggle chunk display
`quit`	Exit

GPU & Model Selection

Ollama auto-detects your GPU. No configuration is needed beyond having the correct drivers installed.

Supported Hardware

NVIDIA — CUDA 12+
AMD — ROCm 5.7+
Apple Silicon — Metal (M1/M2/M3, built-in)
CPU-only — Works on any x86; smaller models recommended

Model Selection Wizard

Hardware detected: NVIDIA RTX 3080 (10 GB VRAM)
Recommended: llama3.1:8b

  1) llama3.2:3b    ~2 GB   fastest, good for quick Q&A
  2) llama3.1:8b    ~5 GB   balanced        ← default
  3) mistral:7b     ~4 GB   strong reasoning
  4) llama3.1:13b   ~8 GB   better, needs 12 GB+
  5) qwen2.5:14b    ~9 GB   excellent, needs 16 GB+
  6) Custom

Approximate Performance

Hardware	Model	Tokens/sec	~Response time
4-core CPU, 8 GB RAM	llama3.2:3b	8–15	20–40 s
8-core CPU, 16 GB RAM	llama3.1:8b	5–12	25–60 s
NVIDIA RTX 3060 (8 GB)	llama3.1:8b	60–90	3–5 s
NVIDIA RTX 3080 (10 GB)	llama3.1:8b	80–120	2–4 s
Apple M2 (16 GB)	llama3.1:8b	30–50	6–10 s

Variable	Default	Description
`ZETTABRAIN_DOCS`	`/opt/zettabrain/data`	Documents folder
`ZETTABRAIN_CHROMA`	`/opt/zettabrain/src/zettabrain_vectorstore`	ChromaDB path
`ZETTABRAIN_LLM_MODEL`	`llama3.1:8b`	LLM model name
`ZETTABRAIN_EMBED_MODEL`	`nomic-embed-text`	Embedding model
`ZETTABRAIN_CHUNK_SIZE`	`1000 / 800`	Chunk size (adaptive)
`ZETTABRAIN_CHUNK_OVERLAP`	`150 / 100`	Chunk overlap (adaptive)
`OLLAMA_HOST`	`http://localhost:11434`	Ollama API endpoint
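
A short sketch of how a script might read these variables and apply a chunk size and overlap. It assumes character-based chunking and picks the first value of each adaptive pair (1000 and 150) as the fallback; both choices are assumptions, and `load_settings` and `chunk_text` are illustrative names:

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read ZettaBrain-style variables, falling back to the documented defaults."""
    return {
        "docs": env.get("ZETTABRAIN_DOCS", "/opt/zettabrain/data"),
        "llm_model": env.get("ZETTABRAIN_LLM_MODEL", "llama3.1:8b"),
        "embed_model": env.get("ZETTABRAIN_EMBED_MODEL", "nomic-embed-text"),
        # Assumption: use the first value of the adaptive pairs as the fallback.
        "chunk_size": int(env.get("ZETTABRAIN_CHUNK_SIZE", "1000")),
        "chunk_overlap": int(env.get("ZETTABRAIN_CHUNK_OVERLAP", "150")),
        "ollama_host": env.get("OLLAMA_HOST", "http://localhost:11434"),
    }


def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` characters sharing `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # the window already reached the end
            break
        start += size - overlap
    return chunks
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of some duplicated text in the index.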

	Minimum	Recommended
RAM	8 GB	16 GB
CPU	4 cores / 2.5 GHz	8 cores / 3.0 GHz
Disk	20 GB free	50 GB free
OS	Ubuntu 22.04 / Red Hat 8 / macOS 13 / Windows 10	Ubuntu 22.04 LTS
Python	3.9	3.11+

Storage Sources

Local disk — Default. Any local path.
NFS — Network File System mounts.
SMB — Windows / Samba shares.
Object Storage — S3-compatible: AWS S3, MinIO, Ceph.

sudo zettabrain-storage add   # add a new source
zettabrain-storage list       # list configured sources

Diagnostics

zettabrain-status
python3 /opt/zettabrain/src/01_chromadb_setup.py
python3 /opt/zettabrain/src/02_embeddings_test.py
curl http://localhost:11434
ollama list
journalctl -u zettabrain -f

Uninstall

Linux / macOS

sudo systemctl disable --now zettabrain 2>/dev/null || true   # stop the service first (no-op where systemd is absent)
pipx uninstall zettabrain-rag
sudo rm -rf /opt/zettabrain

Windows

pipx uninstall zettabrain-rag
Remove-Item -Recurse -Force "$env:LOCALAPPDATA\ZettaBrain"

Try It Out

Sample Test Data

If you don't have your own documents handy, you can use our pre-built test corpora to evaluate ZettaBrain RAG against realistic enterprise content. Each bundle contains ten .docx policy documents from a fictional organization, paired with twenty test prompts you can paste straight into the chat.

Download the Document Sets

Industry	Organization	Contents	Download
Financial Services	Apex Financial Group (fictional)	10 policy docs: trading, AML, KYC, benefits, risk, expense, insider trading, IT security, onboarding	financial.zip (~89 KB)
Healthcare	Riverside Medical Center (fictional)	10 policy docs: HIPAA, medication, emergency response, infection control, credentialing, telemedicine	healthcare.zip (~90 KB)

Test Prompts Guide

Each bundle comes with twenty industry-specific prompts plus cross-document and adversarial prompts (questions ZettaBrain should refuse to answer because the information isn't in your documents).

Download the full Test Prompts Guide →

How to Use

1. Download and unzip

Ubuntu / Debian: install unzip first if needed — sudo apt install -y unzip

mkdir -p ~/zettabrain-samples && cd ~/zettabrain-samples
curl -fsSLO https://www.zettabrain.io/sample-data/zettabrain-financial-test-docs.zip
curl -fsSLO https://www.zettabrain.io/sample-data/zettabrain-healthcare-test-docs.zip
unzip zettabrain-financial-test-docs.zip  -d financial
unzip zettabrain-healthcare-test-docs.zip -d healthcare

Windows (PowerShell):

mkdir $HOME\zettabrain-samples; cd $HOME\zettabrain-samples
irm https://www.zettabrain.io/sample-data/zettabrain-financial-test-docs.zip  -OutFile financial.zip
irm https://www.zettabrain.io/sample-data/zettabrain-healthcare-test-docs.zip -OutFile healthcare.zip
Expand-Archive financial.zip  -DestinationPath financial
Expand-Archive healthcare.zip -DestinationPath healthcare

2. Ingest one industry into ZettaBrain

zettabrain-ingest --folder ~/zettabrain-samples/financial --rebuild

3. Ask the test prompts

zettabrain-chat
> What is the pre-clearance process for personal securities trades?
> sources       # see the exact chunks the answer was drawn from

Or open the web UI at https://local.zettabrain.app:7860 and paste prompts straight in.

4. Switch industries

To test the healthcare set, re-ingest with --rebuild pointed at the other folder. Each --rebuild swaps the active corpus.

zettabrain-ingest --folder ~/zettabrain-samples/healthcare --rebuild

What to Look For

Accuracy — every answer should be factually grounded in the source .docx chunks shown by sources.
Grounding — the cited document should match the topic of the question.
Refusal — adversarial prompts in the guide (e.g. "What is the current stock price of Apex Financial Group?") should produce a clear "not in your documents" response, not a guess.

Use Your Own Documents

When you're ready, drop your own .pdf, .docx, .txt, or .md files into any folder and ingest them the same way — see Document Formats and Storage Sources.

ZettaBrain Teams

Multi-Tenant Private Document AI

ZettaBrain Teams is a self-hosted, multi-tenant RAG server. Each team in your organisation gets its own isolated document library, users authenticate via Active Directory, and every answer is cryptographically signed. Everything runs on your own infrastructure; with local models only, there is no cloud dependency, no API keys, and no data egress.

What It Does

ZettaBrain Teams ingests documents per team and lets each team query their own documents through a shared web interface. The admin panel manages users, teams, licensing, and AD integration. Every answer is signed with an Ed25519 key and logged to a tamper-evident audit trail that can be verified at any time.

Key Features

Feature	Detail
Per-team isolation	Separate ChromaDB collection and BM25 index per team — no cross-boundary leakage
Model Governance	Manager-request, admin-approve workflow for model stack assignments — central control over keys, cost, and compliance with delegated autonomy per team
Hybrid Model Support	Mix local Ollama models with frontier APIs (OpenAI, Claude) per team — keep embeddings local while routing generation to the cloud, or run fully offline
Active Directory SSO	LDAP bind-search-rebind with optional group membership enforcement
Hybrid retrieval	MMR vector search + per-team BM25 + FlashRank cross-encoder re-ranking
ZettaBrain Verified	Ed25519 cryptographic signature over every answer — query hash, chunk hashes, answer hash stored in the audit log
Tamper-evident audit log	Every query logged with provenance signature — one-click verification in the admin panel, downloadable verification report
Out-of-scope detection	Answers outside a team's document library are flagged with zero confidence and no sources
Offline Ed25519 licensing	Starter / Business / Enterprise plans — license file works air-gapped, no license server required
90-day free trial	Full functionality from day one — no credit card, no feature limits during trial
Admin panel	Dashboard, users, teams, audit log, license management, system config, AD settings
REST API	FastAPI — all admin and chat operations available as JSON endpoints

Quick Install

One-line installer for ZettaBrain Teams. Downloads and configures everything automatically — Python, pipx, zettabrain-teams, Ollama, and embedding models.

Ubuntu / Red Hat Linux: one command installs everything (runs as root):

curl -fsSL https://zettabrain.app/install-teams.sh | sudo bash

After installation, start the server:

zettabrain-teams             # starts at http://0.0.0.0:7861

macOS: one command installs everything (no sudo):

curl -fsSL https://zettabrain.app/install-teams.sh | bash

After installation:

zettabrain-teams

Requires Python 3.9+ from python.org. Manual install via pipx:

pipx install zettabrain-teams
zettabrain-teams             # starts at http://0.0.0.0:7861

What the installer does

Detects your OS and installs system dependencies (Python 3.9+, pipx)
Installs zettabrain-teams via pipx with automatic PATH configuration
Installs and starts Ollama as a background service
Downloads the nomic-embed-text embedding model (~275 MB)
Runs the setup wizard for storage paths and LLM model selection
Creates /opt/zettabrain-teams/ with config and data directories
All commands available immediately — no manual PATH setup needed

After install

zettabrain-teams              # start server at :7861
# Open http://your-server:7861 and log in with the default admin account (credentials are listed under First Login & Admin Setup)

First-Time Setup

After installing the package, run the setup wizard as root to configure Ollama and models, then complete admin setup in the web interface.

1. Run the Setup Wizard

sudo zettabrain-teams-setup
# Options:
sudo zettabrain-teams-setup --port 7861 --llm llama3.1:8b --embed nomic-embed-text
sudo zettabrain-teams-setup --no-systemd   # skip service registration

2. Start the Server

zettabrain-teams              # starts at :7861
zettabrain-teams --port 8080  # custom port
zettabrain-teams --reload     # dev mode

Open http://your-server:7861 in a browser.

3. First Login & Admin Setup

Log in with admin / P@ssword! — you will be prompted to change the password on first login
Go to Admin → Teams to create your first team and assign a document folder
Add users to the team, or configure Active Directory under Admin → Settings → LDAP
Trigger a document ingest from the team page to build the vector store

Role	Access
`admin`	Full admin panel: users, teams, settings, audit log
`user`	Chat interface for their own teams only

Component	Scope
Generation Model	Local (Ollama) or API (OpenAI, Claude)
Embedding Model	Local (Ollama) or API (OpenAI)
API Keys	Stored centrally — teams never see them
Cost Tracking	Per-team token usage and estimated spend (coming soon)

Hybrid Model Support

Local models keep your data private. Frontier models give the sharpest answers. Most RAG tools make you pick one. ZettaBrain Teams doesn't.

Mix Local and Cloud

Every team runs its own hybrid stack: mix Ollama running locally, OpenAI, and Claude however the workload demands.

Independent Model Selection

The part people miss? Your generation model and your embedding model are configured separately. So you can keep embeddings fully local, meaning your document corpus is never uploaded for indexing, while routing generation to Claude or GPT for answer quality. In that setup only the query and the retrieved context chunks are sent to the API. Or run one team entirely offline while another uses a frontier model.

Model Type	Local Options	Cloud Options
Generation (LLM)	llama3.1:8b, mistral:7b, qwen2.5:14b (via Ollama)	gpt-4, gpt-4-turbo, claude-sonnet-4.6, claude-opus-4.7
Embedding	nomic-embed-text, all-minilm (via Ollama)	text-embedding-3-small, text-embedding-3-large (OpenAI)

Privacy Modes

Mode	Configuration	Data Locality
Fully Local	Ollama generation + Ollama embedding	Nothing leaves your infrastructure
Hybrid (Privacy-First)	Cloud generation + Local embedding	Document corpus and index stay local; only the query and the retrieved context chunks are sent to the API
Hybrid (Performance-First)	Local generation + Cloud embedding	Faster embedding, lower latency; documents are embedded via API
Fully Cloud	Cloud generation + Cloud embedding	Maximum performance, full API dependency

Same Platform, Different Tradeoffs

You stop choosing between privacy and capability, and start tuning the balance. Different teams, different needs — all on the same infrastructure.

Example Configurations

Finance Team — Fully local (llama3.1:8b + nomic-embed-text) for compliance and air-gapped operation
Research Team — Hybrid privacy-first (claude-sonnet-4.6 + nomic-embed-text) for best answers while keeping source documents local
Product Team — Fully cloud (gpt-4 + text-embedding-3-large) for maximum speed and answer quality

Installation

pipx install zettabrain-teams

Configure hybrid models via Admin → Settings → Models. Add your OpenAI and Anthropic API keys, then teams can request access through the Model Governance workflow.

Field	Example
LDAP URL	`ldap://dc.acme.com` or `ldaps://dc.acme.com`
Bind DN	`CN=svc-zettabrain,CN=Users,DC=acme,DC=com`
Bind Password	Service account password
User Search Base	`DC=acme,DC=com`
User Search Filter	`(&(objectClass=user)(sAMAccountName={username}))`
Required AD Group DN	`CN=ZettaBrain-Users,OU=Groups,DC=acme,DC=com` (optional)
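
When a username is substituted into the User Search Filter, special characters should be escaped so that input such as `*)(uid=*` cannot change the filter's meaning. The sketch below applies the escaping rules from RFC 4515; whether ZettaBrain performs this step internally is not stated here, and both function names are hypothetical:

```python
# Characters that must be escaped inside an LDAP filter value (RFC 4515).
_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Escape a value so it is treated as literal text inside a filter."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def build_user_filter(template: str, username: str) -> str:
    """Fill the {username} placeholder of a search-filter template."""
    return template.replace("{username}", escape_filter_value(username))
```

For example, `build_user_filter("(&(objectClass=user)(sAMAccountName={username}))", "jdoe")` yields a filter that matches only the `jdoe` account.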

Field	Description
timestamp	ISO 8601 UTC
username	Resolved from user ID
team_name	Resolved from team ID
query	Full query text
response_preview	First 200 chars of answer
confidence	0.0–1.0 reranker score (0.0 for out-of-scope answers)
duration_ms	End-to-end latency in ms
chunks_used	Number of context chunks passed to LLM
model	Ollama model that generated the response
query_hash	SHA-256 hex of the query string
chunk_hashes	JSON array of SHA-256 hashes — one per context chunk
answer_hash	SHA-256 hex of the full answer text
provenance_sig	Ed25519 hex signature over the canonical answer bundle

ZettaBrain Verified

Every answer produced by ZettaBrain Teams is cryptographically signed with an Ed25519 key held by the server. This makes the entire answer bundle — query, context chunks, and response — tamper-evident and independently verifiable.

How It Works

When the LLM returns an answer, ZettaBrain computes a SHA-256 hash of the query, of each context chunk, and of the answer text
These hashes are assembled into a canonical JSON payload and signed with the server's Ed25519 private key
The signature and all hashes are stored alongside the audit log entry in the database
Any admin can click 🛡 Verify on any log row to re-derive the canonical payload and confirm the signature still matches — proving the stored answer has not been altered

What Is Signed

Field	Value
`query_hash`	SHA-256 of the user's question
`chunk_hashes`	Sorted array of SHA-256 hashes — one per context chunk retrieved
`answer_hash`	SHA-256 of the full LLM response
`model`	Ollama model identifier
`team_id`	Team scope of the query
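
A sketch of how the signed bundle could be assembled from the fields above, using only the standard library. The exact canonical encoding is an assumption (sorted keys, compact separators, UTF-8); the Ed25519 signing step itself is left out and would be applied to the returned bytes:

```python
import hashlib
import json


def sha256_hex(text: str) -> str:
    """SHA-256 of a UTF-8 string, as lowercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def canonical_payload(query: str, chunks: list[str], answer: str,
                      model: str, team_id: int) -> bytes:
    """Build a deterministic byte string covering query, chunks, and answer."""
    bundle = {
        "query_hash": sha256_hex(query),
        # Sorting makes the payload independent of retrieval order.
        "chunk_hashes": sorted(sha256_hex(c) for c in chunks),
        "answer_hash": sha256_hex(answer),
        "model": model,
        "team_id": team_id,
    }
    # Assumed canonical form: sorted keys and no insignificant whitespace.
    return json.dumps(bundle, sort_keys=True, separators=(",", ":")).encode("utf-8")
```

Because the payload is deterministic, a verifier can rebuild it from the stored record and check the signature; changing a single character of the answer produces a different `answer_hash` and therefore a payload the original signature no longer matches.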

Out-of-Band Verification

The server's Ed25519 public key never changes and is exposed at:

GET /api/admin/keys/public
# Returns: { "public_key_hex": "45262b...", "algorithm": "Ed25519" }

A compliance team or external auditor can retrieve this key once, then verify any downloaded report independently using standard Ed25519 libraries — no access to the ZettaBrain server required.

Out-of-Scope Detection

When a query falls outside a team's document library, ZettaBrain returns a fixed message ("This question is outside the scope of this team's document library.") with a confidence of 0.0 and an empty sources list. The provenance signature is still computed and stored over this clean response, so the absence of an answer is itself auditable.
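
The response shape described above can be sketched as follows. How ZettaBrain decides that a query is out of scope is not specified here, so the confidence `threshold` of 0.2 is purely hypothetical, as is the function name:

```python
OUT_OF_SCOPE_MESSAGE = (
    "This question is outside the scope of this team's document library."
)


def build_response(answer: str, sources: list[str], confidence: float,
                   threshold: float = 0.2) -> dict:
    """Return the answer, or the fixed out-of-scope response when support is weak."""
    # Assumption: no sources or a low re-ranker score means "out of scope".
    if not sources or confidence < threshold:
        return {"answer": OUT_OF_SCOPE_MESSAGE, "confidence": 0.0, "sources": []}
    return {"answer": answer, "confidence": confidence, "sources": list(sources)}
```

Returning a fixed message with zero confidence and no sources gives auditors an unambiguous record that nothing from the library was used.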

Licensing & Plans

ZettaBrain Teams ships with a built-in 90-day free trial. After the trial, a license file activates the server without any internet connection — suitable for air-gapped environments.

Trial Period

90 days of full functionality from the first server start — no credit card, no feature limits
A warning banner appears in the admin dashboard when fewer than 14 days remain
The server exits cleanly on expiry with a message pointing to sales@zettabrain.io
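
The trial rules above reduce to simple date arithmetic. A minimal sketch, assuming the first-start timestamp is stored somewhere the server can read; `trial_status` is an illustrative name, not a ZettaBrain API:

```python
from datetime import datetime, timedelta

TRIAL_DAYS = 90        # full functionality from the first server start
WARN_BELOW_DAYS = 14   # banner appears when fewer than this many days remain


def trial_status(first_start: datetime, now: datetime) -> dict:
    """Compute remaining trial days, the warning flag, and expiry."""
    expires = first_start + timedelta(days=TRIAL_DAYS)
    expired = now >= expires
    remaining = 0 if expired else (expires - now).days  # whole days left
    return {
        "days_remaining": remaining,
        "show_warning": (not expired) and remaining < WARN_BELOW_DAYS,
        "expired": expired,
    }
```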

License Plans

Plan	Max Users	Max Teams	Features
Starter	10	3	Provenance signing
Business	50	20	LDAP, provenance signing, audit export
Enterprise	Unlimited	Unlimited	LDAP, provenance signing, audit export, SSO

Contact sales@zettabrain.io to purchase. You receive a .lic file by email.

Activating a License

Two methods. Activating through the Admin UI takes effect while the server is running, with no restart needed; the file-on-disk method is picked up on the next restart:

Option A — Admin UI

Go to Admin → License
Paste the license key string or click Upload .lic file
Click Activate License — the status panel updates immediately

Option B — File on disk (air-gapped)

cp your_license.lic /opt/zettabrain-teams/data/license.lic
# Restart the server to pick it up
systemctl restart zettabrain-teams

License Status

The Admin → License page shows the current state, plan, customer name, expiry date, seats used vs. allowed, and enabled features. The dashboard also displays a days-remaining card during trial and when a license is expiring within 30 days.

Seat Enforcement

User and team creation are blocked at the plan limit with a clear error message. To add more seats, upload an upgraded license — no restart required.
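
A seat check of this kind can be sketched as a lookup against the plan table. The limits below come from the License Plans table; the dictionary layout and function name are hypothetical.

```python
# Limits taken from the License Plans table; None means unlimited.
PLAN_LIMITS = {
    "Starter": {"max_users": 10, "max_teams": 3},
    "Business": {"max_users": 50, "max_teams": 20},
    "Enterprise": {"max_users": None, "max_teams": None},
}


def can_create(plan: str, resource: str, current_count: int) -> bool:
    """Return True if one more user or team fits under the plan limit.

    `resource` is "users" or "teams".
    """
    limit = PLAN_LIMITS[plan][f"max_{resource}"]
    return limit is None or current_count < limit
```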

CLI Commands

Command	Description
`sudo zettabrain-teams-setup`	First-time wizard — Ollama, models, systemd
`sudo zettabrain-teams-setup --port 7861`	Custom port
`sudo zettabrain-teams-setup --llm llama3.1:8b`	Specify LLM model
`sudo zettabrain-teams-setup --no-systemd`	Skip systemd registration
`zettabrain-teams`	Start web server at :7861
`zettabrain-teams --port 8080`	Override port
`zettabrain-teams --host 127.0.0.1`	Bind to specific interface
`zettabrain-teams --reload`	Dev mode with auto-restart

Document Ingestion

Triggered from the admin web panel (Admin → Teams → Ingest), or via the REST API:

curl -X POST http://localhost:7861/api/teams/{team_id}/ingest \
  -H "Authorization: Bearer <token>"
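
The same call can be prepared from Python with only the standard library. This is a minimal sketch: the helper name is hypothetical, and the shape of the server's response is not documented here, so the example stops at building the request.

```python
import urllib.request


def build_ingest_request(base_url: str, team_id: str, token: str) -> urllib.request.Request:
    """Build the POST request that triggers ingestion for one team."""
    url = f"{base_url.rstrip('/')}/api/teams/{team_id}/ingest"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
```

To send it, pass the returned object to `urllib.request.urlopen`.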

Configuration

Variable	Default	Description
`ZBT_PORT`	`7861`	Server port
`ZBT_CHROMA_DIR`	`/opt/zettabrain-teams/chromadb`	Root for per-team ChromaDB collections
`ZBT_DB_PATH`	`/opt/zettabrain-teams/teams.db`	SQLite database (users, teams, settings)
`ZETTABRAIN_LLM_MODEL`	`llama3.1:8b`	Ollama LLM model
`ZETTABRAIN_EMBED_MODEL`	`nomic-embed-text`	Ollama embedding model
`OLLAMA_HOST`	`http://localhost:11434`	Ollama API endpoint
`ZBT_SECRET_KEY`	auto-generated	JWT signing secret — set explicitly in production
`ZBT_TLS_CERT`	—	Path to TLS certificate (enables HTTPS)
`ZBT_TLS_KEY`	—	Path to TLS private key
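
For readers scripting around the server, the sketch below shows one way to resolve these variables with their documented defaults. The `load_settings` helper and the `tls_enabled` key are hypothetical, and treating HTTPS as enabled only when both certificate and key are set is an assumption.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "ZBT_PORT": "7861",
    "ZBT_CHROMA_DIR": "/opt/zettabrain-teams/chromadb",
    "ZBT_DB_PATH": "/opt/zettabrain-teams/teams.db",
    "ZETTABRAIN_LLM_MODEL": "llama3.1:8b",
    "ZETTABRAIN_EMBED_MODEL": "nomic-embed-text",
    "OLLAMA_HOST": "http://localhost:11434",
}


def load_settings(env=None) -> dict:
    """Resolve settings from the environment, falling back to the defaults."""
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    settings["ZBT_PORT"] = int(settings["ZBT_PORT"])
    # Assumption: HTTPS needs both the certificate and the private key.
    settings["tls_enabled"] = bool(env.get("ZBT_TLS_CERT") and env.get("ZBT_TLS_KEY"))
    return settings
```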

System Requirements

Component	Minimum	Recommended
RAM	8 GB	16 GB+
CPU	4 cores / 2.5 GHz	8 cores / 3.0 GHz
Disk	20 GB free	50 GB free
OS	Ubuntu 22.04 / Red Hat 8 / macOS 13	Ubuntu 22.04 LTS
Python	3.9	3.11+

Diagnostics

systemctl status zettabrain-teams
journalctl -u zettabrain-teams -f
curl http://localhost:7861/api/health
curl http://localhost:11434
ollama list

Uninstall

Linux / macOS

sudo systemctl disable --now zettabrain-teams 2>/dev/null || true
pipx uninstall zettabrain-teams
sudo rm -rf /opt/zettabrain-teams

Windows

pip uninstall zettabrain-teams

Get in Touch

Request a
Briefing or Call

Tell us about your team's document workflows and what you're trying to solve. We're happy to walk you through ZettaBrain and discuss whether it fits your setup.

⚡

Quick Response

We aim to reply within one business day.

📧

Email Us Directly

info@zettabrain.io

💻

Open Source

ZettaBrain RAG is free and open source. View on GitHub →

🖥️

NVIDIA Inception Partner

ZettaBrain is a member of the NVIDIA Inception program.


Engineering Blog

Ideas from the ZettaBrain Team

Deep dives into private AI, multi-tenant RAG, and building production-grade document intelligence — on-prem.

RAG Multi-Tenant BM25

Per-Tenant BM25 in Hybrid RAG: How We Fixed Cross-Boundary Document Leakage

A shared BM25 index poisons IDF scores across tenant boundaries even when your vector store is correctly isolated. Here's how we caught it and fixed it with per-team BM25 indices.

Olajide Shobowale · May 2026

When building a multi-tenant RAG system, isolating vector stores per tenant is necessary but not sufficient. Hybrid pipelines that combine dense vector search with sparse BM25 keyword search introduce a second isolation requirement that is easy to miss when self-hosting: per-tenant BM25 index isolation.

This is a lessons-learned account of how we hit this problem in production with ZettaBrain Teams — an open-source, self-hosted multi-tenant RAG server — and how we fixed it. Managed RAG platforms handle this at the platform level; if you are self-hosting, you own this problem yourself.

How We Found It

We were confident the document boundaries were solid. Each team gets its own ChromaDB collection in a separate directory. Vector search is cleanly scoped. We tested it, demos worked, everything looked good.

Then we added hybrid retrieval — BM25 (keyword search) alongside dense vector retrieval, reranked with FlashRank. The results were noticeably better for technical queries.

But when a user on the Finance team searched for "cardiac monitoring protocol", they got back a document from the Health team's corpus. The boundary had leaked.

Why This Happens: BM25 Is a Corpus-Level Algorithm

BM25 is not just a keyword counter. It is a probabilistic ranking function whose scores depend on two corpus-level statistics:

IDF (Inverse Document Frequency) — how rare is this term across the entire corpus? Rare terms score higher.
Average document length — used to normalise term frequency by document verbosity.

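Both statistics are computed over all documents in the index at build time. If you build a single BM25 index over every tenant's documents, IDF is contaminated: each term's document frequency now counts documents that belong to other tenants. A medical term that is rare in Finance documents but common in Health documents gets a lower IDF than a Finance-only index would give it, while a term that is common in Finance but absent from Health gets a higher one, because the index doesn't know that most of those documents "belong to someone else." The shared index also holds every tenant's documents, so a Finance query can rank and return Health content directly.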

This is not a retrieval bug. It is a statistical property of BM25 itself. The only correct fix is a separate BM25 index per tenant.
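
The dependence on the corpus is easy to check with a few lines of arithmetic. The sketch below uses a common smoothed IDF variant, ln((N - n + 0.5) / (n + 0.5) + 1); libraries differ in the exact formula, and the six toy documents are invented for illustration.

```python
import math


def idf(term: str, corpus: list) -> float:
    """Smoothed BM25-style IDF of a term over a list of document strings."""
    n_docs = len(corpus)
    n_containing = sum(1 for doc in corpus if term in doc.lower().split())
    return math.log((n_docs - n_containing + 0.5) / (n_containing + 0.5) + 1)


# Toy corpora, invented for this example.
finance_docs = [
    "payroll processing schedule for march",
    "payroll tax filing deadline",
    "employee wellness cardiac screening benefit",
]
health_docs = [
    "cardiac monitoring protocol for icu",
    "cardiac arrest response checklist",
    "post operative cardiac monitoring",
]
shared = finance_docs + health_docs

# Print each term's IDF in the Finance-only corpus and in the combined one.
for term in ("cardiac", "payroll"):
    print(term, round(idf(term, finance_docs), 3), round(idf(term, shared), 3))
```

In this toy corpus the IDF of "cardiac" drops and the IDF of "payroll" rises once Health documents are mixed in, so the same Finance query is scored differently depending on which tenants share the index.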

What We Were Doing Wrong

Our initial implementation stored a single bm25_index.pkl in the shared ChromaDB directory and loaded it for every query regardless of which team was asking:

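import pickle
from pathlib import Path

from rank_bm25 import BM25Okapi   # BM25Okapi as provided by the rank_bm25 package

# CHROMA_DIR is the root ChromaDB directory (a Path), defined elsewhere.
BM25_PATH = CHROMA_DIR / "bm25_index.pkl"   # shared across all tenants: wrong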

def rebuild_bm25_index(vectorstore):
    docs = vectorstore.get()["documents"]     # all tenants' docs
    tokenised = [d.lower().split() for d in docs]
    BM25_PATH.write_bytes(pickle.dumps(BM25Okapi(tokenised)))

When the Finance team ingested its payroll documents, the BM25 index was rebuilt over Finance and Health documents combined. When a Finance query arrived, the BM25 scores were computed against this contaminated corpus.

Self-Hosted vs. Managed RAG

Managed RAG platforms handle tenant isolation as a platform-level concern — if you use one of those services, BM25 scoping is likely taken care of for you.

If you are self-hosting, whether you run LangChain, LlamaIndex, RAGFlow, or your own stack on your own infrastructure, you own this problem. A call such as LangChain's BM25Retriever.from_documents() operates on whatever corpus you hand it. No framework will stop you from handing it all tenants' documents at once. That decision, and its consequences, are yours.

The Fix: Per-Team BM25 Indices

The solution is straightforward once you see it: store and build one BM25 index per team, in that team's own ChromaDB directory, using only that team's documents.

def _team_bm25_path(team_slug: str) -> Path:
    return CHROMA_DIR / team_slug / "bm25_index.pkl"

def _rebuild_team_bm25(vectorstore, team_slug: str):
    result = vectorstore.get()
    docs = result.get("documents", [])
    metas = result.get("metadatas", [])
    if not docs:
        _team_bm25_path(team_slug).unlink(missing_ok=True)
        return
    tokenised = [d.lower().split() for d in docs]
    index = BM25Okapi(tokenised)
    data = {"index": index, "docs": docs, "metas": metas}
    _team_bm25_path(team_slug).write_bytes(pickle.dumps(data))

def _bm25_search_team(query: str, team_slug: str, k: int = 12):
    p = _team_bm25_path(team_slug)
    if not p.exists():
        return []
    data = pickle.loads(p.read_bytes())
    index, docs, metas = data["index"], data["docs"], data["metas"]
    tokens = query.lower().split()
    scores = index.get_scores(tokens)
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [(docs[i], metas[i], float(scores[i])) for i in top_k if scores[i] > 0]

Each team's BM25 index is built exclusively from that team's documents, so IDF scores reflect only that team's corpus. The statistical boundary is now correct.

Hybrid Retrieval with MD5 Deduplication

The full hybrid retrieval function merges dense (MMR) and sparse (BM25) results, deduplicates by content hash, and reranks with FlashRank:

def _hybrid_retrieve(question, vectorstore, team_slug, top_k=5):
    # Dense: MMR over team's vector store
    mmr_docs = vectorstore.max_marginal_relevance_search(
        question, k=6, fetch_k=30, lambda_mult=0.82
    )
    # Sparse: per-team BM25
    bm25_hits = _bm25_search_team(question, team_slug, k=10)

    # Merge and deduplicate by MD5 hash of content
    seen, candidates = set(), []
    for doc in mmr_docs:
        h = hashlib.md5(doc.page_content.encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            candidates.append(PassageObject(text=doc.page_content))
    for text, meta, _ in bm25_hits:
        h = hashlib.md5(text.encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            candidates.append(PassageObject(text=text))

    # Rerank
    ranked = flashrank_client.rerank(
        RerankRequest(query=question, passages=candidates)
    )
    return [r["text"] for r in ranked[:top_k]]

The Boundary Test

To verify isolation, we wrote a test that ingests two teams with completely separate documents and confirms each team only retrieves its own content:

def test_per_tenant_bm25_isolation():
    # Setup: Finance team has payroll docs, Health team has clinical protocols
    finance_vs = load_team_vectorstore("finance")
    health_vs  = load_team_vectorstore("health")
    _rebuild_team_bm25(finance_vs, "finance")
    _rebuild_team_bm25(health_vs,  "health")

    # Finance query — must not surface Health documents
    finance_hits = _bm25_search_team("payroll processing schedule", "finance", k=5)
    health_hits  = _bm25_search_team("payroll processing schedule", "health",  k=5)

    finance_texts = [h[0] for h in finance_hits]
    health_texts  = [h[0] for h in health_hits]

    # No overlap between the two result sets
    assert not set(finance_texts) & set(health_texts), \
        "BM25 isolation breach: shared content found across tenant indices"

def test_shared_index_would_fail():
    # Demonstrate contamination with a single combined index
    all_docs = finance_docs + health_docs
    shared_bm25 = BM25Okapi([d.lower().split() for d in all_docs])
    scores = shared_bm25.get_scores("cardiac monitoring protocol".lower().split())
    top_idx = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:3]
    # Top results include Health docs even when "queried" in Finance context
    top_texts = [all_docs[i] for i in top_idx]
    health_contamination = any(
        "cardiac" in t or "monitoring" in t for t in top_texts[:3]
    )
    assert health_contamination, \
        "Expected shared index to surface Health docs — proves isolation is needed"

The ASCII Picture

WRONG — single shared index
┌──────────────────────────────────────────┐
│  BM25 index (all tenants)                │
│  finance_docs + health_docs combined     │
│  IDF contaminated                        │
└──────────────────┬───────────────────────┘
                   │ query from Finance user
                   ▼ returns Health documents  ✗

RIGHT — per-team indices
┌──────────────────────┐  ┌──────────────────────┐
│ BM25: finance/       │  │ BM25: health/        │
│ bm25_index.pkl       │  │ bm25_index.pkl       │
│ (finance docs only)  │  │ (health docs only)   │
└──────────┬───────────┘  └──────────────────────┘
           │ Finance query
           ▼ only Finance results  ✓

Lessons

Vector store isolation is not enough for hybrid RAG. Every retrieval component — dense, sparse, reranker — must be scoped to the tenant.
BM25 contamination is silent. Queries return plausible results; you only catch the leak with adversarial cross-team tests.
Storage cost is trivial. Each team's BM25 pickle is kilobytes. There is no engineering reason to share the index.
Rebuild on every ingest. Call _rebuild_team_bm25 at the end of every ingestion run so the index stays in sync with the vector store.

Try ZettaBrain Teams

Multi-tenant, private-by-design document AI — self-hosted on your own infrastructure. No data leaves your environment.

View on GitHub →

Questions? Email us at hello@zettabrain.io
