Private Document AI
ZettaBrain lets you have natural language conversations with your own documents — running entirely on your own infrastructure, with no API keys and no data sent to any cloud service.
What It Does
It connects to your existing file storage (local disk, NFS, SMB, or S3-compatible), indexes your documents locally, and lets you query them in plain English through a web interface or CLI. Every answer includes the source chunks it came from.
Supported Platforms
| Platform | Installer | Notes |
|---|---|---|
| Ubuntu | install.sh | systemd service, full GPU support |
| Red Hat Linux | install.sh | DNF / YUM, RHEL 8 / 9 / 10 |
| macOS (Apple Silicon / Intel) | install.sh | Homebrew, Metal GPU |
| Windows 10/11 / Server 2016+ | install.ps1 / install.cmd | PowerShell, winget |
Quick Install
Run the command for your operating system. The installer handles Python, Ollama, and the model download automatically.
Ubuntu / Red Hat Linux
curl -fsSL https://zettabrain.app/install.sh | sudo bash
macOS (do not use sudo; Homebrew refuses to run as root)
curl -fsSL https://zettabrain.app/install.sh | bash
Windows
Option 1 — download install.cmd and run as Administrator
# Download: https://zettabrain.app/install.cmd
# Right-click → Run as administrator
Option 2 — PowerShell (run as Administrator)
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
irm https://zettabrain.app/install.ps1 | iex
What the installer does
- Detects your OS and package manager (apt / dnf / yum / brew / winget)
- Installs Python 3.9+ and system dependencies including zstd
- Installs zettabrain-rag via pipx (isolated environment)
- Installs and starts Ollama as a background service
- Downloads the nomic-embed-text embedding model (~275 MB)
- On NVIDIA hardware: installs CUDA runtime and drivers
Install via pip / pipx
pip install zettabrain-rag
# or isolated:
pipx install zettabrain-rag
zettabrain --version
After install
sudo zettabrain-setup # configure storage + model
zettabrain-server # launch web UI at :7860
zettabrain-chat # or use CLI chat
First-Time Setup
After installation, the setup wizard configures storage, selects a model based on your hardware, and enables HTTPS.
1. Run the Wizard
sudo zettabrain-setup
2. Launch the Web Interface
zettabrain-server
Open https://local.zettabrain.app:7860 — trusted HTTPS, fully private.
3. Or Use the CLI
zettabrain-chat
Type any question to query your documents. Type sources to see which chunks were used. Type quit to exit.
Retrieval Pipeline
Five stages process every query — combining keyword and semantic search before passing only the best context to the local LLM.
- Adaptive chunking — Chunk size tuned per document type and text density.
- MMR semantic search — Maximum Marginal Relevance via ChromaDB: relevant and diverse.
- BM25 keyword search — Exact term matching alongside vector search.
- Merge & deduplicate — Both result sets merged; duplicates removed by content hash.
- Cross-encoder re-ranking — FlashRank (ms-marco-MiniLM-L-12-v2) scores candidates against the actual query.
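The merge and deduplicate stage can be illustrated with a minimal sketch. This is not ZettaBrain's actual code: the function names, the whitespace normalization, the choice of SHA-256, and the interleaving order are all assumptions made for illustration. It shows only the general idea of combining two ranked result lists and dropping duplicates by content hash.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash whitespace-normalized text so trivially different copies collide."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def merge_and_deduplicate(semantic_hits, keyword_hits):
    """Interleave two ranked lists of chunk texts, keeping the first copy of each."""
    merged, seen = [], set()
    # Walk both lists rank by rank so neither retriever dominates the head.
    for rank in range(max(len(semantic_hits), len(keyword_hits))):
        for hits in (semantic_hits, keyword_hits):
            if rank < len(hits):
                digest = content_hash(hits[rank])
                if digest not in seen:
                    seen.add(digest)
                    merged.append(hits[rank])
    return merged


# Hypothetical chunks: the first keyword hit differs only by an extra space.
semantic = ["Trades need pre-clearance.", "Expense reports are due monthly."]
keyword = ["Trades  need pre-clearance.", "KYC checks run at onboarding."]
candidates = merge_and_deduplicate(semantic, keyword)
print(candidates)
```

The merged list would then be handed to the re-ranker, which scores each surviving candidate against the query.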
Supported Document Formats
| Extension | Format | Notes |
|---|---|---|
| .pdf | PDF | Text-layer PDFs. Scanned PDFs need OCR pre-processing. |
| .txt | Plain Text | UTF-8 encoding. Default chunk size 800. |
| .md | Markdown | Headers preserved as chunk boundaries. |
| .docx | Word Document | Paragraph structure preserved. Tables extracted as text. |
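As a rough illustration of "headers preserved as chunk boundaries", the sketch below starts a new chunk at each Markdown header and falls back to fixed-size windows for oversized sections. The function name, the regular expression, and reusing 800 characters as the limit for Markdown are assumptions; the real chunker may measure size differently (for example in tokens) and apply overlap.

```python
import re

# ATX-style Markdown headers: one to six '#' characters followed by whitespace.
HEADER = re.compile(r"^#{1,6}\s")


def chunk_markdown(text: str, max_chars: int = 800) -> list[str]:
    """Start a new chunk at every header; split oversized sections by size."""
    sections, current = [], []
    for line in text.splitlines():
        if HEADER.match(line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))

    chunks = []
    for section in sections:
        # Fall back to fixed-size windows when a section exceeds the limit.
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks


doc = "# Policy\nIntro text.\n## Scope\nApplies to all staff."
chunks = chunk_markdown(doc)
print(chunks)
```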
CLI Commands
| Command | Description |
|---|---|
| sudo zettabrain-setup | Storage wizard, model selection, TLS cert |
| zettabrain-server | Launch web GUI at port 7860 |
| zettabrain-chat | Interactive CLI chat |
| zettabrain-chat --rebuild | Rebuild vector store then start |
| zettabrain-chat --debug | Show retrieved chunks with each response |
| zettabrain-ingest | Ingest documents |
| zettabrain-ingest --folder /path | Ingest a specific folder |
| zettabrain-ingest --file /path/doc.pdf | Ingest a single file |
| zettabrain-ingest --stats | Show vector store contents |
| zettabrain-ingest --clear | Wipe the vector store |
| zettabrain-status | Version, paths, certs, store stats |
| sudo zettabrain-storage add | Add a storage source |
| zettabrain-storage list | List configured sources |
In-Session Commands
| Type | Action |
|---|---|
| Any question | Query your documents |
| sources | Show which chunks were used |
| timing | Show retrieve/generate time per query |
| debug on / off | Toggle chunk display |
| quit | Exit |
GPU & Model Selection
Ollama auto-detects your GPU. No configuration is needed beyond having the correct drivers installed.
Supported Hardware
- NVIDIA — CUDA 12+
- AMD — ROCm 5.7+
- Apple Silicon — Metal (M1/M2/M3, built-in)
- CPU-only — Works on any x86; smaller models recommended
Model Selection Wizard
Hardware detected: NVIDIA RTX 3080 (10 GB VRAM)
Recommended: llama3.1:8b
1) llama3.2:3b ~2 GB fastest, good for quick Q&A
2) llama3.1:8b ~5 GB balanced ← default
3) mistral:7b ~4 GB strong reasoning
4) llama3.1:13b ~8 GB better, needs 12 GB+
5) qwen2.5:14b ~9 GB excellent, needs 16 GB+
6) Custom
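The recommendation shown in the wizard output above is consistent with a simple "largest model that fits" rule, sketched below. This is a guess at the logic, not the wizard's real implementation: the function name is hypothetical, and only the 12 GB and 16 GB minimums come from the menu. For the other models the sketch assumes the minimum equals the listed model size.

```python
# (model name, approximate size in GB, assumed minimum memory in GB)
MODELS = [
    ("llama3.2:3b", 2, 2),    # minimum assumed equal to model size
    ("llama3.1:8b", 5, 5),    # minimum assumed equal to model size
    ("mistral:7b", 4, 4),     # minimum assumed equal to model size
    ("llama3.1:13b", 8, 12),  # "needs 12 GB+" per the menu
    ("qwen2.5:14b", 9, 16),   # "needs 16 GB+" per the menu
]


def recommend_model(memory_gb: float) -> str:
    """Return the largest listed model whose minimum memory fits."""
    fitting = [m for m in MODELS if m[2] <= memory_gb]
    if not fitting:
        # Nothing fits: fall back to the smallest model in the list.
        return MODELS[0][0]
    return max(fitting, key=lambda m: m[1])[0]


print(recommend_model(10))  # RTX 3080 with 10 GB VRAM
```

With 10 GB of VRAM this rule selects llama3.1:8b, matching the default in the sample output.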
Approximate Performance
| Hardware | Model | Tokens/sec | ~Response time |
|---|---|---|---|
| 4-core CPU, 8 GB RAM | llama3.2:3b | 8–15 | 20–40 s |
| 8-core CPU, 16 GB RAM | llama3.1:8b | 5–12 | 25–60 s |
| NVIDIA RTX 3060 (8 GB) | llama3.1:8b | 60–90 | 3–5 s |
| NVIDIA RTX 3080 (10 GB) | llama3.1:8b | 80–120 | 2–4 s |
| Apple M2 (16 GB) | llama3.1:8b | 30–50 | 6–10 s |
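The response-time column follows from generation speed: time is roughly the number of generated tokens divided by tokens per second. The sketch below assumes an answer of about 300 tokens, a figure inferred from the table rather than stated in it, and ignores retrieval and prompt-processing time.

```python
def response_time_range(tokens: int, tps_low: float, tps_high: float) -> tuple[float, float]:
    """Return (fastest, slowest) generation time in seconds for a token count."""
    return tokens / tps_high, tokens / tps_low


# NVIDIA RTX 3060 row: 60 to 90 tokens/sec, assumed 300-token answer.
fast, slow = response_time_range(300, 60, 90)
print(f"{fast:.1f}-{slow:.1f} s")
```

For the RTX 3060 row this gives about 3.3 to 5.0 seconds, in line with the 3–5 s listed.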
Configuration
Configure settings through environment variables or in /opt/zettabrain/src/zettabrain.env.
| Variable | Default | Description |
|---|---|---|
| ZETTABRAIN_DOCS | /opt/zettabrain/data | Documents folder |
| ZETTABRAIN_CHROMA | /opt/zettabrain/src/zettabrain_vectorstore | ChromaDB path |
| ZETTABRAIN_LLM_MODEL | llama3.1:8b | LLM model name |
| ZETTABRAIN_EMBED_MODEL | nomic-embed-text | Embedding model |
| ZETTABRAIN_CHUNK_SIZE | 1000 / 800 | Chunk size (adaptive) |
| ZETTABRAIN_CHUNK_OVERLAP | 150 / 100 | Chunk overlap (adaptive) |
| OLLAMA_HOST | http://localhost:11434 | Ollama API endpoint |
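A minimal sketch of how such settings might be resolved, useful when scripting around ZettaBrain. Several things here are assumptions: that zettabrain.env uses plain KEY=VALUE lines, that environment variables take precedence over the file, and the function names themselves. The adaptive chunk settings are left out because their two-value defaults depend on document type.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "ZETTABRAIN_DOCS": "/opt/zettabrain/data",
    "ZETTABRAIN_CHROMA": "/opt/zettabrain/src/zettabrain_vectorstore",
    "ZETTABRAIN_LLM_MODEL": "llama3.1:8b",
    "ZETTABRAIN_EMBED_MODEL": "nomic-embed-text",
    "OLLAMA_HOST": "http://localhost:11434",
}


def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments (assumed format)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_settings(env_file_text: str = "", environ=None) -> dict[str, str]:
    """Layer defaults, then file values, then environment variables."""
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)
    file_values = parse_env_file(env_file_text)
    settings.update({k: v for k, v in file_values.items() if k in DEFAULTS})
    settings.update({k: environ[k] for k in DEFAULTS if k in environ})
    return settings


settings = resolve_settings(
    "ZETTABRAIN_LLM_MODEL=mistral:7b\n",
    environ={"OLLAMA_HOST": "http://10.0.0.5:11434"},
)
print(settings["ZETTABRAIN_LLM_MODEL"], settings["OLLAMA_HOST"])
```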
System Requirements
| | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB |
| CPU | 4 cores / 2.5 GHz | 8 cores / 3.0 GHz |
| Disk | 20 GB free | 50 GB free |
| OS | Ubuntu 22.04 / Red Hat 8 / macOS 13 / Windows 10 | Ubuntu 22.04 LTS |
| Python | 3.9 | 3.11+ |
Note on RAM: llama3.1:8b (Q4) needs roughly 5 GB, plus ~2–3 GB for OS and ChromaDB. Below 8 GB you'll hit swap and response times increase significantly.
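The arithmetic behind this note can be written as a quick check. The sketch below simply adds the model footprint to the 2–3 GB overhead quoted above and compares the upper bound with the RAM installed; the function names are hypothetical, and actual usage varies with context length and corpus size.

```python
def estimated_ram_gb(model_gb: float, overhead_gb: tuple[float, float] = (2.0, 3.0)) -> tuple[float, float]:
    """Return the (low, high) total RAM estimate: model plus OS/ChromaDB overhead."""
    low, high = overhead_gb
    return model_gb + low, model_gb + high


def fits_in_ram(total_ram_gb: float, model_gb: float) -> bool:
    """True when the high estimate fits without relying on swap."""
    return estimated_ram_gb(model_gb)[1] <= total_ram_gb


low, high = estimated_ram_gb(5.0)  # llama3.1:8b (Q4), roughly 5 GB
print(f"llama3.1:8b needs about {low:.0f}-{high:.0f} GB in total")
print(fits_in_ram(8, 5.0), fits_in_ram(6, 5.0))
```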
Storage Sources
- Local disk — Default. Any local path.
- NFS — Network File System mounts.
- SMB — Windows / Samba shares.
- Object Storage — S3-compatible: AWS S3, MinIO, Ceph.
sudo zettabrain-storage add # add a new source
zettabrain-storage list # list configured sources
Diagnostics
zettabrain-status
python3 /opt/zettabrain/src/01_chromadb_setup.py
python3 /opt/zettabrain/src/02_embeddings_test.py
curl http://localhost:11434
ollama list
journalctl -u zettabrain -f
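The curl and ollama list checks above can also be scripted. The sketch below queries Ollama's /api/tags endpoint, which returns the locally installed models as JSON, and reports whether the embedding model is present. The function names are illustrative, and the host is assumed to be the default from the Configuration table.

```python
import json
import urllib.request


def installed_models(tags_json: str) -> list[str]:
    """Extract model names from the JSON returned by Ollama's /api/tags."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]


def check_ollama(host: str = "http://localhost:11434", timeout: float = 2.0) -> list[str]:
    """Return installed model names; raises OSError if Ollama is unreachable."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
        return installed_models(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        names = check_ollama()
    except OSError as exc:  # URLError and timeouts are OSError subclasses
        print(f"Ollama is not reachable: {exc}")
    else:
        print("Installed models:", ", ".join(names) or "(none)")
        has_embedder = any(n.startswith("nomic-embed-text") for n in names)
        print("Embedding model present:", has_embedder)
```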
Uninstall
Linux / macOS
sudo systemctl disable --now zettabrain 2>/dev/null || true # stop the service first (systemd hosts only)
pipx uninstall zettabrain-rag
sudo rm -rf /opt/zettabrain
Windows
pipx uninstall zettabrain-rag
Remove-Item -Recurse -Force "$env:LOCALAPPDATA\ZettaBrain"
Sample Test Data
If you don't have your own documents handy, you can use our pre-built test corpora to evaluate ZettaBrain RAG against realistic enterprise content. Each bundle contains ten .docx policy documents from a fictional organization, paired with twenty test prompts you can paste straight into the chat.
Download the Document Sets
| Industry | Organization | Contents | Download |
|---|---|---|---|
| Financial Services | Apex Financial Group (fictional) | 10 policy docs: trading, AML, KYC, benefits, risk, expense, insider trading, IT security, onboarding | financial.zip (~89 KB) |
| Healthcare | Riverside Medical Center (fictional) | 10 policy docs: HIPAA, medication, emergency response, infection control, credentialing, telemedicine | healthcare.zip (~90 KB) |
Test Prompts Guide
Each bundle comes with twenty industry-specific prompts plus cross-document and adversarial prompts (questions ZettaBrain should refuse to answer because the information isn't in your documents).
Download the full Test Prompts Guide →
How to Use
1. Download and unzip
Linux / macOS (on Ubuntu / Debian, install unzip first if needed: sudo apt install -y unzip)
mkdir -p ~/zettabrain-samples && cd ~/zettabrain-samples
curl -fsSLO https://www.zettabrain.io/sample-data/zettabrain-financial-test-docs.zip
curl -fsSLO https://www.zettabrain.io/sample-data/zettabrain-healthcare-test-docs.zip
unzip zettabrain-financial-test-docs.zip -d financial
unzip zettabrain-healthcare-test-docs.zip -d healthcare
Windows (PowerShell)
mkdir $HOME\zettabrain-samples; cd $HOME\zettabrain-samples
irm https://www.zettabrain.io/sample-data/zettabrain-financial-test-docs.zip -OutFile financial.zip
irm https://www.zettabrain.io/sample-data/zettabrain-healthcare-test-docs.zip -OutFile healthcare.zip
Expand-Archive financial.zip -DestinationPath financial
Expand-Archive healthcare.zip -DestinationPath healthcare
2. Ingest one industry into ZettaBrain
zettabrain-ingest --folder ~/zettabrain-samples/financial --rebuild
3. Ask the test prompts
zettabrain-chat
> What is the pre-clearance process for personal securities trades?
> sources # see the exact chunks the answer was drawn from
Or open the web UI at https://local.zettabrain.app:7860 and paste prompts straight in.
4. Switch industries
To test the healthcare set, re-ingest with --rebuild pointed at the other folder. Each --rebuild swaps the active corpus.
zettabrain-ingest --folder ~/zettabrain-samples/healthcare --rebuild
What to Look For
- Accuracy — every answer should be factually grounded in the source .docx chunks shown by sources.
- Grounding — the cited document should match the topic of the question.
- Refusal — adversarial prompts in the guide (e.g. "What is the current stock price of Apex Financial Group?") should produce a clear "not in your documents" response, not a guess.
Use Your Own Documents
When you're ready, drop your own .pdf, .docx, .txt, or .md files into any folder and ingest them the same way — see Document Formats and Storage Sources.
