21 November 2025

Kotaemon, an Open-Source RAG UI: A Complete Guide to Features, Setup, Usage, and Customization

Executive Summary

Kotaemon is an open-source, Gradio-powered Retrieval-Augmented Generation (RAG) UI developed by Cinnamon for chatting with your documents. Launched in 2024, it has amassed 24.6k GitHub stars and 2k forks as of November 2025, with active development through v0.11.0 (July 2025; see the GitHub repo). It serves both end users (simple document QA) and developers (custom RAG pipelines), supporting local/private LLMs (Ollama, LlamaCPP), cloud APIs (OpenAI, Azure, Groq), hybrid retrieval (vector + full-text + reranking), multimodal parsing (figures/tables via Docling/Unstructured), GraphRAG/LightRAG, agents (ReAct, ReWOO), and multi-user collaboration.

This comprehensive 2025 guide (10,000+ words) covers the full surface: installation, core usage, model management, retrieval pipelines, multimodal support, customization, deployment, and troubleshooting.

Key Takeaways:

Limitations: no native mobile app; GraphRAG indexing officially supports only OpenAI/Ollama; Unstructured dependencies are OS-specific. Open questions: will v0.12+ add support for newer agent types?

1. Introduction to Kotaemon

What is Kotaemon?

Kotaemon (“古塔衛門”, “ancient guardian”) is a clean, customizable RAG web UI. It ingests files (PDFs, DOCX, images), indexes them (vector/full-text/graph), retrieves relevant chunks, and generates answers via LLMs, complete with citations and in-browser previews.

Target Audiences:

+-----------------------------------+
| End users: UI for QA on docs      |
|  +-----------------------------+  |
|  | Developers: build RAG pipes |  |
|  |  +-----------------------+  |  |
|  |  | Contributors: PRs     |  |  |
|  |  +-----------------------+  |  |
|  +-----------------------------+  |
+-----------------------------------+

Why Kotaemon?

Core Architecture:

2. Installation: 8 Methods (Easy to Advanced)

System requirements: Python 3.10+; Docker optional; Unstructured optional (required for DOCX/PPTX).

2.1 Easiest: Offline ZIP Scripts (End-Users, 20 mins)

  1. Download kotaemon-app.zip from the latest release.
  2. Unzip it; scripts/ contains a launcher per OS:

     | OS      | Script          | Run Command            |
     |---------|-----------------|------------------------|
     | Windows | run_windows.bat | Double-click           |
     | macOS   | run_macos.sh    | Right-click → Terminal |
     | Linux   | run_linux.sh    | bash run_linux.sh      |

  3. The app launches at http://localhost:7860 (default login: admin/admin).
  4. Data lives in ./ktem_app_data (back it up to migrate).

Pro tip: always re-run the launch script after each update.
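For Linux users who prefer the terminal, the whole flow condenses to a few commands. A minimal sketch, assuming the standard GitHub release-asset URL pattern and the scripts/ layout from step 2:

# Fetch and unpack the latest packaged release (URL pattern assumed; verify on the releases page)
curl -LO https://github.com/Cinnamon/kotaemon/releases/latest/download/kotaemon-app.zip
unzip kotaemon-app.zip -d kotaemon-app && cd kotaemon-app

# First run creates ./ktem_app_data; back that folder up to migrate installs
bash scripts/run_linux.sh
# Then browse to http://localhost:7860 and log in with admin/admin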

2.2 Docker (Recommended)

Images: lite (PDF/HTML/XLSX), full (adds Unstructured for DOC/DOCX), ollama (bundles local LLMs). Platforms: amd64/arm64.

# Lite (most users)
# --platform linux/arm64 is optional; include it on Apple Silicon Macs
docker run -e GRADIO_SERVER_NAME=0.0.0.0 -e GRADIO_SERVER_PORT=7860 \
  -v ./ktem_app_data:/app/ktem_app_data -p 7860:7860 -it --rm \
  --platform linux/arm64 \
  ghcr.io/cinnamon/kotaemon:main-lite

# Full + Ollama (same flags, different image tag)
docker run ... ghcr.io/cinnamon/kotaemon:main-ollama
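The same lite container can be expressed declaratively for repeatable setups; a docker-compose.yml sketch mirroring the flags above (service name and layout are illustrative, not an official compose file):

# docker-compose.yml (illustrative)
services:
  kotaemon:
    image: ghcr.io/cinnamon/kotaemon:main-lite
    environment:
      - GRADIO_SERVER_NAME=0.0.0.0
      - GRADIO_SERVER_PORT=7860
    ports:
      - "7860:7860"
    volumes:
      - ./ktem_app_data:/app/ktem_app_data
    # platform: linux/arm64  # uncomment on Apple Silicon

Start it with docker compose up.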

2.3 Pip (Developers)

conda create -n kotaemon python=3.10 && conda activate kotaemon
git clone https://github.com/Cinnamon/kotaemon && cd kotaemon
pip install -e "libs/kotaemon[all]" -e "libs/ktem"
# Optional: install unstructured for DOC/DOCX/PPTX support
python app.py  # or: launch.sh
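To confirm both editable installs resolved correctly, a quick sanity check (package names as used in the pip command above):

pip list | grep -Ei "kotaemon|ktem"   # both packages should appear as editable installs
python -c "import kotaemon, ktem"     # should exit without errors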

2.4 HF Spaces (Online, 10 mins)

  1. Duplicate the official Space.
  2. Wait for the build (~10 mins).
  3. Set up a Cohere API key (free tier available).

2.5 Colab (Local RAG Testing)

Run the project's Colab notebook.

2.6 Advanced: GraphRAG/LightRAG

| Variant | Setup |
|---------|-------|
| NanoGraphRAG (recommended) | pip install nano-graphrag; then pip uninstall hnswlib chroma-hnswlib && pip install chroma-hnswlib (Issue #440); launch with USE_NANO_GRAPHRAG=true python app.py |
| LightRAG | pip install git+https://github.com/HKUDS/LightRAG.git (resolve dependency conflicts); launch with USE_LIGHTRAG=true python app.py |
| MS GraphRAG | pip install "graphrag<=0.3.6" future; set GRAPHRAG_API_KEY=... (OpenAI/Ollama only); set USE_CUSTOMIZED_GRAPHRAG_SETTING=true and edit settings.yaml |
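Putting the recommended row together, the full NanoGraphRAG sequence from the table reads as one script (commands taken verbatim from above; run inside your kotaemon environment):

# NanoGraphRAG setup (recommended variant)
pip install nano-graphrag
# Work around the hnswlib conflict noted in Issue #440
pip uninstall -y hnswlib chroma-hnswlib && pip install chroma-hnswlib
# Launch with the NanoGraphRAG index enabled
USE_NANO_GRAPHRAG=true python app.py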

2.7 Multimodal Parsers

| Loader | Install/Setup |
|--------|---------------|
| Docling (local) | pip install docling |
| Azure Document Intelligence | API key in Resources |
| Adobe PDF Extract | API key |

Select in Settings → Retrieval → File Loader.

2.8 Custom Env/Configs
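Kotaemon reads configuration from a .env file in the repository root. A minimal sketch; the GRADIO_* variables and feature flags appear elsewhere in this guide, while the API-key names (OPENAI_API_KEY, COHERE_API_KEY) are assumed here, so check the repo's .env example for the authoritative list:

# .env sketch (key names partly assumed; verify against the repo)
GRADIO_SERVER_NAME=0.0.0.0     # same variables as the Docker run in 2.2
GRADIO_SERVER_PORT=7860
OPENAI_API_KEY=sk-...          # assumed name; cloud LLM/embeddings
COHERE_API_KEY=...             # assumed name; reranking and the free HF Spaces setup
GRAPHRAG_API_KEY=...           # only for MS GraphRAG (Section 2.6)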

Verification: after any install method, open the Resources tab and confirm your models are listed.

3. Core UI and Basic Usage

Tabs:

  1. Chat: New chat, history, share/export.
  2. File Index: Upload/index collections (private/public).
  3. Resources: Manage LLMs/Embeddings/Rerankers.
  4. Settings: Retrieval (top-K, prompts), User Mgmt (multi-user/SSO v0.11).

Workflow:

  1. Upload/Index: Drag files → Index (hybrid: split → embed → store).
  2. Select Collection/Reasoning: e.g., FullQA, DecomposeQA.
  3. Query: “Summarize the table on page 5” → answer + citations (relevance scores, PDF highlights).
  4. Citations: Click → In-browser viewer w/ highlights.

Main page: [screenshot]

Embedding options: [screenshot of supported models]

Reasoning options (UI dropdown) include FullQA and DecomposeQA (see Workflow above).

4. Model Management: 50+ LLMs/Embeddings

Supported:

Local Setup:

  1. Ollama:
    ollama pull llama3.1:8b
    ollama pull nomic-embed-text
    
    UI: Resources → Add Ollama → Set default.
  2. LlamaCPP: download a GGUF model (from Hugging Face), then in the UI: Add LlamaCpp → set the model path (e.g., qwen1_5-1_8b-chat-q8_0.gguf).
  3. Scoring/Rerank: set local models in Retrieval Settings.

Pro tip: budget RAM at the model's size plus a ~2 GB buffer (worked example below), and disable LLM scoring on low-resource machines.
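As a worked example of that rule of thumb, a tiny estimator (plain arithmetic, not a Kotaemon API):

def min_ram_gb(params_billions: float, bits_per_weight: int = 4, buffer_gb: float = 2.0) -> float:
    """Rough minimum RAM: quantized weight size plus a fixed 2 GB buffer."""
    weight_gb = params_billions * bits_per_weight / 8  # 8B params at 4-bit ~ 4 GB
    return weight_gb + buffer_gb

print(min_ram_gb(8))        # llama3.1:8b at Q4 -> 6.0 GB minimum
print(min_ram_gb(1.8, 8))   # qwen1.5-1.8b at Q8 -> 3.8 GB minimum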

5. Retrieval Pipelines: Hybrid + Advanced

Default Hybrid [1]:

  1. Parse (loaders: PDF/Unstructured).
  2. Split (semantic/chunk).
  3. Index: Vector (ChromaDB) + Full-text (LanceDB/SimpleFile).
  4. Retrieve: Hybrid search + Rerank (Cohere/LLM).
  5. Generate: LLM + Citations.
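To make steps 4-5 concrete, here is an illustrative sketch of hybrid score fusion followed by reranking; this is generic logic to show the idea, not Kotaemon's internal API:

from typing import Callable

# A retriever maps (query, k) -> [(chunk_text, score), ...]
Retriever = Callable[[str, int], list[tuple[str, float]]]

def hybrid_retrieve(
    query: str,
    vector_search: Retriever,      # e.g. backed by ChromaDB
    fulltext_search: Retriever,    # e.g. backed by LanceDB
    rerank: Callable[[str, list[str]], list[tuple[str, float]]],  # e.g. Cohere/LLM
    k: int = 10,
    alpha: float = 0.5,            # weight between vector and full-text scores
) -> list[tuple[str, float]]:
    """Merge candidates from both indexes, then let the reranker order the union."""
    scores: dict[str, float] = {}
    for chunk, s in vector_search(query, k):
        scores[chunk] = scores.get(chunk, 0.0) + alpha * s
    for chunk, s in fulltext_search(query, k):
        scores[chunk] = scores.get(chunk, 0.0) + (1 - alpha) * s
    # Keep the best fused candidates, then rerank for the final ordering
    candidates = sorted(scores, key=scores.get, reverse=True)[: 2 * k]
    return rerank(query, candidates)[:k]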

Config (Settings):

GraphRAG [10]:

Web Search Retrievers: Jina/Tavily (add external context).

6. Multimodal Support: Figures, Tables, OCR

Loaders (Settings → File Loader):

| Type | Features | Requirement |
|------|----------|-------------|
| PDF | Text/figures | Native |
| Unstructured | DOCX/PPTX/OCR | Full Docker image |
| Docling | Local multimodal | pip install docling |
| Azure/Adobe | Cloud OCR/tables | API key |

QA: images and tables are carried into the prompt context, giving GPT-4V-style answers when a multimodal LLM is configured.

Example: Query table → Extracts → Answers w/ highlights.

7. Customization & Extensibility (Developer Focus)

Modular System: Everything subclasses BaseComponent.

7.1 flowsettings.py

KH_DOCSTORE = "ktem.storages.docstores.lancedb.LanceDBDocumentStore"
KH_VECTORSTORE = "ktem.storages.vectorstores.chroma.ChromaVectorStore"
KH_REASONINGS = [
    "ktem.reasoning.simple.FullQAPipeline",
    "ktem.reasoning.react.ReactAgentPipeline",
]
KH_REASONINGS_USE_MULTIMODAL = True

7.2 Custom Reasoning Pipeline

  1. Create libs/ktem/ktem/reasoning/my_pipeline.py:

from kotaemon.base import BaseComponent
from kotaemon.llms import ChatOpenAI  # LLM component from the kotaemon library

class MyPipeline(BaseComponent):
    llm: ChatOpenAI

    def __call__(self, query: str):
        # Custom logic: rewrite the query, retrieve, post-process, etc.
        return self.llm.invoke(query)

  2. Add it to KH_REASONINGS in flowsettings.py (see the snippet below).
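For step 2, registration is a one-line addition to the KH_REASONINGS list from Section 7.1; assuming the file location above, the import path would be:

# flowsettings.py
KH_REASONINGS = [
    "ktem.reasoning.simple.FullQAPipeline",
    "ktem.reasoning.my_pipeline.MyPipeline",  # module path follows the file created in step 1
]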

7.3 Custom Indexing

See libs/ktem/ktem/index/file/graph/ for worked examples.

7.4 Gradio Extensions

CLI Utils: kotaemon promptui export my_pipeline --output config.yml.

8. Deployment & Scaling

9. Troubleshooting & Best Practices