21 November 2025

Kotaemon, an Open-Source RAG UI: A Complete Guide to Features, Setup, Usage, and Customization

Executive Summary

Kotaemon is an open-source, Gradio-powered Retrieval-Augmented Generation (RAG) UI developed by Cinnamon for chatting with your documents. Launched in 2024, it has amassed 24.6k GitHub stars and 2k forks as of November 2025, with active development through v0.11.0 (July 2025; see the GitHub repo). It serves both end users (simple document QA) and developers (custom RAG pipelines), supporting local/private LLMs (Ollama, LlamaCPP), cloud APIs (OpenAI, Azure, Groq), hybrid retrieval (vector + full-text + reranking), multimodal parsing (figures/tables via Docling/Unstructured), GraphRAG/LightRAG, agents (ReAct, ReWOO), and multi-user collaboration.

This comprehensive 2025 guide (10,000+ words) covers the full surface: installation, core usage, model management, retrieval pipelines, multimodal support, customization, deployment, and troubleshooting.

Key Takeaways:

Limitations: no native mobile app; GraphRAG indexing officially supports only OpenAI/Ollama; Unstructured dependencies are OS-specific. Open questions: will v0.12+ add support for newer agent types?

1. Introduction to Kotaemon

What is Kotaemon?

Kotaemon (“古塔衛門”, “ancient guardian”) is a clean, customizable RAG web UI. It ingests files (PDFs, DOCX, images), indexes them (vector/full-text/graph), retrieves relevant chunks, and generates answers via LLMs, complete with citations and in-browser previews.

Target Audiences:

+-----------------------------------+
| End users: UI for QA on docs      |
|  +-----------------------------+  |
|  | Developers: build RAG pipes |  |
|  |  +-----------------------+  |  |
|  |  | Contributors: PRs     |  |  |
|  |  +-----------------------+  |  |
|  +-----------------------------+  |
+-----------------------------------+

Why Kotaemon?

Core Architecture:

2. Installation: 8 Methods (Easy to Advanced)

System requirements: Python 3.10+; Docker optional; Unstructured optional (required for DOCX/PPTX).

2.1 Easiest: Offline ZIP Scripts (End-Users, 20 mins)

  1. Download kotaemon-app.zip from the latest release.
  2. Unzip it; scripts/ contains a launcher per OS:

     | OS      | Script          | Run Command            |
     |---------|-----------------|------------------------|
     | Windows | run_windows.bat | Double-click           |
     | macOS   | run_macos.sh    | Right-click → Terminal |
     | Linux   | run_linux.sh    | bash run_linux.sh      |

  3. The app launches at http://localhost:7860 (default login: admin/admin).
  4. Data lives in ./ktem_app_data (back it up to migrate).

Pro tip: always re-run the launch script after each update.
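For Linux users who prefer the terminal, the whole flow condenses to a few commands. A minimal sketch, assuming the standard GitHub release-asset URL pattern and the scripts/ layout from step 2:

# Fetch and unpack the latest packaged release (URL pattern assumed; verify on the releases page)
curl -LO https://github.com/Cinnamon/kotaemon/releases/latest/download/kotaemon-app.zip
unzip kotaemon-app.zip -d kotaemon-app && cd kotaemon-app

# First run creates ./ktem_app_data; back that folder up to migrate installs
bash scripts/run_linux.sh
# Then browse to http://localhost:7860 and log in with admin/admin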

2.2 Docker (Recommended)

Images: lite (PDF/HTML/XLSX), full (adds Unstructured for DOC/DOCX), ollama (bundles local LLMs). Platforms: amd64/arm64.

# Lite (most users)
# --platform linux/arm64 is optional; include it on Apple Silicon Macs
docker run -e GRADIO_SERVER_NAME=0.0.0.0 -e GRADIO_SERVER_PORT=7860 \
  -v ./ktem_app_data:/app/ktem_app_data -p 7860:7860 -it --rm \
  --platform linux/arm64 \
  ghcr.io/cinnamon/kotaemon:main-lite

# Full + Ollama (same flags, different image tag)
docker run ... ghcr.io/cinnamon/kotaemon:main-ollama
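The same lite container can be expressed declaratively for repeatable setups; a docker-compose.yml sketch mirroring the flags above (service name and layout are illustrative, not an official compose file):

# docker-compose.yml (illustrative)
services:
  kotaemon:
    image: ghcr.io/cinnamon/kotaemon:main-lite
    environment:
      - GRADIO_SERVER_NAME=0.0.0.0
      - GRADIO_SERVER_PORT=7860
    ports:
      - "7860:7860"
    volumes:
      - ./ktem_app_data:/app/ktem_app_data
    # platform: linux/arm64  # uncomment on Apple Silicon

Start it with docker compose up.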

2.3 Pip (Developers)

conda create -n kotaemon python=3.10 && conda activate kotaemon
git clone https://github.com/Cinnamon/kotaemon && cd kotaemon
pip install -e "libs/kotaemon[all]" -e "libs/ktem"
# Optional: install unstructured for DOC/DOCX/PPTX support
python app.py  # or: launch.sh
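To confirm both editable installs resolved correctly, a quick sanity check (package names as used in the pip command above):

pip list | grep -Ei "kotaemon|ktem"   # both packages should appear as editable installs
python -c "import kotaemon, ktem"     # should exit without errors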

2.4 HF Spaces (Online, 10 mins)

  1. Duplicate the official Space.
  2. Wait for the build (~10 mins).
  3. Set up a Cohere API key (free tier available).

2.5 Colab (Local RAG Testing)

Run the project's Colab notebook.

2.6 Advanced: GraphRAG/LightRAG

| Variant | Setup |
|---------|-------|
| NanoGraphRAG (recommended) | pip install nano-graphrag; then pip uninstall hnswlib chroma-hnswlib && pip install chroma-hnswlib (Issue #440); launch with USE_NANO_GRAPHRAG=true python app.py |
| LightRAG | pip install git+https://github.com/HKUDS/LightRAG.git (resolve dependency conflicts); launch with USE_LIGHTRAG=true python app.py |
| MS GraphRAG | pip install "graphrag<=0.3.6" future; set GRAPHRAG_API_KEY=... (OpenAI/Ollama only); set USE_CUSTOMIZED_GRAPHRAG_SETTING=true and edit settings.yaml |
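Putting the recommended row together, the full NanoGraphRAG sequence from the table reads as one script (commands taken verbatim from above; run inside your kotaemon environment):

# NanoGraphRAG setup (recommended variant)
pip install nano-graphrag
# Work around the hnswlib conflict noted in Issue #440
pip uninstall -y hnswlib chroma-hnswlib && pip install chroma-hnswlib
# Launch with the NanoGraphRAG index enabled
USE_NANO_GRAPHRAG=true python app.py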

2.7 Multimodal Parsers

| Loader | Install/Setup |
|--------|---------------|
| Docling (local) | pip install docling |
| Azure Document Intelligence | API key in Resources |
| Adobe PDF Extract | API key |

Select in Settings → Retrieval → File Loader.

2.8 Custom Env/Configs
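Kotaemon reads configuration from a .env file in the repository root. A minimal sketch; the GRADIO_* variables and feature flags appear elsewhere in this guide, while the API-key names (OPENAI_API_KEY, COHERE_API_KEY) are assumed here, so check the repo's .env example for the authoritative list:

# .env sketch (key names partly assumed; verify against the repo)
GRADIO_SERVER_NAME=0.0.0.0     # same variables as the Docker run in 2.2
GRADIO_SERVER_PORT=7860
OPENAI_API_KEY=sk-...          # assumed name; cloud LLM/embeddings
COHERE_API_KEY=...             # assumed name; reranking and the free HF Spaces setup
GRAPHRAG_API_KEY=...           # only for MS GraphRAG (Section 2.6)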

Verification: after any install method, open the Resources tab and confirm your models are listed.

3. Core UI and Basic Usage

Tabs:

  1. Chat: New chat, history, share/export.
  2. File Index: Upload/index collections (private/public).
  3. Resources: Manage LLMs/Embeddings/Rerankers.
  4. Settings: Retrieval (top-K, prompts), User Mgmt (multi-user/SSO v0.11).

Workflow:

  1. Upload/Index: Drag files → Index (hybrid: split → embed → store).
  2. Select Collection/Reasoning: e.g., FullQA, DecomposeQA.
  3. Query: “Summarize the table on page 5” → answer + citations (relevance scores, PDF highlights).
  4. Citations: Click → In-browser viewer w/ highlights.

Main page: [screenshot]

Embedding options: [screenshot of supported models]

Reasoning options (UI dropdown) include FullQA and DecomposeQA (see Workflow above).

4. Model Management: 50+ LLMs/Embeddings

Supported:

Local Setup:

  1. Ollama:
    ollama pull llama3.1:8b
    ollama pull nomic-embed-text
    
    UI: Resources → Add Ollama → Set default.
  2. LlamaCPP: download a GGUF model (from Hugging Face), then in the UI: Add LlamaCpp → set the model path (e.g., qwen1_5-1_8b-chat-q8_0.gguf).
  3. Scoring/Rerank: set local models in Retrieval Settings.

Pro tip: budget RAM at the model's size plus a ~2 GB buffer (worked example below), and disable LLM scoring on low-resource machines.
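As a worked example of that rule of thumb, a tiny estimator (plain arithmetic, not a Kotaemon API):

def min_ram_gb(params_billions: float, bits_per_weight: int = 4, buffer_gb: float = 2.0) -> float:
    """Rough minimum RAM: quantized weight size plus a fixed 2 GB buffer."""
    weight_gb = params_billions * bits_per_weight / 8  # 8B params at 4-bit ~ 4 GB
    return weight_gb + buffer_gb

print(min_ram_gb(8))        # llama3.1:8b at Q4 -> 6.0 GB minimum
print(min_ram_gb(1.8, 8))   # qwen1.5-1.8b at Q8 -> 3.8 GB minimum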

5. Retrieval Pipelines: Hybrid + Advanced

Default Hybrid [1]:

  1. Parse (loaders: PDF/Unstructured).
  2. Split (semantic/chunk).
  3. Index: Vector (ChromaDB) + Full-text (LanceDB/SimpleFile).
  4. Retrieve: Hybrid search + Rerank (Cohere/LLM).
  5. Generate: LLM + Citations.
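To make steps 4-5 concrete, here is an illustrative sketch of hybrid score fusion followed by reranking; this is generic logic to show the idea, not Kotaemon's internal API:

from typing import Callable

# A retriever maps (query, k) -> [(chunk_text, score), ...]
Retriever = Callable[[str, int], list[tuple[str, float]]]

def hybrid_retrieve(
    query: str,
    vector_search: Retriever,      # e.g. backed by ChromaDB
    fulltext_search: Retriever,    # e.g. backed by LanceDB
    rerank: Callable[[str, list[str]], list[tuple[str, float]]],  # e.g. Cohere/LLM
    k: int = 10,
    alpha: float = 0.5,            # weight between vector and full-text scores
) -> list[tuple[str, float]]:
    """Merge candidates from both indexes, then let the reranker order the union."""
    scores: dict[str, float] = {}
    for chunk, s in vector_search(query, k):
        scores[chunk] = scores.get(chunk, 0.0) + alpha * s
    for chunk, s in fulltext_search(query, k):
        scores[chunk] = scores.get(chunk, 0.0) + (1 - alpha) * s
    # Keep the best fused candidates, then rerank for the final ordering
    candidates = sorted(scores, key=scores.get, reverse=True)[: 2 * k]
    return rerank(query, candidates)[:k]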

Config (Settings):

GraphRAG [10]:

Web Search Retrievers: Jina/Tavily (add external context).

6. Multimodal Support: Figures, Tables, OCR

Loaders (Settings → File Loader):

| Type | Features | Requirement |
|------|----------|-------------|
| PDF | Text/figures | Native |
| Unstructured | DOCX/PPTX/OCR | Full Docker image |
| Docling | Local multimodal | pip install docling |
| Azure/Adobe | Cloud OCR/tables | API key |

QA: images and tables are carried into the prompt context, giving GPT-4V-style answers when a multimodal LLM is configured.

Example: Query table → Extracts → Answers w/ highlights.

7. Customization & Extensibility (Developer Focus)

Modular System: Everything subclasses BaseComponent.

7.1 flowsettings.py

KH_DOCSTORE = "ktem.storages.docstores.lancedb.LanceDBDocumentStore"
KH_VECTORSTORE = "ktem.storages.vectorstores.chroma.ChromaVectorStore"
KH_REASONINGS = [
    "ktem.reasoning.simple.FullQAPipeline",
    "ktem.reasoning.react.ReactAgentPipeline",
]
KH_REASONINGS_USE_MULTIMODAL = True

7.2 Custom Reasoning Pipeline

  1. Create libs/ktem/ktem/reasoning/my_pipeline.py:

from kotaemon.base import BaseComponent
from kotaemon.llms import ChatOpenAI  # LLM component from the kotaemon library

class MyPipeline(BaseComponent):
    llm: ChatOpenAI

    def __call__(self, query: str):
        # Custom logic: rewrite the query, retrieve, post-process, etc.
        return self.llm.invoke(query)

  2. Add it to KH_REASONINGS in flowsettings.py (see the snippet below).
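For step 2, registration is a one-line addition to the KH_REASONINGS list from Section 7.1; assuming the file location above, the import path would be:

# flowsettings.py
KH_REASONINGS = [
    "ktem.reasoning.simple.FullQAPipeline",
    "ktem.reasoning.my_pipeline.MyPipeline",  # module path follows the file created in step 1
]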

7.3 Custom Indexing

See libs/ktem/ktem/index/file/graph/ for worked examples.

7.4 Gradio Extensions

CLI Utils: kotaemon promptui export my_pipeline --output config.yml.

8. Deployment & Scaling

9. Troubleshooting & Best Practices