Open source multi-modal RAG for building AI apps over private knowledge.
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
LAYRA combines vision-driven RAG with multi-step agent orchestration to power next-generation intelligent systems.
Production-ready Inference, Ingestion and Indexing built in Rust 🦀
Vision-Augmented Retrieval and Generation (VARAG): a vision-first RAG engine.
Vision Document Retrieval (ViDoRe): benchmark and evaluation code for the ColPali paper.
High-Performance Engine for Multi-Vector Search
LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment.
A novel multimodal (vision) RAG architecture.
REST API for computing cross-modal similarity between images and text using the ColPali vision-language model.
#3 Winner of Best Use of Zoom API at Stanford TreeHacks 2024! An AI-powered meeting assistant that captures video, audio and textual context from Zoom calls using multimodal RAG.
A multimodal RAG application using Qwen 2.5 VL, ColPali, and QdrantDB for text and image-based retrieval.
This repo provides code for efficient image indexing and retrieval with Qdrant, using models such as ColPali, ColQwen, and VDR-2B-Multi-V1 to enhance multimodal search across a range of applications.
ColPali is a vision-based RAG (Retrieval-Augmented Generation) approach that can capture visual data directly from document images.
OCR and Document Search Web Application
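The repositories above share ColPali's late-interaction retrieval idea: a query is embedded as many token vectors, a document page as many image-patch vectors, and relevance is the sum of each query token's best match over the patches (MaxSim). Below is a minimal NumPy sketch of that scoring under illustrative assumptions (random embeddings, 128-dimensional vectors; real models and dimensions vary).

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) score used by ColPali-style retrievers.

    query_emb: (num_query_tokens, dim) L2-normalized query token embeddings.
    doc_emb:   (num_patches, dim)      L2-normalized page patch embeddings.
    """
    # Cosine similarity of every query token against every image patch.
    sim = query_emb @ doc_emb.T            # (num_query_tokens, num_patches)
    # Each query token keeps its best-matching patch; sum over tokens.
    return float(sim.max(axis=1).sum())

# Toy example with random stand-in embeddings (not real model outputs).
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 128))      # 8 query tokens
d = rng.standard_normal((600, 128))    # 600 page patches
q /= np.linalg.norm(q, axis=1, keepdims=True)
d /= np.linalg.norm(d, axis=1, keepdims=True)
score = maxsim_score(q, d)
```

Ranking pages by this score is what multi-vector search engines such as those listed above accelerate; since each cosine similarity lies in [-1, 1], the score is bounded by the number of query tokens.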