High-performance Q&A system combining Gemini AI with Retrieval-Augmented Generation (RAG) and intelligent context caching. Optimizes API costs through batch processing and smart document retrieval. Features parallel processing, efficient text chunking, and context reuse. Handles multiple queries against large document collections at scale.

Optimized-Gemini-Context-Caching-RAG

This is an offline-only local demo of the batch prediction + context caching pipeline from the GSoC DeepMind Challenge Q4.

This repo demonstrates batch prediction with Gemini APIs, leveraging long context and context caching for efficiently answering questions about a single video. It addresses a common use case of extracting information from large content sources.

Scenario: Extracting information from a video lecture or documentary by asking multiple, potentially interconnected questions.
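For the scenario above, a transcript can easily exceed the model's context window, so it must be split into pieces first. A minimal chunking sketch, assuming character-based chunks with a small overlap to preserve continuity between pieces (the function name, chunk size, and overlap are illustrative, not taken from this repo):

```python
def chunk_transcript(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split a transcript into overlapping character chunks so each
    piece fits within the model's context window.

    The overlap keeps sentences that straddle a chunk boundary visible
    in both neighbouring chunks.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

In practice a token-based splitter aligned to sentence boundaries would give better retrieval quality, but the windowing logic is the same.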

Features:

  • Batch Prediction: Questions are submitted in batches to minimize API calls and improve efficiency; dividing the questions into smaller batches avoids exceeding API limits. 📦
  • Long Context Handling: Uses Gemini's long-context capabilities to provide the entire video transcript (or relevant segments) as context, with strategies for transcripts that exceed the maximum context length. 📏
  • Context Caching: Stores and reuses previous interactions to significantly reduce the data sent to the API and improve response times, especially for interconnected questions, using a suitable caching mechanism (e.g., an in-memory cache or persistent storage). 💾
  • Interconnected Questions: Handles questions that build upon previous answers by maintaining the conversation history and using it to produce more accurate, relevant responses. 🔗
  • Output Formatting: Clear, user-friendly output; answers are presented in a structured format, where possible with links to the relevant timestamps in the video. ✨
  • Code Documentation: Detailed comments, setup instructions, and usage guidelines explaining how the components work together, including how to obtain and configure an API key, plus example questions and expected outputs. 📖
  • Error Handling: Robust error handling for API errors, network issues, and invalid inputs.
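The batching, caching, and conversation-history ideas above can be sketched together in a few lines. This is a minimal illustration, not the repo's actual API: `split_into_batches`, `CachedQASession`, and the injected `ask_fn` (which stands in for the real Gemini call) are hypothetical names.

```python
import hashlib
from typing import Callable


def split_into_batches(questions: list[str], batch_size: int) -> list[list[str]]:
    """Divide questions into smaller batches to stay under API request limits."""
    return [questions[i:i + batch_size] for i in range(0, len(questions), batch_size)]


class CachedQASession:
    """Illustrative in-memory context cache plus conversation history.

    ask_fn is a stand-in for the model call: it takes (context, question)
    and returns an answer string.
    """

    def __init__(self, context: str, ask_fn: Callable[[str, str], str]):
        self.context = context
        self.ask_fn = ask_fn
        self.history: list[tuple[str, str]] = []   # (question, answer) pairs
        self._cache: dict[str, str] = {}           # answers keyed by context+question hash

    def ask(self, question: str) -> str:
        key = hashlib.sha256((self.context + question).encode()).hexdigest()
        if key in self._cache:                     # repeated question: no API call needed
            return self._cache[key]
        # Interconnected questions: fold prior Q&A pairs into the prompt context.
        prior = "\n".join(f"Q: {q}\nA: {a}" for q, a in self.history)
        answer = self.ask_fn(f"{self.context}\n{prior}", question)
        self._cache[key] = answer
        self.history.append((question, answer))
        return answer
```

A production version would replace the dict with Gemini's server-side cached-content mechanism (so the transcript itself is not resent on every call) and persist the cache across runs.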

A work-in-progress code implementation of the submitted proposal paper.

How to Run

poetry install
poetry run python src
