
Project: MULTIGPT - Multi-Model RAG with document, image and Audio integration, Python, JavaScript/TypeScript #141


Open
3 of 8 tasks
Mokshu3242 opened this issue Apr 19, 2025 · 0 comments

Mokshu3242 commented Apr 19, 2025

Project Name

MULTIGPT

Description

MultiGPT - AI Agent with Multi-Modal Capabilities 🚀

Python
FastAPI
Cloudflare
MultiModal

An advanced AI agent capable of processing text, audio, images, and documents with visualization support. Built with FastAPI, Cloudflare AI, ElevenLabs, and LangChain.

🌟 Features

1. Core Capabilities

  • Conversational AI with persistent chat history
  • Multi-language support (English, Hindi, Marathi)
  • JWT Authentication + Self-Hosted OTP verification
  • Document, Image and Audio Processing
  • Rate-limited API endpoints
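As a rough illustration of what "rate-limited API endpoints" could mean in practice, here is a minimal sliding-window limiter using only the standard library. The class name, the per-client key, and the limits are all assumptions made for this sketch; the project's actual limiter and thresholds are not described in this issue.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Hypothetical helper: allow at most `max_requests` per window per client."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of recently accepted requests, stored per client key.
        self._hits = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        # `now` is injectable so the behaviour can be checked without sleeping.
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: the caller would typically answer HTTP 429
        hits.append(now)
        return True
```

In a FastAPI backend, a check like `limiter.allow(user_id)` would typically run in a dependency or middleware before the handler; that wiring is left out here because it depends on how the project identifies clients.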

2. Input Processing

| Type | Endpoint | Technologies Used |
|------|----------|-------------------|
| Text | /chat | Cloudflare LLM |
| Voice | /voice | ElevenLabs TTS + Whisper |
| Audio | /audio | Whisper transcription |
| Images | /handle_image | CLIP image analysis |
| Documents | /upload_doc | PyPDFium2, docx2txt, msoffcrypto |
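To make the input routing concrete, the sketch below picks one of the listed endpoints from an uploaded file's extension. The endpoint paths are taken from the list above; the extension mapping and the `endpoint_for` helper are assumptions for illustration, since the issue does not say which file types each endpoint accepts or how the frontend chooses between them.

```python
from pathlib import PurePosixPath
from typing import Optional

# Endpoint paths are from the issue; the extension lists are assumed.
ENDPOINT_BY_EXTENSION = {
    ".pdf": "/upload_doc",
    ".docx": "/upload_doc",
    ".png": "/handle_image",
    ".jpg": "/handle_image",
    ".jpeg": "/handle_image",
    ".mp3": "/audio",
    ".wav": "/audio",
}


def endpoint_for(filename: Optional[str]) -> str:
    """Hypothetical helper: choose an endpoint for a message or an attachment."""
    if not filename:
        # No attachment: treat the input as a plain text message.
        return "/chat"
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix not in ENDPOINT_BY_EXTENSION:
        raise ValueError(f"unsupported file type: {suffix or filename!r}")
    return ENDPOINT_BY_EXTENSION[suffix]
```

For example, `endpoint_for("report.PDF")` resolves to `/upload_doc`, while an unknown extension raises `ValueError` instead of silently falling back to `/chat`.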

3. Advanced Functions

  • YouTube transcript extraction
  • Data visualization (bar/line/pie charts)
  • Auto-expiring file storage (2-day TTL)
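The 2-day TTL can be expressed as a simple expiry check. This is only a sketch of the arithmetic; the function names and the use of UTC timestamps are assumptions. Since the stack lists MongoDB, one plausible implementation is a TTL index with `expireAfterSeconds` set to the value computed below, but the issue does not confirm how expiry is actually enforced.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

FILE_TTL = timedelta(days=2)  # the 2-day TTL stated in the feature list
TTL_SECONDS = int(FILE_TTL.total_seconds())  # 172800, usable as a TTL index value


def expires_at(uploaded_at: datetime) -> datetime:
    """Moment at which an uploaded file becomes eligible for deletion."""
    return uploaded_at + FILE_TTL


def is_expired(uploaded_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once the TTL has fully elapsed (timestamps assumed timezone-aware UTC)."""
    now = now or datetime.now(timezone.utc)
    return now >= expires_at(uploaded_at)
```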

🛠️ Tech Stack

  • Frontend: React.js
  • Backend: FastAPI
  • AI Services:
    • Cloudflare (LLaMA-2, Whisper, CLIP)
    • ElevenLabs (Text-to-Speech)
  • Database: MongoDB
  • Data Processing:
    • LangChain (Document chunking)
    • Pandas/Plotly (Visualizations)
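For readers unfamiliar with document chunking, the idea is to split extracted text into overlapping windows before passing them to the model. The sketch below is a plain-Python stand-in, not the LangChain splitter itself; the `chunk_text` name and the default sizes are assumptions, and the project's real chunking parameters are not given in this issue.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split `text` into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears intact in one of the two neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

LangChain's text splitters expose similar chunk-size and chunk-overlap settings but additionally try to break on natural separators such as paragraphs, which this character-based sketch does not attempt.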

Language & Framework

  • Python
  • C#
  • Java
  • JavaScript/TypeScript
  • Microsoft Copilot Studio
  • Microsoft 365 Agents SDK
  • Azure AI Agent Service

Project Repository URL

https://github.com/Mokshu3242/MultiModal_Frontend, https://github.com/Mokshu3242/Multimodal_Backend

Deployed Endpoint URL

https://multi-modal-frontend.vercel.app/

Project Video

https://youtu.be/XLyAPlee8Og

Team Members

Mokshu3242, bhavya681, Sudeep10

Registration Check

  • Each of my team members has filled out the registration form
multispark changed the title from "MULTIGPT: Multi-Model RAG with document, image and Audio integration" to "Project: MULTIGPT - Multi-Model RAG with document, image and Audio integration, Python, JavaScript/TypeScript" on Apr 25, 2025