Thanks to asanchezyali for the initial project and research that inspired this work.
A fun, experimental project that puts a face to all the generative AI models we use today. This repository showcases an interactive chat experience with a 3D avatar, using various modern web and AI technologies.
Demo Video:
- Framework: Next.js 15 for client and server components.
- Rendering & 3D:
  - Three.js combined with React Three Fiber and Drei.
  - gltfjsx to convert GLB models into type-safe React components (see the sketch after this list).
- AI & APIs:
  - Vercel AI SDK for AI-based functionality.
  - OpenAI for text generation.
  - ElevenLabs SDK for Text-to-Speech (TTS).
  - Rhubarb Lip Sync (command-line tool) to generate phonemes, which are mapped to the 3D avatar’s lip movements.
- Data Handling: SWR for data fetching and mutations.
- UI & Styles: Tailwind CSS and shadcn/ui.
- Utilities: Leva for real-time debugging and control of 3D object transforms.
- Tooling: Yarn 4 (via Corepack) and Turbopack for faster builds.
- Type Safety: Fully typed with TypeScript, on both server and client components.
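To give a feel for how these pieces fit together, here is a minimal sketch of a gltfjsx-style avatar rendered with React Three Fiber. The component names and the model path are hypothetical, not the project's actual code:

```tsx
"use client";

import { Canvas } from "@react-three/fiber";
import { OrbitControls, useGLTF } from "@react-three/drei";

// Hypothetical avatar component, similar in shape to what gltfjsx
// generates from a GLB file (the generated version exposes typed nodes).
function Avatar() {
  const { scene } = useGLTF("/models/avatar.glb"); // assumed model path
  return <primitive object={scene} />;
}

export default function AvatarCanvas() {
  return (
    <Canvas camera={{ position: [0, 1.5, 3] }}>
      <ambientLight intensity={0.8} />
      <directionalLight position={[2, 4, 3]} />
      <Avatar />
      {/* Drei's OrbitControls provides click-and-drag rotation of the view. */}
      <OrbitControls />
    </Canvas>
  );
}
```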
Note: This project is a minimalistic, fun showcase, not a production-ready solution.
- 3D Avatar Chat UI: Interact with a generative AI model through a 3D avatar’s face and lip movements.
- Text-to-Speech: Converts AI responses into audio, giving a voice to your AI avatar (via ElevenLabs).
- Lip Sync: Uses Rhubarb to analyze the generated speech audio and drive the avatar’s lip movement (see the sketch after this list).
- Real-Time Debugging: Leva panel for controlling and tuning the avatar’s expressions and movements.
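As a rough illustration of the lip-sync step, the sketch below maps Rhubarb's mouth cues onto a model's morph targets. It assumes Rhubarb's JSON output (a mouthCues array of { start, end, value } entries, where value is one of the mouth shapes A–H or X); the morph-target names are hypothetical and depend on how the GLB model was rigged:

```ts
import { useFrame } from "@react-three/fiber";
import type { RefObject } from "react";
import type { SkinnedMesh } from "three";

// Shape of one entry in Rhubarb's JSON output (rhubarb -f json).
interface MouthCue {
  start: number; // seconds
  end: number;
  value: "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "X";
}

// Hypothetical mapping from Rhubarb mouth shapes to morph-target names.
const VISEME_TARGETS: Record<MouthCue["value"], string> = {
  A: "viseme_PP", B: "viseme_kk", C: "viseme_I",  D: "viseme_AA",
  E: "viseme_O",  F: "viseme_U",  G: "viseme_FF", H: "viseme_TH",
  X: "viseme_sil",
};

export function useLipSync(head: RefObject<SkinnedMesh>, cues: MouthCue[], audio: HTMLAudioElement) {
  useFrame(() => {
    const mesh = head.current;
    if (!mesh?.morphTargetDictionary || !mesh.morphTargetInfluences) return;

    // Zero every viseme, then raise the one active at the audio's playhead.
    for (const name of Object.values(VISEME_TARGETS)) {
      const i = mesh.morphTargetDictionary[name];
      if (i !== undefined) mesh.morphTargetInfluences[i] = 0;
    }
    const t = audio.currentTime;
    const cue = cues.find((c) => t >= c.start && t < c.end);
    if (cue) {
      const i = mesh.morphTargetDictionary[VISEME_TARGETS[cue.value]];
      if (i !== undefined) mesh.morphTargetInfluences[i] = 1;
    }
  });
}
```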
- Enable Chat Streaming: add real-time streaming for chat responses to enhance the conversational experience (see the sketch after this list).
- Speech-to-Text: implement audio input so users can talk to the AI rather than typing.
- Real-Time API: integrate OpenAI’s real-time APIs (if/when available) for more dynamic exchanges.
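For the streaming item, a route handler could look roughly like this. It is a sketch of the planned feature, not existing project code, and it assumes the Vercel AI SDK v4-style streamText helper with the @ai-sdk/openai provider:

```ts
// app/api/chat/route.ts — hypothetical streaming endpoint
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";

export async function POST(req: Request) {
  const { messages } = await req.json();

  // Stream tokens to the client as they are generated,
  // instead of waiting for the full completion.
  const result = streamText({
    model: openai("gpt-4o-mini"),
    messages,
  });

  return result.toDataStreamResponse();
}
```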
Feel free to explore and contribute to these next steps!
- Node.js v20 (recommended)
- Yarn 4 (via Corepack)
- Git (for cloning the repository)
- FFmpeg for converting MP3 files to WAV, required for lip-sync processing (see the sketch after this list).
- Rhubarb Lip Sync:
  - Download the appropriate release from the Rhubarb Lip Sync repository.
  - Place the downloaded binary in the `.tools/` directory of this project (create the folder if it doesn't exist).
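For context, here is a minimal sketch of how FFmpeg and Rhubarb might be chained on the server to turn an ElevenLabs MP3 into mouth cues. The helper name and file paths are hypothetical; `-f json` and `-o` are Rhubarb's documented output options:

```ts
// Hypothetical server-side helper: MP3 (from ElevenLabs) -> WAV -> mouth cues.
import { execFile } from "node:child_process";
import { readFile } from "node:fs/promises";
import { promisify } from "node:util";

const run = promisify(execFile);

export async function lipSyncFromMp3(mp3Path: string, wavPath: string, cuesPath: string) {
  // FFmpeg converts the MP3 to the WAV input Rhubarb expects.
  await run("ffmpeg", ["-y", "-i", mp3Path, wavPath]);

  // Rhubarb analyzes the WAV and writes mouth cues as JSON.
  await run(".tools/rhubarb", ["-f", "json", "-o", cuesPath, wavPath]);

  return JSON.parse(await readFile(cuesPath, "utf8"));
}
```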
- Clone the repo:

  ```bash
  git clone https://github.com/your-username/chat-with-ai-avatar.git
  cd chat-with-ai-avatar
  ```

- Install dependencies:

  ```bash
  yarn install
  ```

- Create environment variables:

  Create a `.env` file in the root of the project and add the following variables:

  ```bash
  OPENAI_API_KEY=your_openai_api_key
  ELEVEN_LABS_API_KEY=your_elevenlabs_api_key
  ELEVEN_LABS_VOICE_ID=your_elevenlabs_voice_id
  ```

  - You’ll need an OpenAI API key.
  - You’ll need an ElevenLabs API key (free tier with limited credits).
  - Retrieve a Voice ID from your ElevenLabs dashboard (a TTS sketch using these variables follows this section).

- Run the development server:

  ```bash
  yarn dev
  ```

- Open in your browser:

  Go to http://localhost:3000.

Note: Because this project leverages AI and TTS services, ensure your `.env` file is correctly set up with valid API keys.
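As an illustration of where these variables end up, here is a minimal server-side TTS sketch. It assumes the official ElevenLabs Node SDK; the exact client and method names vary between SDK versions, so treat this as a shape rather than a recipe:

```ts
// Hypothetical TTS helper reading the env vars configured above.
import { ElevenLabsClient } from "elevenlabs";

const client = new ElevenLabsClient({
  apiKey: process.env.ELEVEN_LABS_API_KEY,
});

export async function speak(text: string) {
  // Returns an audio stream for the configured voice.
  return client.textToSpeech.convert(process.env.ELEVEN_LABS_VOICE_ID!, {
    text,
    model_id: "eleven_multilingual_v2",
  });
}
```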
- Interactive 3D: Click and drag on the canvas to rotate the 3D view.
- Chat: Enter text in the chat input to see the AI respond (text output).
- Lip Sync: When TTS is enabled, the avatar’s mouth will move in sync with the spoken text (requires FFmpeg and Rhubarb).
- Controls: Use the Leva debug panel to tweak parameters like head rotation and expression intensity (see the sketch below).
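For reference, Leva controls are declared with a hook along these lines; the control name and rig component here are hypothetical, not the project's actual panel:

```tsx
import { useControls } from "leva";
import { useRef } from "react";
import type { ReactNode } from "react";
import type { Group } from "three";
import { useFrame } from "@react-three/fiber";

export function AvatarRig({ children }: { children: ReactNode }) {
  const group = useRef<Group>(null);

  // Each key becomes a live slider in the Leva debug panel.
  const { headRotationY } = useControls("Avatar", {
    headRotationY: { value: 0, min: -Math.PI / 2, max: Math.PI / 2 },
  });

  // Apply the slider value to the rig every frame.
  useFrame(() => {
    if (group.current) group.current.rotation.y = headRotationY;
  });

  return <group ref={group}>{children}</group>;
}
```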
Contributions are welcome! If you have ideas for improvements, bug reports, or new feature requests, please open an issue or submit a pull request.
This project is open source under the MIT License.
- asanchezyali for the initial project and research that inspired this work.
- Next.js for the awesome React framework.
- React Three Fiber and Drei for the great 3D abstractions in React.
- gltfjsx for automagical 3D model conversions.
- OpenAI for text generation and any future real-time capabilities.
- ElevenLabs for realistic TTS.
- Rhubarb Lip Sync for lip syncing magic.
- Tailwind CSS and shadcn/ui for styling.
Have fun exploring and enhancing the conversational AI 3D avatar experience!