Talk = GPT-2 + Whisper + WASM

I just had an awesome idea:

Make a web-page that:
- Listens when someone speaks
- Transcribes the words using [WASM Whisper](https://github.com/ggerganov/whisper.cpp/tree/master/examples/whisper.wasm)
- Generates a new sentence using [WASM GPT-2](https://github.com/ggerganov/ggml/tree/master/examples/gpt-2)
- Uses [Web Speech API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API) to synthesise the speech and play it on the speakers.
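
A rough sketch of how one "turn" of the steps above could be wired together. The `TalkStages` interface, `talkTurn`, and `speakWithWebSpeech` are hypothetical names made up for illustration. The real whisper.wasm and GPT-2 WASM builds expose their own entry points, so here they are passed in as plain async functions. Only the Web Speech API calls (`speechSynthesis.speak`, `SpeechSynthesisUtterance`) are real browser APIs.

```typescript
// Hypothetical stage signatures: the actual WASM modules would be wrapped to fit these.
type Transcribe = (audio: Float32Array) => Promise<string>;
type Generate = (prompt: string) => Promise<string>;
type Speak = (text: string) => Promise<void>;

interface TalkStages {
  transcribe: Transcribe; // e.g. a wrapper around the Whisper WASM module
  generate: Generate;     // e.g. a wrapper around the GPT-2 WASM module
  speak: Speak;           // e.g. speakWithWebSpeech below
}

// One conversational turn: audio in, spoken reply out.
// Returns the generated reply, or "" when nothing was recognised.
async function talkTurn(audio: Float32Array, stages: TalkStages): Promise<string> {
  const heard = (await stages.transcribe(audio)).trim();
  if (heard.length === 0) {
    return ""; // nothing recognised, so stay silent
  }
  const reply = await stages.generate(heard);
  await stages.speak(reply);
  return reply;
}

// Browser-only speak stage built on the Web Speech API.
// `globalThis as any` is used so the sketch also compiles without DOM typings.
const speakWithWebSpeech: Speak = (text) =>
  new Promise<void>((resolve, reject) => {
    const g = globalThis as any;
    if (!g.speechSynthesis || !g.SpeechSynthesisUtterance) {
      reject(new Error("Web Speech API is not available in this environment"));
      return;
    }
    const utterance = new g.SpeechSynthesisUtterance(text);
    utterance.onend = () => resolve();
    utterance.onerror = () => reject(new Error("speech synthesis failed"));
    g.speechSynthesis.speak(utterance);
  });
```

Passing the stages in as functions keeps the loop independent of how the models are loaded, and makes it easy to swap in stubs while the WASM parts are not wired up yet.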

**All of this running locally in the browser - no server required**

I have all the ingredients and I think the performance is just about good enough. I just have to put it together.
The total data that the page will have to load on startup (probably using Fetch API) is:
- 74 MB for the Whisper `tiny.en` model
- 240 MB for the GPT-2 `small` model
- Nothing for the Web Speech API, since it is built into modern browsers
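
For a sense of the startup cost, the two model files add up to 74 + 240 = 314 MB. Below is a minimal sketch of loading them with the Fetch API while reporting progress. The asset URLs, `ModelAsset`, `totalDownloadMB`, and `fetchModel` are assumptions made up for this example, not part of either project. The small `FetchLike` type only mirrors the parts of `fetch` that the sketch uses.

```typescript
// Minimal structural types covering the parts of the Fetch API used here.
interface ByteReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
}
interface FetchedResponse {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
  body: { getReader(): ByteReader } | null;
}
type FetchLike = (url: string) => Promise<FetchedResponse>;

interface ModelAsset {
  name: string;
  url: string;    // hypothetical location of the model file
  sizeMB: number; // approximate size quoted in the list above
}

const assets: ModelAsset[] = [
  { name: "whisper tiny.en", url: "models/whisper-tiny.en.bin", sizeMB: 74 },
  { name: "gpt-2 small", url: "models/gpt-2-small.bin", sizeMB: 240 },
];

// Sum of the quoted model sizes, in MB.
function totalDownloadMB(list: ModelAsset[]): number {
  return list.reduce((sum, asset) => sum + asset.sizeMB, 0);
}

// Streams one model file into memory, calling onProgress with a 0..1 fraction.
async function fetchModel(
  asset: ModelAsset,
  onProgress: (fraction: number) => void,
  fetchFn: FetchLike = (globalThis as any).fetch,
): Promise<Uint8Array> {
  const response = await fetchFn(asset.url);
  if (!response.ok || !response.body) {
    throw new Error(`failed to fetch ${asset.name}: HTTP ${response.status}`);
  }
  // Prefer the server-reported size; fall back to the approximate size.
  const expected =
    Number(response.headers.get("Content-Length")) || asset.sizeMB * 1024 * 1024;
  const reader = response.body.getReader();
  const chunks: Uint8Array[] = [];
  let received = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done || !value) break;
    chunks.push(value);
    received += value.length;
    onProgress(Math.min(1, received / expected));
  }
  // Concatenate the chunks into one contiguous buffer for the WASM module.
  const bytes = new Uint8Array(received);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.length;
  }
  return bytes;
}
```

In a page this could be called once per asset, for example `fetchModel(assets[0], (f) => console.log(f))`, with the resulting bytes handed to the corresponding WASM module.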

I think it will be a lot of fun because you could talk to the web page, or even add extra devices that talk to each other only through the mic and the speakers. For example, you simply open the page on your phone and tablet, put them next to each other, and listen to them talk about something 😄

Any ideas to make this even more fun?

Talk = GPT-2 + Whisper + WASM #154

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Talk = GPT-2 + Whisper + WASM #154

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions