OpenAI Realtime API compatibility? #245


Open
vvolhejn opened this issue Apr 3, 2025 · 4 comments

@vvolhejn (Contributor) commented Apr 3, 2025

When used as an API server with websockets, FastRTC provides similar functionality to OpenAI's Realtime API.

How about designing the websocket protocol to match OpenAI's, so that FastRTC can be used as a drop-in replacement? This would make adoption very easy for people who are already using the Realtime API.

A similar strategy has worked well for vLLM in the text LLM space: https://github.com/vllm-project/vllm

Perhaps this could work by just taking a StreamHandler/AsyncStreamHandler and running a FastAPI server that formats the messages appropriately. Extra client messages could be passed as AdditionalOutputs; I'm not sure about the extra server messages, though.

@freddyaboulton (Collaborator) commented:

This is a great suggestion! I think changing the input audio messages to match input_audio_buffer.append and changing the output audio messages to match response.audio.delta is straightforward.

The tricky thing will be mapping the AdditionalOutputs of a handler to either the response.audio.transcript or input.audio.transcript events. Perhaps we just assume that if the output is {'role': "user", "content": ...} it corresponds to input.audio.transcript and same for the response.audio.transcript if the role is "assistant".
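That role-based convention could be sketched as a small mapping function. The function name is invented for illustration, and the event names are taken verbatim from this discussion rather than from a confirmed spec:

```python
def transcript_event_for(output: dict) -> dict:
    """Map an AdditionalOutputs-style chat message to a transcript event."""
    event_types = {
        "user": "input.audio.transcript",
        "assistant": "response.audio.transcript",
    }
    role = output.get("role")
    if role not in event_types:
        raise ValueError(f"cannot map role {role!r} to a transcript event")
    return {"type": event_types[role], "transcript": output["content"]}
```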

I'm not sure if we can support some events (like conversation.item.create) in the general sense. Some handlers may not even have a concept of a conversation.

@vvolhejn (Contributor, Author) commented Apr 3, 2025

> The tricky thing will be mapping the AdditionalOutputs of a handler to either the response.audio.transcript or input.audio.transcript events. Perhaps we just assume that if the output is {'role': "user", "content": ...} it corresponds to input.audio.transcript and same for the response.audio.transcript if the role is "assistant".

I think a better approach would be to have some subclass of AdditionalOutputs that's more structured and allows you to specify that information, perhaps even something like OpenAIRealtimeApiAdditionalOutput (a bit too long, though) that would literally allow you to send specific OpenAI events. Since no semantics are defined for how AdditionalOutputs should be interpreted, this seems easy to do.

It should then be pretty easy to create a function, to be used as additional_outputs_handler, that updates a Chatbot element based on a received OpenAIRealtimeApiAdditionalOutput.
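A minimal, self-contained sketch of what that could look like. All names here are invented for illustration, and the class stands alone rather than subclassing the real fastrtc.AdditionalOutputs, so nothing below should be read as FastRTC's actual API:

```python
from dataclasses import dataclass


@dataclass
class OpenAIRealtimeApiAdditionalOutput:
    """Carries a literal OpenAI Realtime event emitted by a handler."""
    event: dict  # e.g. {"type": "input.audio.transcript", "transcript": ...}


def update_chat_history(history: list, output: OpenAIRealtimeApiAdditionalOutput) -> list:
    """The kind of function one could pass as additional_outputs_handler:
    fold a transcript event into a messages-style chat history."""
    role_for = {
        "input.audio.transcript": "user",
        "response.audio.transcript": "assistant",
    }
    role = role_for.get(output.event.get("type"))
    if role is None:
        return history  # not a transcript event; leave the chat untouched
    return history + [{"role": role, "content": output.event["transcript"]}]
```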

> I'm not sure if we can support some events (like conversation.item.create) in the general sense. Some handlers may not even have a concept of a conversation.

That's a good point. I don't think it'd be possible to make every handler work as a drop-in OpenAI replacement, but it should be easy to create one if you want to. Ideally we would have some loud error if the client tries to send a message that's not supported, but we would need to somehow know what's supported and what isn't.
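One simple way to get that loud error, assuming each wrapper can declare an explicit allow-list of event types (the function name and the allow-list idea are assumptions, not anything FastRTC provides):

```python
def check_supported(event: dict, supported: frozenset) -> None:
    """Reject any client event whose type the wrapped handler doesn't support."""
    etype = event.get("type")
    if etype not in supported:
        raise ValueError(
            f"event type {etype!r} is not supported by this handler; "
            f"supported types: {sorted(supported)}"
        )
```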

A good test of whether it works would be using the server as a replacement in the "OpenAI Realtime Console" demo (specifically, the websocket version): https://github.com/openai/openai-realtime-console?tab=readme-ov-file

I'll play around with this and try to create some standalone code that would wrap an AsyncStreamHandler and then we can see if it could be integrated into FastRTC?

@freddyaboulton (Collaborator) commented:

> I think a better approach would be to have some subclass of AdditionalOutput that's more structured

Yes I agree. Perhaps it can be called RealtimeMessage. And you're right that we can map an instance of RealtimeMessage to a chatbot UI update in a straightforward manner.

> but it should be easy to create one if you want to. Ideally we would have some loud error if the client tries to send a message that's not supported, but we would need to somehow know what's supported and what isn't.

I think we can sidestep this for now.

> I'll play around with this and try to create some standalone code that would wrap an AsyncStreamHandler and then we can see if it could be integrated into FastRTC?

Awesome, really looking forward to seeing this!

@marcusvaltonen (Contributor) commented Apr 8, 2025

I would also be interested in having a custom version (possibly a subclass) of AdditionalOutputs to which you can add some custom logic, e.g. an AdditionalJSONOutputs where I can call json.dumps() directly on the object and send the data in a stream to the client.
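One way to sketch that idea: making the output a dict subclass means json.dumps() accepts the object directly, with no custom encoder. The class name comes from this comment, but the implementation below is just an illustration, not anything FastRTC ships:

```python
import json


class AdditionalJSONOutputs(dict):
    """A dict subclass: json.dumps() accepts the object directly, so a
    server loop can serialize and stream it to the client unchanged."""


output = AdditionalJSONOutputs(type="status", detail="warming up")
wire = json.dumps(output)  # no custom encoder needed
```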
