Confidences/probabilities for Whisper results #335

@zacharygraber

Hi friends 👋. Bumblebee is an amazing project, and I'm excited about the prospect of integrating it into my Phoenix LiveView web app.

Description of Problem

speech_to_text_whisper_chunk only exposes the raw text, start time, and end time of each chunk. There is nothing comparable to the per-segment avg_logprob that the Python-native Whisper API gives you, and no easy way to replicate it.
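
To make the request concrete, here is a minimal Python sketch of the kind of per-segment value I mean, assuming the per-token log-probabilities for a segment are available. The function names are hypothetical, and the plain mean used here is an assumption; the reference implementation may normalize slightly differently.

```python
import math


def avg_logprob(token_logprobs):
    """Plain mean of the token log-probabilities in one segment (an assumption)."""
    if not token_logprobs:
        raise ValueError("segment has no tokens")
    return sum(token_logprobs) / len(token_logprobs)


def segment_confidence(token_logprobs):
    """Map the mean log-probability to a 0..1 score.

    This equals the geometric mean of the token probabilities.
    """
    return math.exp(avg_logprob(token_logprobs))
```

A value close to 0 for avg_logprob (or close to 1 for the derived score) would suggest the model was fairly sure of the segment; strongly negative values would suggest it was not.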

Opportunity Statement (example use case)

AI-generated transcripts are getting better, but they still often need to be cleaned by a human before they can be used in a professional or research setting. Human cleaning of transcripts is much more efficient if attention can be directed to the places where the model was least confident in its output.

For example, I'd like to use the confidences/probabilities to return transcripts to users in a .docx format, where tokens/segments with low confidence are highlighted.
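
A rough Python sketch of that highlighting step, assuming each segment carried a confidence value. The Segment type, its avg_logprob field, and the threshold of -1.0 are all hypothetical and only illustrate the idea; none of this is something Bumblebee returns today.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    start: float  # seconds
    end: float  # seconds
    avg_logprob: float  # hypothetical field, not part of the current output


def flag_low_confidence(segments, threshold=-1.0):
    """Pair each segment with True if it falls below the threshold.

    The threshold is an illustrative value and would need tuning.
    """
    return [(seg, seg.avg_logprob < threshold) for seg in segments]


segments = [
    Segment("hello everyone", 0.0, 1.2, -0.2),
    Segment("wreck a nice beach", 1.2, 2.9, -1.7),
]
for seg, needs_review in flag_low_confidence(segments):
    marker = "REVIEW" if needs_review else "ok"
    print(f"[{seg.start:.1f}-{seg.end:.1f}] {marker}: {seg.text}")
```

The flagged segments are the ones that would be highlighted in the exported document.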
