wrong timing in subtitles #3623

@frankglatte

audio.wav.json

I used whisper.cpp to transcribe speech to text for word-level subtitles, with the command whisper-cli.exe -m ggml-medium.bin -f audio.wav -l en -ojf -owts. The first 20 seconds of the audio file are empty. The program created the file audio.wav.json, and the timing of the first sentence in it is wrong: its tokens are placed inside that empty stretch.
Here is the relevant text from audio.wav.json:
"text": " All right, here we are in Sheepshead Bay.",
"tokens": [
    {
        "text": " All",
        "timestamps": { "from": "00:00:02,580", "to": "00:00:04,410" },
        "offsets": { "from": 2580, "to": 4410 },
        "id": 1057,
        "p": 0.730634,
        "t_dtw": -1
    },
    {
        "text": " right",
        "timestamps": { "from": "00:00:04,430", "to": "00:00:07,480" },
        "offsets": { "from": 4430, "to": 7480 },
        "id": 558,
        "p": 0.999352,
        "t_dtw": -1
    },
    {
        "text": ",",
        "timestamps": { "from": "00:00:07,480", "to": "00:00:08,700" },
        "offsets": { "from": 7480, "to": 8700 },
        "id": 11,
        "p": 0.845496,
        "t_dtw": -1
    },
    {
        "text": " here",
        "timestamps": { "from": "00:00:08,700", "to": "00:00:11,150" },
        "offsets": { "from": 8700, "to": 11150 },
        "id": 510,
        "p": 0.994734,
        "t_dtw": -1
    },

All t_dtw values are also wrong: every token shown above has t_dtw set to -1.
The timing of the sentence "That'll be convenient for someone made in Turkey" is wrong as well.
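
To make the problem easier to see, here is a minimal Python sketch that computes the duration of each token in the excerpt and lists the tokens whose t_dtw is -1. It is only an illustration. The helper names and the 1000 ms threshold are my own assumptions and not part of whisper.cpp, and the segment dictionary is copied by hand from the excerpt instead of being read from the full file, whose top-level layout is not shown here. The -1 values may simply mean that DTW-based timestamps were not computed in this run, but that reading is also an assumption.

```python
# Tokens of the first segment, copied by hand from the audio.wav.json excerpt.
SEGMENT = {
    "text": " All right, here we are in Sheepshead Bay.",
    "tokens": [
        {"text": " All", "offsets": {"from": 2580, "to": 4410}, "t_dtw": -1},
        {"text": " right", "offsets": {"from": 4430, "to": 7480}, "t_dtw": -1},
        {"text": ",", "offsets": {"from": 7480, "to": 8700}, "t_dtw": -1},
        {"text": " here", "offsets": {"from": 8700, "to": 11150}, "t_dtw": -1},
    ],
}


def token_durations(segment):
    """Return (text, from_ms, to_ms, duration_ms) for every token."""
    rows = []
    for tok in segment["tokens"]:
        start = tok["offsets"]["from"]
        end = tok["offsets"]["to"]
        rows.append((tok["text"], start, end, end - start))
    return rows


def suspicious_tokens(segment, max_ms=1000):
    """Tokens longer than max_ms (an arbitrary threshold chosen for this sketch)."""
    return [row for row in token_durations(segment) if row[3] > max_ms]


def missing_dtw(segment):
    """Texts of tokens whose t_dtw field is -1 or absent."""
    return [t["text"] for t in segment["tokens"] if t.get("t_dtw", -1) == -1]


if __name__ == "__main__":
    for text, start, end, dur in suspicious_tokens(SEGMENT):
        print(f"{text!r}: {start} ms -> {end} ms ({dur} ms)")
    print("tokens without DTW timestamp:", missing_dtw(SEGMENT))
```

In this excerpt all four tokens last longer than one second (the comma alone spans 1220 ms), and all of them start before the 20-second mark.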

audio.wav
