Description
I used whisper.cpp to transcribe speech to text for word-level subtitles. The command was: whisper-cli.exe -m ggml-medium.bin -f audio.wav -l en -ojf -owts. The first 20 seconds of the audio file are empty. The program created the file audio.wav.json. The timing of the first sentence is wrong: its tokens are placed inside the empty first 20 seconds.
Here is the relevant part of audio.wav.json:
"text": " All right, here we are in Sheepshead Bay.",
"tokens": [
  {
    "text": " All",
    "timestamps": { "from": "00:00:02,580", "to": "00:00:04,410" },
    "offsets": { "from": 2580, "to": 4410 },
    "id": 1057,
    "p": 0.730634,
    "t_dtw": -1
  },
  {
    "text": " right",
    "timestamps": { "from": "00:00:04,430", "to": "00:00:07,480" },
    "offsets": { "from": 4430, "to": 7480 },
    "id": 558,
    "p": 0.999352,
    "t_dtw": -1
  },
  {
    "text": ",",
    "timestamps": { "from": "00:00:07,480", "to": "00:00:08,700" },
    "offsets": { "from": 7480, "to": 8700 },
    "id": 11,
    "p": 0.845496,
    "t_dtw": -1
  },
  {
    "text": " here",
    "timestamps": { "from": "00:00:08,700", "to": "00:00:11,150" },
    "offsets": { "from": 8700, "to": 11150 },
    "id": 510,
    "p": 0.994734,
    "t_dtw": -1
  },
Also, all t_dtw values are wrong: every token has t_dtw set to -1.
Also, the timing of the sentence "That'll be convenient for someone made in Turkey" is wrong.
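For anyone who wants to check their own output the same way, here is a minimal Python sketch. It is not part of whisper.cpp. It assumes each segment is a dict with a "tokens" list whose entries carry "offsets" (in milliseconds, matching the timestamps in the excerpt) and "t_dtw", as in the excerpt from audio.wav.json. The names SEGMENT, SILENCE_END_MS, tokens_before and count_unset_dtw are made up for illustration, and the 20 000 ms threshold is just the empty stretch described in this report.

```python
# First four tokens of the segment, copied from the audio.wav.json excerpt
# (the "timestamps", "id" and "p" fields are left out for brevity).
SEGMENT = {
    "text": " All right, here we are in Sheepshead Bay.",
    "tokens": [
        {"text": " All", "offsets": {"from": 2580, "to": 4410}, "t_dtw": -1},
        {"text": " right", "offsets": {"from": 4430, "to": 7480}, "t_dtw": -1},
        {"text": ",", "offsets": {"from": 7480, "to": 8700}, "t_dtw": -1},
        {"text": " here", "offsets": {"from": 8700, "to": 11150}, "t_dtw": -1},
    ],
}

# Assumption: the audio contains no speech before roughly 20 s.
SILENCE_END_MS = 20_000


def tokens_before(segment, limit_ms):
    """Return the tokens whose start offset (ms) is earlier than limit_ms."""
    return [t for t in segment["tokens"] if t["offsets"]["from"] < limit_ms]


def count_unset_dtw(segment):
    """Count tokens whose t_dtw field is -1 (or missing)."""
    return sum(1 for t in segment["tokens"] if t.get("t_dtw", -1) == -1)


if __name__ == "__main__":
    for tok in tokens_before(SEGMENT, SILENCE_END_MS):
        start = tok["offsets"]["from"]
        print(f"{tok['text']!r} starts at {start} ms, inside the empty span")
    print("tokens with t_dtw == -1:", count_unset_dtw(SEGMENT))
```

With the excerpt in this report, all four tokens start before the 20 s mark and all four have t_dtw equal to -1, which are the two problems described here.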