IPA output?
#1019
Replies: 1 comment 1 reply
Neural IPA would be cool, but AFAIK nobody has done it.
I'm assuming Whisper goes from speech directly to text? Is there any option to generate a textual representation of the phonemes or allophones first, i.e. an IPA representation, similar to the Allosaurus project (https://github.com/xinjli/allosaurus)? I've been working on spelling correction from phonetic representations of misspelled words, and I'd like to try adapting it to convert transcribed speech to text (though only for personal interest at this point: your speech-to-text is incredibly good compared to anything I've seen in the past, so I'm not sure there's much left to do!)
Graham
PS Later note... I was incredibly lucky with the first speech sample I tested when I wrote the note above: it was close to 100% accurate (it only missed a proper name, "Dustin Sekula Library", which was transcribed as "Dustin's secular library" :-) ). Subsequent transcriptions of real data (i.e. my own speech, or audio from the television) have been significantly worse. Perhaps the first conversion was near perfect because it was an answering-machine recording of an automated alert system read out by a synthesized voice... Anyway, I mention this just to say that there clearly is some point to my doing experiments with my IPA-to-text code! ...if an IPA transcription of the speech is any good, that is. The one from Allosaurus was not, even allowing for a phonetic similarity distance metric in the word reconstruction.
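For readers unfamiliar with the idea of a "phonetic similarity distance metric" for word reconstruction, here is a minimal sketch of one common approach: a weighted Levenshtein distance over phoneme sequences, where substituting two similar phonemes costs less than substituting dissimilar ones. This is not Graham's code or Allosaurus's method. The similarity groups (`SIMILAR_GROUPS`), the 0.5 substitution cost, and the toy lexicon are all hypothetical choices made for illustration; a real system would derive costs from articulatory features and use a full pronunciation dictionary.

```python
from typing import Dict, List, Sequence

# Hypothetical similarity classes (illustration only); a real system would
# derive these from articulatory features rather than hand-picked sets.
SIMILAR_GROUPS = [
    set("ptk"), set("bdg"), set("szʃʒ"), set("mnŋ"),
    set("iɪeɛæ"), set("uʊoɔɑʌə"),
]


def sub_cost(a: str, b: str) -> float:
    """Cost of substituting phoneme a with b: 0 if identical, 0.5 if similar, else 1."""
    if a == b:
        return 0.0
    for group in SIMILAR_GROUPS:
        if a in group and b in group:
            return 0.5
    return 1.0


def phonetic_distance(x: Sequence[str], y: Sequence[str]) -> float:
    """Weighted Levenshtein distance over phoneme sequences (insert/delete cost 1)."""
    prev = [float(j) for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [float(i)] + [0.0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = min(
                prev[j] + 1.0,                                # delete x[i-1]
                cur[j - 1] + 1.0,                             # insert y[j-1]
                prev[j - 1] + sub_cost(x[i - 1], y[j - 1]),   # match or substitute
            )
        prev = cur
    return prev[len(y)]


def best_match(ipa: Sequence[str], lexicon: Dict[str, List[str]]) -> str:
    """Return the lexicon word whose phoneme sequence is closest to the input."""
    return min(lexicon, key=lambda w: phonetic_distance(ipa, lexicon[w]))


# Toy lexicon with simplified, hypothetical transcriptions.
lexicon = {"cat": ["k", "æ", "t"], "cut": ["k", "ʌ", "t"], "dog": ["d", "ɔ", "g"]}
print(best_match(["k", "ɛ", "t"], lexicon))
```

With these costs, a recognizer output of `k ɛ t` maps to "cat", because ɛ and æ fall in the same (hypothetical) vowel group and cost only 0.5 to substitute, while ɛ and ʌ cost a full 1.0.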