Description
I was referred here as the correct place to raise this issue, from Azure SDK for Python issue #42206.
- Package Name: azure-cognitiveservices-speech
- Package Version: 1.45.0
- Operating System: Windows 11
- Python Version: 3.10.8
Describe the bug
When using the Python Speech SDK with:
- speech services of an Azure AI Services resource
- a custom subdomain
- Entra ID authentication via DefaultAzureCredential(), signed in with az login. The same results are reproducible with AzureCliCredential(), which is used in the reproduction code below for specificity
- an account that has Cognitive Services User permissions on the Azure AI Services resource
Speech to Text operations with SpeechRecognizer work just fine with SpeechConfig's credential and endpoint arguments populated. Specifically tested with recognize_once_async().
Text to Speech operations with SpeechSynthesizer and an identical SpeechConfig fail for each method attempted. The error received in result.cancellation_details.error_details is:
WebSocket upgrade failed: Authentication error (401). Please check subscription information and region name. USP state: Sending. Received audio size: 0 bytes.
To Reproduce
Steps to reproduce the behavior:
With an Azure AI Services resource with Entra ID authentication enabled and a custom subdomain named 'this':
- Authenticate with az login:
az login --scope https://cognitiveservices.azure.com/.default
- Run the following code:
import azure.cognitiveservices.speech as speechsdk
from azure.identity import AzureCliCredential

credential = AzureCliCredential()
subdomain = "this"
endpoint = f'https://{subdomain}.cognitiveservices.azure.com'
text = "Hello, this is a test"

speech_config = speechsdk.SpeechConfig(token_credential=credential, endpoint=endpoint)
speech_config.speech_synthesizer_language = "en-US"
speech_config.speech_synthesis_voice_name = 'en-US-AriaNeural'
audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

result = speech_synthesizer.speak_text_async(text).get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print(f"Speech synthesized for text {text}")
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print(f"Speech synthesis canceled: {cancellation_details.reason}")
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        if cancellation_details.error_details:
            print(f"Error details: {cancellation_details.error_details}")
The resulting error is:
WebSocket upgrade failed: Authentication error (401). Please check subscription information and region name. USP state: Sending. Received audio size: 0 bytes.
The same error occurs with every method I tried, including:
- get_voices_async()
- speak_text()
- speak_text_async()
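To confirm that each of these methods fails with the same 401 rather than some other error, the error_details string can be checked programmatically. The following is a minimal sketch, assuming the message keeps the "(401)" form quoted in this issue; the helper names extract_status_code and is_auth_failure are hypothetical and not part of the Speech SDK.

```python
import re
from typing import Optional

# Assumed pattern: a three-digit status code in parentheses, based only on
# the error text quoted in this issue. The SDK does not document this format.
_STATUS_RE = re.compile(r"\((\d{3})\)")


def extract_status_code(error_details: Optional[str]) -> Optional[int]:
    """Return the first parenthesized three-digit code, or None if absent."""
    match = _STATUS_RE.search(error_details or "")
    return int(match.group(1)) if match else None


def is_auth_failure(error_details: Optional[str]) -> bool:
    """True when the error text carries a 401 status code."""
    return extract_status_code(error_details) == 401


if __name__ == "__main__":
    sample = (
        "WebSocket upgrade failed: Authentication error (401). "
        "Please check subscription information and region name. "
        "USP state: Sending. Received audio size: 0 bytes."
    )
    print(extract_status_code(sample), is_auth_failure(sample))
```

In the reproduction code, is_auth_failure(cancellation_details.error_details) could be called inside the CancellationReason.Error branch for each method tried.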
Expected behavior
Text to Speech should authenticate the same way Speech to Text does. The following Speech to Text code works, using the same az login session, the same host running the script, and the same Azure AI Services resource:
- Authenticate with az login:
az login --scope https://cognitiveservices.azure.com/.default
- Run the following code:
import azure.cognitiveservices.speech as speechsdk
from azure.identity import AzureCliCredential

credential = AzureCliCredential()
subdomain = "this"
endpoint = f'https://{subdomain}.cognitiveservices.azure.com'
filename = "Sample.wav"

speech_config = speechsdk.SpeechConfig(token_credential=credential, endpoint=endpoint)
speech_config.speech_recognition_language = "en-US"
audio_config = speechsdk.audio.AudioConfig(filename=filename)
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

result = speech_recognizer.recognize_once_async().get()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: Text={}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
This results in:
Recognized: Text=My sample text.
Additional context
This is blocking use of Entra ID with text to speech services. Entra ID authentication is recommended by Microsoft (despite being grossly underrepresented in documentation and examples), and it would be great if it could be used consistently across services.
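A possible workaround sketch, not verified against this resource: Microsoft's Entra ID guidance for the Speech SDK describes an authorization token of the form aad#<resource ID>#<access token>, passed through SpeechConfig's auth_token argument together with a region instead of token_credential and endpoint. The function names, the resource ID, and the region below are hypothetical placeholders, and whether this avoids the 401 for SpeechSynthesizer in this setup is an assumption.

```python
def build_aad_authorization_token(resource_id: str, access_token: str) -> str:
    """Join the resource ID and an Entra ID access token in the 'aad#...#...' form."""
    if not resource_id or not access_token:
        raise ValueError("resource_id and access_token must both be non-empty")
    return f"aad#{resource_id}#{access_token}"


def make_speech_config(resource_id: str, region: str):
    """Build a SpeechConfig from an Azure CLI token (assumes a prior az login)."""
    # Imported here so the helper above can be used without the SDKs installed.
    import azure.cognitiveservices.speech as speechsdk
    from azure.identity import AzureCliCredential

    access_token = AzureCliCredential().get_token(
        "https://cognitiveservices.azure.com/.default"
    ).token
    return speechsdk.SpeechConfig(
        auth_token=build_aad_authorization_token(resource_id, access_token),
        region=region,
    )
```

The resulting config would replace speech_config in the Text to Speech reproduction above. The token is fetched once, so a long-running process would need to refresh it before it expires.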