
Python SDK azure-cognitiveservices-speech: SpeechSynthesizer fails with 401 when using EntraID and custom subdomain (identical SpeechConfig succeeds with SpeechRecognizer) #2890

@epopisces

Description


Referred here from Azure SDK for Python issue #42206 as the correct place to raise this issue.

  • Package Name: azure-cognitiveservices-speech
  • Package Version: 1.45.0
  • Operating System: Windows 11
  • Python Version: 3.10.8

Describe the bug
When using the Python Speech SDK with:

  • speech services of an Azure AI Services resource
  • a custom subdomain
  • Entra ID authentication via DefaultAzureCredential() after az login; the same results are reproducible with AzureCliCredential(), which is used in the reproduction code below for specificity
  • an account that has the Cognitive Services User role on the Azure AI Services resource

Speech to text operations with SpeechRecognizer work as expected with SpeechConfig's token_credential and endpoint arguments populated; specifically tested with recognize_once_async().

Text to Speech operations with SpeechSynthesizer and an identical SpeechConfig fail for each method attempted. The error received in result.cancellation_details.error_details is:

WebSocket upgrade failed: Authentication error (401). Please check subscription information and region name. USP state: Sending. Received audio size: 0 bytes.

To Reproduce
Steps to reproduce the behavior:

Given an Azure AI Services resource with Entra ID authentication enabled and a custom subdomain named 'this':

  1. Authenticate with az login: az login --scope https://cognitiveservices.azure.com/.default
  2. Run the following code:
import azure.cognitiveservices.speech as speechsdk
from azure.identity import AzureCliCredential

credential = AzureCliCredential()

subdomain = "this"
endpoint = f'https://{subdomain}.cognitiveservices.azure.com'
text = "Hello, this is a test"

speech_config = speechsdk.SpeechConfig(token_credential=credential, endpoint=endpoint)
speech_config.speech_synthesis_language = "en-US"
speech_config.speech_synthesis_voice_name = 'en-US-AriaNeural'

audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)

speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

result = speech_synthesizer.speak_text_async(text).get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print(f"Speech synthesized for text {text}")
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print(f"Speech synthesis canceled: {cancellation_details.reason}")
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        if cancellation_details.error_details:
            print(f"Error details: {cancellation_details.error_details}")

The resulting error is:

WebSocket upgrade failed: Authentication error (401). Please check subscription information and region name. USP state: Sending. Received audio size: 0 bytes.

The same error occurs with every method I tried, including:

  • get_voices_async()
  • speak_text()
  • speak_text_async()
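As a possible interim workaround (an assumption on my part, based on the documented `aad#<resource ID>#<access token>` authorization-token format for Entra ID with a custom subdomain), the synthesizer could be handed a pre-built authorization token via `speech_config.authorization_token` instead of the credential. The resource ID and access token below are placeholders; in practice the token would come from the same credential via `AzureCliCredential().get_token("https://cognitiveservices.azure.com/.default").token`:

```python
# Hedged sketch of the documented "aad#" authorization-token format for
# Entra ID with a custom subdomain. All values below are placeholders.
def build_speech_authorization_token(resource_id: str, aad_access_token: str) -> str:
    # The Speech service expects "aad#<Azure resource ID>#<Entra ID access token>".
    return f"aad#{resource_id}#{aad_access_token}"

# Placeholder inputs. In practice:
#   resource_id is the full Azure resource ID of the AI Services resource, and
#   aad_access_token comes from AzureCliCredential().get_token(
#       "https://cognitiveservices.azure.com/.default").token
resource_id = "<resource-id>"
aad_access_token = "<access-token>"

auth_token = build_speech_authorization_token(resource_id, aad_access_token)
# speech_config.authorization_token = auth_token  # instead of token_credential
```

Whether this path hits the same 401 on the WebSocket upgrade would tell you if the bug is in the SDK's token_credential handling or in the service itself.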

Expected behavior
The following works as expected, using the same az login session, the same host running the script, and the same Azure AI Services resource:

  1. Authenticate with az login: az login --scope https://cognitiveservices.azure.com/.default
  2. Run the following code:
import azure.cognitiveservices.speech as speechsdk
from azure.identity import AzureCliCredential

credential = AzureCliCredential()

subdomain = "this"
endpoint = f'https://{subdomain}.cognitiveservices.azure.com'
filename = "Sample.wav"

speech_config = speechsdk.SpeechConfig(token_credential=credential, endpoint=endpoint)
speech_config.speech_recognition_language = "en-US"

audio_config = speechsdk.audio.AudioConfig(filename=filename)

speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

result = speech_recognizer.recognize_once_async().get()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print(f"Recognized: Text={result.text}")
elif result.reason == speechsdk.ResultReason.NoMatch:
    print(f"No speech could be recognized: {result.no_match_details}")
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print(f"Speech Recognition canceled: {cancellation_details.reason}")
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print(f"Error details: {cancellation_details.error_details}")

This results in:
Recognized: Text=My sample text.
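For anyone debugging this, one low-tech way to confirm both paths are handed the same valid credential is to inspect the (unverified) claims of the access token, e.g. the `aud` claim, which should be `https://cognitiveservices.azure.com`. A stdlib-only sketch; the real token acquisition is shown as a comment and assumed to come from azure.identity, while the token below is an illustrative fake:

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    # Decode the middle (payload) segment of a JWT without verifying the
    # signature -- enough to eyeball claims like "aud" when debugging auth.
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

# In practice the token would come from the same credential used above:
#   token = AzureCliCredential().get_token(
#       "https://cognitiveservices.azure.com/.default").token
# Illustrative fake token (header.payload.signature) for this sketch:
fake_payload = base64.urlsafe_b64encode(
    json.dumps({"aud": "https://cognitiveservices.azure.com"}).encode()
).decode().rstrip("=")
token = f"eyJhbGciOiJub25lIn0.{fake_payload}.sig"

print(jwt_claims(token)["aud"])  # → https://cognitiveservices.azure.com
```

If the real token decodes with the expected audience and expiry, that points the finger at how SpeechSynthesizer passes the credential on the WebSocket upgrade rather than at the token itself.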

Additional context

This is blocking use of Entra ID with text to speech services. Entra ID authentication is recommended by Microsoft (despite being grossly underrepresented in documentation and examples), and it would be great if it could be used consistently across services.

Labels

accepted (issue moved to product team backlog; will be closed when addressed), service-side issue, text-to-speech
