Proper Language attribute handling for SAPI5 voices #429

@gexgd0419

The current code assumes that the attribute Language of SAPI5 voices only has one hexadecimal value:

# Retrieve and convert language code
language_attr = attr.GetAttribute("Language")
language_code = int(language_attr, 16)
primary_sub_code = f"{language_code & 0x3FF}-{(language_code >> 10) & 0x3FF}"
languages = [lcid_to_locale(primary_sub_code)]

However, SAPI5 allows multiple values separated by semicolons, so a voice can list every supported language ID here, or add neutral locale IDs as a fallback. With a multi-value attribute such as 409;809, the current code fails with a ValueError, because the whole string is passed to int(language_attr, 16).
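The failure can be reproduced without SAPI5 at all. This is a minimal sketch that assumes the attribute string arrives exactly as the example value above:

```python
# Reproduce the failure: int() with base 16 rejects the semicolon separator.
language_attr = "409;809"  # example multi-value attribute from the text above

try:
    language_code = int(language_attr, 16)
    error = None
except ValueError as exc:
    # This branch is taken: ';' is not a valid hexadecimal digit.
    error = exc
    print(f"conversion failed: {exc}")
```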

SpGetLanguageFromToken in sphelper.h handles this by taking only the first value:

// ...
const WCHAR *pszLangId = dstrLanguage;
WCHAR achHexNumber [5]; // LangIds cannot be more than 4 characters longs

// check if the langid is in the format "409;9" - extract the "409" in this case
const WCHAR *pSemiColon = wcschr (pszLangId, L';');
if (pSemiColon)
{
    size_t cLen = pSemiColon - dstrLanguage;
    if (cLen <= 4)
    {
        StringCbCopyNW (achHexNumber, sizeof (achHexNumber), dstrLanguage, cLen * sizeof (achHexNumber [0]));
        pszLangId = achHexNumber;
    }
}
// ...

Alternatively, the code could parse every value in the attribute and report all of the voice's languages instead of only the first one.
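Both options could be sketched in Python roughly as follows. `parse_language_attr` is a hypothetical helper name, not something in the existing code, and skipping empty or malformed segments is an assumption about the desired behaviour rather than something SAPI5 specifies:

```python
from typing import List


def parse_language_attr(language_attr: str) -> List[int]:
    """Split a SAPI5 Language attribute into integer language IDs.

    Hypothetical helper for illustration; not part of the existing code.
    """
    langids = []
    for part in language_attr.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate empty segments, e.g. a trailing ';'
        try:
            langids.append(int(part, 16))
        except ValueError:
            continue  # skip a malformed segment instead of failing the voice
    return langids


all_ids = parse_language_attr("409;809")
# Taking the first entry mirrors what SpGetLanguageFromToken does.
first_id = all_ids[0] if all_ids else None
print([hex(i) for i in all_ids])  # ['0x409', '0x809']
```

Each ID in `all_ids` could then go through the same conversion the current code applies to the single value, so `languages` would hold one entry per ID.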
