Support SpeechT5 text-to-speech pipeline by OpenVINO #1230

rkazants · 2025-04-09T09:12:52Z

What does this PR do?

This PR adds support for the SpeechT5 text-to-speech pipeline using OpenVINO. Here is a demo:

import soundfile as sf
import torch
from datasets import load_dataset
from optimum.intel import OVModelForTextToSpeechSeq2Seq
from transformers import SpeechT5Processor

model_id = "microsoft/speecht5_tts"
vocoder_id = "microsoft/speecht5_hifigan"

ov_pipe = OVModelForTextToSpeechSeq2Seq.from_pretrained(model_id, export=True, vocoder=vocoder_id)
ov_pipe.save_pretrained("speecht5_tts")
ov_pipe = OVModelForTextToSpeechSeq2Seq.from_pretrained("speecht5_tts")

processor = SpeechT5Processor.from_pretrained(model_id)

inputs = processor(text="Hello, this PR introduces support of SpeechT5 text-to-speech pipeline using OpenVINO.",
                   return_tensors="pt")

# load vector containing speaker's voice characteristics from a dataset
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)

speech = ov_pipe.generate(input_ids=inputs["input_ids"],
                          speaker_embeddings=speaker_embeddings)

sf.write("speech.wav", speech.numpy()[0], samplerate=16000)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

eaidova · 2025-04-14T04:37:37Z

@rkazants could you please provide tests?

optimum/exporters/openvino/model_configs.py

optimum/exporters/openvino/model_patcher.py

optimum/intel/openvino/modeling_text2speech.py

eaidova · 2025-04-14T04:45:25Z

@rkazants please also update the import structure for the added model classes:
https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/__init__.py#L163
https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/utils/dummy_openvino_objects.py
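
For readers unfamiliar with the "dummy objects" file referenced above, here is a minimal standalone sketch of the general pattern: a placeholder class that can always be imported but raises a clear error when used without its backend. This is not the code in optimum-intel. `requires_backends` and `DummyObject` below are local stand-ins written for this illustration, and only the class name `OVModelForTextToSpeechSeq2Seq` comes from this PR.

```python
import importlib.util


def requires_backends(obj_name, backends):
    """Raise ImportError if any of the named top-level packages is missing."""
    missing = [b for b in backends if importlib.util.find_spec(b) is None]
    if missing:
        raise ImportError(f"{obj_name} requires the following packages: {', '.join(missing)}")


class DummyObject(type):
    """Metaclass for placeholders that fail on use rather than on import."""

    def __call__(cls, *args, **kwargs):
        # Instantiating the placeholder checks the backends first.
        requires_backends(cls.__name__, cls._backends)
        return super().__call__(*args, **kwargs)

    def __getattr__(cls, name):
        # Reached only for attributes the placeholder lacks, e.g. from_pretrained.
        if name.startswith("_"):
            raise AttributeError(name)
        requires_backends(cls.__name__, cls._backends)
        raise AttributeError(name)


class OVModelForTextToSpeechSeq2Seq(metaclass=DummyObject):
    # Placeholder a package could export when the real class cannot be imported.
    _backends = ["openvino"]
```

With a layout like this, `from package import OVModelForTextToSpeechSeq2Seq` succeeds even without OpenVINO installed, and the user only sees an error at the point of use.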

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

rkazants · 2025-04-16T08:34:05Z

> @rkazants please also update the import structure for the added model classes: https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/__init__.py#L163 https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/utils/dummy_openvino_objects.py

Updated

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

tests/openvino/utils_tests.py

HuggingFaceDocBuilderDev · 2025-04-18T06:18:10Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

rkazants · 2025-04-18T14:35:43Z

> @rkazants could you please provide tests?

Done

rkazants · 2025-04-18T14:36:17Z

@eaidova, @IlyasMoutawwakil, @echarlaix, could you please review the PR?

Thanks,
Roman

rkazants · 2025-04-18T15:56:42Z

The CI failures do not appear to be related to my changes in this PR. For example, I see an issue with the Whisper model. Please correct me if I am wrong.

optimum/intel/openvino/modeling_base_seq2seq.py

eaidova · 2025-04-21T13:19:38Z

@IlyasMoutawwakil could you please rerun the CI? Thanks.

IlyasMoutawwakil · 2025-04-21T13:43:43Z

@eaidova done. Was this issue fixed?

FAILED tests/openvino/test_modeling.py::OVModelForTextToSpeechSeq2SeqIntegrationTest::test_compare_to_transformers_0_speecht5 - RuntimeError: The size of tensor a (512) must match the size of tensor b (32) at non-singleton dimension 1

eaidova · 2025-04-21T13:46:03Z

> @eaidova done. Was this issue fixed?
>
> FAILED tests/openvino/test_modeling.py::OVModelForTextToSpeechSeq2SeqIntegrationTest::test_compare_to_transformers_0_speecht5 - RuntimeError: The size of tensor a (512) must match the size of tensor b (32) at non-singleton dimension 1

@IlyasMoutawwakil thanks, yes @rkazants is working on the fix
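
For context on the error class quoted above: PyTorch raises this message when two tensors cannot be broadcast together. The thread does not say which tensors were involved or what the root cause was, so the following is only a pure-Python sketch of the broadcasting rule itself. The shapes `(1, 512)` and `(1, 32)` in the usage note are hypothetical; only the sizes 512 and 32 and the dimension index come from the log. `broadcast_shape` is a helper written for this illustration, not a PyTorch API.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast result shape of two shapes, or raise ValueError."""
    result = []
    # Compare dimensions from the trailing end; a missing dimension counts as 1.
    for dim_a, dim_b in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if dim_a != dim_b and 1 not in (dim_a, dim_b):
            # Index of the offending dimension in the aligned (result) shape.
            index = max(len(a), len(b)) - len(result) - 1
            raise ValueError(
                f"size of a ({dim_a}) must match size of b ({dim_b}) "
                f"at non-singleton dimension {index}"
            )
        result.append(dim_b if dim_a == 1 else dim_a)
    return tuple(reversed(result))
```

Calling `broadcast_shape((1, 512), (1, 32))` raises, because 512 and 32 differ and neither is 1, while `broadcast_shape((4, 1, 512), (512,))` succeeds.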

optimum/intel/openvino/modeling_text2speech.py

optimum/exporters/openvino/model_configs.py

optimum/exporters/openvino/model_patcher.py

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>

optimum/intel/openvino/modeling_text2speech.py

IlyasMoutawwakil

LGTM, great addition!
I think there's still some redundancy and room to make the implementation leaner. For example, Whisper has a custom generate method, and there we only make sure our class is compliant with its behavior and use the method directly from transformers.
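
The idea of reusing an upstream generate method rather than re-implementing it can be sketched as follows. This is a toy illustration under stated assumptions, not the Whisper or SpeechT5 code: `ReferenceModel` and `ExportedModel` are hypothetical stand-ins, and the "compliance" requirement is reduced to exposing the same `encoder` and `decoder` attributes that the borrowed method relies on.

```python
class ReferenceModel:
    """Stand-in for an upstream model class whose generate() we want to reuse."""

    def __init__(self):
        self.encoder = lambda tokens: [t * 2 for t in tokens]
        self.decoder = lambda states: sum(states)

    def generate(self, tokens):
        # Relies only on the public attributes `encoder` and `decoder`.
        return self.decoder(self.encoder(tokens))


class ExportedModel:
    """Stand-in for a wrapper around exported submodels."""

    def __init__(self, encoder, decoder):
        # Expose the same attribute names the borrowed method expects.
        self.encoder = encoder
        self.decoder = decoder

    # Borrow the upstream implementation instead of re-implementing it.
    generate = ReferenceModel.generate
```

The trade-off is that the wrapper must track whatever attributes and behavior the upstream method depends on, which is what "compliant with its behavior" amounts to in this sketch.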

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

rkazants · 2025-04-28T06:28:51Z

> LGTM, great addition! I think there's still some redundancy and room to make the implementation leaner. For example, Whisper has a custom generate method, and there we only make sure our class is compliant with its behavior and use the method directly from transformers.

Responded here #1230 (comment)
Thanks

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

echarlaix

Left a couple of minor comments; good to merge once resolved.

optimum/intel/openvino/modeling_text2speech.py

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>

optimum/exporters/openvino/model_patcher.py

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

optimum/exporters/openvino/model_patcher.py

Support SpeechT5 text-to-speech pipeline by OpenVINO

b3a2b12

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

eaidova reviewed Apr 14, 2025

optimum/exporters/openvino/model_configs.py (outdated)

eaidova reviewed Apr 14, 2025

optimum/exporters/openvino/model_configs.py (outdated)

eaidova reviewed Apr 14, 2025

optimum/exporters/openvino/model_patcher.py

eaidova reviewed Apr 14, 2025

optimum/intel/openvino/modeling_text2speech.py (outdated)

rkazants added 4 commits April 15, 2025 18:43

Merge remote-tracking branch 'upstream/main' into ov_support_text2spe…

e89ffd1

…ech_speecht5_153160

Avoid is_postnet and is_vocoder vars in favour to use of behavior var

956d046

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Remove unneeded comments

5940667

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Added comments for patches and update init and utils

dc03206

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

rkazants added 2 commits April 17, 2025 19:55

Merge remote-tracking branch 'upstream/main' into ov_support_text2spe…

71a6f5d

…ech_speecht5_153160

Add integration tests for OVModelForTextToSpeechSeq2Seq

146e839

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

rkazants commented Apr 17, 2025

tests/openvino/utils_tests.py (outdated)

Update tests/openvino/utils_tests.py

77935a1

rkazants added 3 commits April 18, 2025 14:52

Merge remote-tracking branch 'upstream/main' into ov_support_text2spe…

430c840

…ech_speecht5_153160

Add test_export for TTS SpeechT5

56f6572

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Add exporters_cli tests for TTS SpeechT5

4dbed56

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

rkazants requested a review from eaidova April 18, 2025 14:35

Merge remote-tracking branch 'upstream/main' into ov_support_text2spe…

fbbb727

…ech_speecht5_153160

rkazants commented Apr 21, 2025

optimum/intel/openvino/modeling_base_seq2seq.py (outdated)

Update optimum/intel/openvino/modeling_base_seq2seq.py

724c04d

echarlaix added the openvino-test label (Trigger OpenVINO slow tests) Apr 24, 2025

echarlaix approved these changes Apr 24, 2025

optimum/intel/openvino/modeling_text2speech.py (outdated)

optimum/intel/openvino/modeling_text2speech.py

optimum/intel/openvino/modeling_text2speech.py (outdated)

IlyasMoutawwakil reviewed Apr 25, 2025

optimum/exporters/openvino/model_configs.py (outdated)

IlyasMoutawwakil reviewed Apr 25, 2025

optimum/exporters/openvino/model_patcher.py

Update optimum/exporters/openvino/model_configs.py

e98f8f0

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>

rkazants requested a review from IlyasMoutawwakil April 25, 2025 06:20

IlyasMoutawwakil reviewed Apr 25, 2025

optimum/intel/openvino/modeling_text2speech.py (outdated)

IlyasMoutawwakil approved these changes Apr 25, 2025

IlyasMoutawwakil removed the openvino-test label (Trigger OpenVINO slow tests) Apr 25, 2025

rkazants added 6 commits April 25, 2025 11:16

Merge remote-tracking branch 'upstream/main' into ov_support_text2spe…

ba66a65

…ech_speecht5_153160

Avoid _from_transformers custom impl

ab339bb

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Remove submodule properties

e4fb1cf

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Remove redundant methods from text2speech

e72a112

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Add a link to original generate method

d165011

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Create common interface for text-to-speech models using OpenVINO

9d46992

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Fix export test

3cbc7cf

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

echarlaix approved these changes Apr 28, 2025

rkazants commented Apr 29, 2025

optimum/intel/openvino/modeling_text2speech.py (outdated)

rkazants and others added 3 commits April 29, 2025 11:30

Apply suggestions from code review

a4d8d5b

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>

Apply suggestions from code review

d4fc57c

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>

Merge remote-tracking branch 'upstream/main' into ov_support_text2spe…

a9a9131

…ech_speecht5_153160

rkazants commented Apr 29, 2025

optimum/exporters/openvino/model_patcher.py

rkazants added 2 commits April 29, 2025 11:46

Update optimum/exporters/openvino/model_patcher.py

4288186

Fix code style

d2004d0

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

rkazants commented Apr 29, 2025

optimum/exporters/openvino/model_patcher.py

Update optimum/exporters/openvino/model_patcher.py

7ca345d

eaidova approved these changes Apr 29, 2025

echarlaix approved these changes Apr 29, 2025

nikita-savelyevv merged commit 1949522 into huggingface:main Apr 29, 2025
16 of 18 checks passed
