Tutorial for Text To Pinyin Module

I. Overview

Text to pinyin conversion is commonly used in the front end of TTS (text-to-speech) systems. It converts input Chinese text into a sequence of pinyin syllables with tones, providing the pronunciation basis for the subsequent acoustic model and audio generation.

II. Supported Model List

| Model | Download link | Model size | Introduction |
|---|---|---|---|
| G2PWModel | G2PWModel | 606M | g2pW is an open-source text to pinyin model, commonly used in the front end of TTS. It converts input Chinese text into a tonal pinyin sequence, providing the pronunciation basis for subsequent acoustic models and audio generation. |

III. Quick Integration

Before quick integration, you need to install the PaddleX wheel package. For the installation method, please refer to the PaddleX Local Installation Tutorial. Once the wheel package is installed, a few lines of code are enough to run inference with the text to pinyin module. You can switch freely between the models under this module, and you can also integrate the module's inference into your own project.

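from paddlex import create_model

# Load the built-in G2PWModel.
model = create_model(model_name="G2PWModel")
# Convert one sentence to pinyin; batch_size currently only supports 1.
output = model.predict(input="欢迎使用飞桨", batch_size=1)
for res in output:
    res.print()  # print the result to the terminal
    res.save_to_json(save_path="./output/res.json")  # save the result as JSON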

After running, the result obtained is:

{'res': {'input_path': '欢迎使用飞桨', 'result': ['huan1', 'ying2', 'shi3', 'yong4', 'fei1', 'jiang3']}}

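The fields of the result are as follows:

  • input_path: The input text. In the example above, this field holds the input string itself rather than a file path.
  • result: The pinyin syllables converted from the input text, each followed by its tone number.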
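
Each entry in result is a pinyin syllable followed by a tone digit (for example, huan1). As a minimal sketch, assuming every syllable ends in a single digit as in the output above, the helper below separates the syllable from its tone. The name split_tone is hypothetical and is not part of PaddleX.

```python
def split_tone(syllable):
    """Split a tone-numbered pinyin syllable into (base, tone).

    Assumes the tone is a single trailing digit, as in the sample output.
    Entries without a trailing digit are returned with tone None.
    """
    if syllable and syllable[-1] in "0123456789":
        return syllable[:-1], int(syllable[-1])
    return syllable, None


# Sample result copied from the output shown above.
result = ['huan1', 'ying2', 'shi3', 'yong4', 'fei1', 'jiang3']
pairs = [split_tone(s) for s in result]
```

Keeping the tone as a separate integer can make it easier to pass the sequence to later stages that treat syllable and tone as distinct features.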

Related methods, parameters, and explanations are as follows:

  • create_model instantiates the text to pinyin model. Its parameters are as follows:

| Parameter | Description | Type | Options | Default Value |
|---|---|---|---|---|
| model_name | The name of the model | str | G2PWModel | G2PWModel |
| model_dir | The storage path of the model | str | None | None |

  • The model_name must be specified. After specifying model_name, the built-in model parameters of PaddleX are used by default. If model_dir is specified, the user-defined model is used.

  • The predict() method of the text to pinyin model is called for inference and prediction. Its parameters are input and batch_size, explained as follows:

| Parameter | Description | Type | Options | Default Value |
|---|---|---|---|---|
| input | Data to be predicted | str | Input text, such as: 欢迎使用飞桨 | None |
| batch_size | Batch size | int | Currently only supports 1 | 1 |

  • The prediction result for each sample is returned as a dict-type object that supports printing and saving as a JSON file:

| Method | Description | Parameter | Parameter Type | Parameter Description | Default Value |
|---|---|---|---|---|---|
| print() | Print the result to the terminal | format_json | bool | Whether to format the output content with JSON indentation | True |
| | | indent | int | Specify the indentation level to beautify the output JSON data, making it more readable. This is only effective when format_json is True | 4 |
| | | ensure_ascii | bool | Control whether to escape non-ASCII characters to Unicode. When set to True, all non-ASCII characters will be escaped; False retains the original characters. This is only effective when format_json is True | False |
| save_to_json() | Save the result as a file in JSON format | save_path | str | The file path for saving. When it is a directory, the saved file name will match the input file name | None |
| | | indent | int | Specify the indentation level to beautify the output JSON data, making it more readable. This is only effective when format_json is True | 4 |
| | | ensure_ascii | bool | Control whether to escape non-ASCII characters to Unicode. When set to True, all non-ASCII characters will be escaped; False retains the original characters. This is only effective when format_json is True | False |

  • Additionally, the prediction results can also be obtained through attributes, as follows:

| Attribute | Description |
|---|---|
| json | Get the prediction result in JSON format |
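
The indent and ensure_ascii parameters share their names with arguments of json.dumps in Python's standard library. The sketch below calls json.dumps directly to show what each setting does to a result containing Chinese text. This is for illustration only: the tutorial does not state how PaddleX serializes results internally.

```python
import json

# Sample result copied from the output shown earlier in this tutorial.
data = {"res": {"input_path": "欢迎使用飞桨",
                "result": ["huan1", "ying2", "shi3", "yong4", "fei1", "jiang3"]}}

# ensure_ascii=False keeps the Chinese characters readable.
readable = json.dumps(data, indent=4, ensure_ascii=False)

# ensure_ascii=True escapes every non-ASCII character as \uXXXX.
escaped = json.dumps(data, indent=4, ensure_ascii=True)

# Both strings decode back to the same data.
assert json.loads(readable) == json.loads(escaped) == data
```

For text to pinyin output, ensure_ascii=False is usually the more convenient choice when the saved file will be read by people, since the input text stays legible.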

For more information on using PaddleX's single-model inference APIs, please refer to the PaddleX Single-Model Python Script Usage Instructions.
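
Since batch_size currently only supports 1, one way to process several sentences is to call predict() once per sentence. The following is a minimal sketch, not an official PaddleX helper. The names g2pw_kwargs and pinyin_for_texts are hypothetical, and the model directory in the comment is a placeholder. The code also assumes that the json attribute returns the same {'res': {...}} structure that print() displays above.

```python
def g2pw_kwargs(model_dir=None):
    """Build keyword arguments for create_model.

    model_name is always required; model_dir is added only when a
    user-defined model should be loaded instead of the built-in one.
    """
    kwargs = {"model_name": "G2PWModel"}
    if model_dir is not None:
        kwargs["model_dir"] = model_dir
    return kwargs


def pinyin_for_texts(model, texts):
    """Return {text: pinyin list}, calling predict() once per text."""
    results = {}
    for text in texts:
        # batch_size stays at 1 because larger batches are not supported.
        for res in model.predict(input=text, batch_size=1):
            # Assumption: res.json mirrors the printed {'res': {...}} layout.
            results[text] = res.json["res"]["result"]
    return results


def main():
    # Imported here so the helpers above can be used without PaddleX.
    from paddlex import create_model

    model = create_model(**g2pw_kwargs())  # or g2pw_kwargs("./my_g2pw_model")
    print(pinyin_for_texts(model, ["欢迎使用飞桨"]))
```

With PaddleX installed, calling main() loads the built-in model and prints the pinyin for the sample sentence. Passing a directory to g2pw_kwargs switches to a user-defined model, as described for model_dir above.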