Tutorial for Text To Pinyin Module

I. Overview

Text to Pinyin is commonly used in the frontend of a TTS (text-to-speech) system. It converts input Chinese text into a phonetic sequence with tones, which provides the pronunciation basis for the subsequent acoustic model and audio generation.

II. Supported Model List

Model	Download link	Model size	Introduction
G2PWModel	G2PWModel	606M	g2pW is an open-source text to pinyin model, commonly used in the frontend of TTS. It converts input Chinese text into a tonal pinyin sequence, providing the pronunciation basis for the subsequent acoustic model and audio generation.

III. Quick Integration

Before quick integration, you need to install the PaddleX wheel package. For the installation method, please refer to the PaddleX Local Installation Tutorial. After installing the wheel package, you can run inference with the text to pinyin module in a few lines of code. You can switch freely between the models under this module, and you can also integrate the module's model inference into your own project.

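from paddlex import create_model

# Load the built-in G2PWModel weights
model = create_model(model_name="G2PWModel")
# Convert one Chinese sentence to pinyin; batch_size currently only supports 1
output = model.predict(input="欢迎使用飞桨", batch_size=1)
for res in output:
    res.print()  # print the result to the terminal
    res.save_to_json(save_path="./output/res.json")  # save the result as JSON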

After running, the result obtained is:

{'res': {'input_path': '欢迎使用飞桨', 'result': ['huan1', 'ying2', 'shi3', 'yong4', 'fei1', 'jiang3']}}

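The fields of the result have the following meanings:

input_path: The input text that was converted. In the example above, this is the text string itself rather than a file path.
result: The pinyin converted from the input text, one syllable per element, each ending in a tone digit.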
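
As an illustration of how the result might be consumed downstream, the following minimal sketch splits each syllable into its base and tone number and pairs it with the corresponding input character. The dictionary is copied from the example output above. The helper `split_tone` is hypothetical and not part of PaddleX, and the sketch assumes that every syllable is lowercase ASCII letters followed by one tone digit and that characters and syllables align one to one, which may not hold for text containing punctuation or non-Chinese characters.

```python
import re

# Result dictionary copied from the example output above.
output = {
    "res": {
        "input_path": "欢迎使用飞桨",
        "result": ["huan1", "ying2", "shi3", "yong4", "fei1", "jiang3"],
    }
}

# Assumed syllable format: lowercase letters followed by a single tone digit.
_SYLLABLE = re.compile(r"^([a-z]+)(\d)$")


def split_tone(syllable):
    """Split 'huan1' into ('huan', 1). Hypothetical helper, not a PaddleX API."""
    match = _SYLLABLE.match(syllable)
    if match is None:
        raise ValueError(f"unexpected syllable format: {syllable!r}")
    return match.group(1), int(match.group(2))


pairs = [split_tone(s) for s in output["res"]["result"]]

# Pair each input character with its syllable (assumes one-to-one alignment).
aligned = list(zip(output["res"]["input_path"], pairs))
for char, (base, tone) in aligned:
    print(char, base, tone)
```

Raising an error on an unexpected syllable, rather than skipping it silently, makes it easier to notice when the assumed format does not match what the model actually returns.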

Related methods, parameters, and explanations are as follows:

create_model instantiates the text to pinyin model. Its parameters are as follows:

Parameter	Description	Type	Options	Default Value
`model_name`	The name of the model	`str`	`G2PWModel`	`G2PWModel`
`model_dir`	The storage path of the model	`str`	None	None

model_name must be specified. When only model_name is given, the built-in model parameters of PaddleX are used by default. If model_dir is also specified, the user-defined model stored at that path is used instead.
The predict() method of the text to pinyin model runs inference. It takes the parameters input and batch_size, explained below:

Parameter	Description	Type	Options	Default Value
`input`	Data to be predicted	`str`	Input text, such as: `欢迎使用飞桨`	None
`batch_size`	Batch size	`int`	Currently only supports 1	1
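
Because `input` is documented as a single `str` and `batch_size` currently only supports 1, one way to process several sentences is to call `predict()` once per sentence. The sketch below shows that loop. The function `predict_texts` is a hypothetical helper, and `fake_predict` is a stand-in that only mimics the shape of the documented output so the example runs without PaddleX installed. With a real model you would pass `model.predict` instead.

```python
def predict_texts(predict_fn, texts):
    """Run predict_fn on each text separately and collect all results.

    Hypothetical helper: predict_fn is expected to accept the documented
    keyword arguments `input` and `batch_size` and to return an iterable.
    """
    results = []
    for text in texts:
        # One call per string, since batch_size currently only supports 1.
        for res in predict_fn(input=text, batch_size=1):
            results.append(res)
    return results


def fake_predict(input, batch_size=1):
    """Stand-in predictor (not PaddleX); yields a dict shaped like the example output."""
    yield {"res": {"input_path": input, "result": []}}


collected = predict_texts(fake_predict, ["欢迎使用飞桨", "你好"])
print(len(collected))
```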

The prediction result for each sample is of type dict and can be printed or saved as a JSON file:

Method	Description	Parameter	Parameter Type	Parameter Description	Default Value
`print()`	Print the result to the terminal	`format_json`	`bool`	Whether to format the output content with `JSON` indentation	`True`
		`indent`	`int`	Specify the indentation level to beautify the output `JSON` data, making it more readable. This is only effective when `format_json` is `True`	4
		`ensure_ascii`	`bool`	Control whether to escape non-`ASCII` characters to `Unicode`. When set to `True`, all non-`ASCII` characters will be escaped; `False` retains the original characters. This is only effective when `format_json` is `True`	`False`
`save_to_json()`	Save the result as a file in `json` format	`save_path`	`str`	The file path for saving. When it is a directory, the saved file name will match the input file name	None
		`indent`	`int`	Specify the indentation level to beautify the output `JSON` data, making it more readable	4
		`ensure_ascii`	`bool`	Control whether to escape non-`ASCII` characters to `Unicode`. When set to `True`, all non-`ASCII` characters will be escaped; `False` retains the original characters	`False`
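
The `indent` and `ensure_ascii` options described above have the same names as the arguments of `json.dumps` in Python's standard library. The sketch below uses the standard `json` module to show what these two options do to a dictionary shaped like the example output. It is an analogy for the described behaviour, not PaddleX's own implementation.

```python
import json

# Dictionary shaped like the example output of the text to pinyin module.
result = {
    "res": {
        "input_path": "欢迎使用飞桨",
        "result": ["huan1", "ying2", "shi3", "yong4", "fei1", "jiang3"],
    }
}

# indent=4 with ensure_ascii=False: indented, Chinese characters kept as-is.
readable = json.dumps(result, indent=4, ensure_ascii=False)

# ensure_ascii=True: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(result, indent=4, ensure_ascii=True)

# No indent: the whole document is written on a single line.
compact = json.dumps(result, ensure_ascii=False)

print(readable)
print(escaped)
print(compact)
```

For pinyin output, keeping `ensure_ascii` at `False` leaves the original Chinese input legible in the saved file, which is convenient when inspecting results by hand.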

Additionally, the prediction results can also be obtained through attributes, as follows:

Attribute	Description
`json`	Get the prediction result in `json` format

For more information on using PaddleX's single-model inference APIs, please refer to the PaddleX Single-Model Python Script Usage Instructions.
