
add text to pinyin model as tts frontend #3506

Open · wants to merge 6 commits into base: develop
182 changes: 182 additions & 0 deletions docs/module_usage/tutorials/speech_modules/text_to_pinyin.en.md
@@ -0,0 +1,182 @@
---
comments: true
---

# Tutorial for the Text-to-Pinyin Module

## I. Overview
Text-to-pinyin conversion is commonly used in the TTS frontend to convert input Chinese text into a pinyin sequence with tones, providing the pronunciation basis for the subsequent acoustic model and audio generation.

## II. Supported Model List

<table>
<tr>
<th >Model</th>
<th >Download link</th>
<th >Model size</th>
<th >Introduction</th>
</tr>
<tr>
<td>G2PWModel</td>
<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/G2PWModel_1.1.tar">G2PWModel</a></td>
<td>606M</td>
<td rowspan="5"> g2pW is an open-source text to pinyin model, commonly used in the frontend of TTS. It converts input Chinese text into a tonal Pinyin sequence, providing pronunciation basis for subsequent acoustic models and audio generation</td>
</tr>
</table>

## III. Quick Integration
Before quick integration, you need to install the PaddleX wheel package; for the installation method, please refer to the [PaddleX Local Installation Tutorial](../../../installation/installation.en.md). After installing the wheel package, you can run inference for the text-to-pinyin module with just a few lines of code. You can switch freely among the models under this module, and you can also integrate the module's model inference into your own project.


```python
from paddlex import create_model
model = create_model(model_name="G2PWModel")
output = model.predict(input="欢迎使用飞桨", batch_size=1)
for res in output:
    res.print()
    res.save_to_json(save_path="./output/res.json")
```

After running, the result obtained is:

```bash
{'res': {'input_path': '欢迎使用飞桨', 'result': ['huan1', 'ying2', 'shi3', 'yong4', 'fei1', 'jiang3']}}
```

The meanings of the fields in the result are as follows:
- `input_path`: The input text.
- `result`: The pinyin sequence converted from the input text.

Related methods, parameters, and explanations are as follows:
* `create_model` instantiates the text-to-pinyin model, with specific explanations as follows:
<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Type</th>
<th>Options</th>
<th>Default Value</th>
</tr>
</thead>
<tr>
<td><code>model_name</code></td>
<td>The name of the model</td>
<td><code>str</code></td>
<td><code>G2PWModel</code></td>
<td><code>G2PWModel</code></td>
</tr>
<tr>
<td><code>model_dir</code></td>
<td>The storage path of the model</td>
<td><code>str</code></td>
<td>None</td>
<td>None</td>
</tr>
</table>

* The `model_name` must be specified. After specifying `model_name`, the built-in model parameters of PaddleX are used by default. If `model_dir` is specified, the user-defined model is used.
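
For illustration, here is a minimal sketch of both loading modes based on the parameters above; the local directory path is only a hypothetical placeholder, not a path shipped with PaddleX.

```python
from paddlex import create_model

# Default: use the built-in G2PWModel weights that PaddleX manages automatically.
model = create_model(model_name="G2PWModel")

# Custom weights: point model_dir at a local copy of the model files.
# "./my_g2pw_model" is only an illustrative placeholder path.
custom_model = create_model(model_name="G2PWModel", model_dir="./my_g2pw_model")
```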

* The `predict()` method of the text to pinyin model is called for inference and prediction. The parameters of the `predict()` method are `input` and `batch_size`, with specific explanations as follows:

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Type</th>
<th>Options</th>
<th>Default Value</th>
</tr>
</thead>
<tr>
<td><code>input</code></td>
<td>Data to be predicted</td>
<td><code>str</code></td>
<td>
<ul>
<li><b>Input text</b>, such as: <code>欢迎使用飞桨</code></li>
</ul>
</td>
<td>None</td>
</tr>
<tr>
<td><code>batch_size</code></td>
<td>Batch size</td>
<td><code>int</code></td>
<td>Currently only supports 1</td>
<td>1</td>
</tr>
</table>
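
Since `batch_size` currently only supports 1, several sentences can be converted by calling `predict()` once per sentence. A minimal sketch, assuming the model created earlier:

```python
from paddlex import create_model

model = create_model(model_name="G2PWModel")

sentences = ["欢迎使用飞桨", "文本转拼音模块"]
for text in sentences:
    # predict() yields one result per input; batch_size must stay at 1 for now.
    for res in model.predict(input=text, batch_size=1):
        res.print()
```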

* The prediction result for each sample is a corresponding `Result` object (dict-like), which supports printing and saving as a `json` file:

<table>
<thead>
<tr>
<th>Method</th>
<th>Description</th>
<th>Parameter</th>
<th>Parameter Type</th>
<th>Parameter Description</th>
<th>Default Value</th>
</tr>
</thead>
<tr>
<td rowspan="3"><code>print()</code></td>
<td rowspan="3">Print the result to the terminal</td>
<td><code>format_json</code></td>
<td><code>bool</code></td>
<td>Whether to format the output content with <code>JSON</code> indentation</td>
<td><code>True</code></td>
</tr>
<tr>
<td><code>indent</code></td>
<td><code>int</code></td>
<td>Specify the indentation level to beautify the output <code>JSON</code> data, making it more readable. This is only effective when <code>format_json</code> is <code>True</code></td>
<td>4</td>
</tr>
<tr>
<td><code>ensure_ascii</code></td>
<td><code>bool</code></td>
<td>Control whether to escape non-<code>ASCII</code> characters to <code>Unicode</code>. When set to <code>True</code>, all non-<code>ASCII</code> characters will be escaped; <code>False</code> retains the original characters. This is only effective when <code>format_json</code> is <code>True</code></td>
<td><code>False</code></td>
</tr>
<tr>
<td rowspan="3"><code>save_to_json()</code></td>
<td rowspan="3">Save the result as a file in <code>json</code> format</td>
<td><code>save_path</code></td>
<td><code>str</code></td>
<td>The file path for saving. When it is a directory, the saved file name will match the input file name</td>
<td>None</td>
</tr>
<tr>
<td><code>indent</code></td>
<td><code>int</code></td>
<td>Specify the indentation level to beautify the output <code>JSON</code> data, making it more readable. This is only effective when <code>format_json</code> is <code>True</code></td>
<td>4</td>
</tr>
<tr>
<td><code>ensure_ascii</code></td>
<td><code>bool</code></td>
<td>Control whether to escape non-<code>ASCII</code> characters to <code>Unicode</code>. When set to <code>True</code>, all non-<code>ASCII</code> characters will be escaped; <code>False</code> retains the original characters. This is only effective when <code>format_json</code> is <code>True</code></td>
<td><code>False</code></td>
</tr>
</table>
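
For example, a short sketch combining the two methods with their optional arguments; the output path is arbitrary:

```python
from paddlex import create_model

model = create_model(model_name="G2PWModel")
for res in model.predict(input="欢迎使用飞桨", batch_size=1):
    # Print with a 2-space indent and keep the Chinese characters unescaped.
    res.print(format_json=True, indent=2, ensure_ascii=False)
    # Save to an explicit json file; passing a directory instead reuses the input name.
    res.save_to_json(save_path="./output/res.json", indent=2, ensure_ascii=False)
```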

* Additionally, the prediction results can also be obtained through attributes, as follows:

<table>
<thead>
<tr>
<th>Attribute</th>
<th>Description</th>
</tr>
</thead>
<tr>
<td rowspan="1"><code>json</code></td>
<td rowspan="1">Get the prediction result in <code>json</code> format</td>
</tr>
</table>
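
A minimal sketch of reading the result through the `json` attribute; the nested keys below are assumed to mirror the printed structure shown earlier:

```python
from paddlex import create_model

model = create_model(model_name="G2PWModel")
for res in model.predict(input="欢迎使用飞桨", batch_size=1):
    data = res.json  # dict-like prediction result
    # Assuming the same layout as the printed output: {'res': {'input_path': ..., 'result': [...]}}
    print(data["res"]["result"])
```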

For more information on using PaddleX's single-model inference APIs, please refer to the [PaddleX Single-Model Python Script Usage Instructions](../../instructions/model_python_API.en.md).
179 changes: 179 additions & 0 deletions docs/module_usage/tutorials/speech_modules/text_to_pinyin.md
@@ -0,0 +1,179 @@
---
comments: true
---

# Tutorial for the Text-to-Pinyin Module

## I. Overview
Text-to-pinyin conversion is commonly used in the TTS frontend to convert input Chinese text into a pinyin sequence with tones, providing the pronunciation basis for the subsequent acoustic model and audio generation.

## II. Supported Model List

<table>
<tr>
<th>Model</th>
<th>Download Link</th>
<th>Model Size</th>
<th>Introduction</th>
</tr>
<tr>
<td>G2PWModel</td>
<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/G2PWModel_1.1.tar">G2PWModel</a></td>
<td>606M</td>
<td rowspan="1">g2pW is an open-source text-to-pinyin model commonly used in the TTS frontend. It converts input Chinese text into a tonal pinyin sequence, providing the pronunciation basis for the subsequent acoustic model and audio generation</td>
</tr>
</table>

## III. Quick Integration
Before quick integration, you need to install the PaddleX wheel package; for the installation method, please refer to the [PaddleX Local Installation Tutorial](../../../installation/installation.md). After installing the wheel package, you can run inference for the text-to-pinyin module with just a few lines of code. You can switch freely among the models under this module, and you can also integrate the module's model inference into your own project.

```python
from paddlex import create_model
model = create_model(model_name="G2PWModel")
output = model.predict(input="欢迎使用飞桨", batch_size=1)
for res in output:
    res.print()
    res.save_to_json(save_path="./output/res.json")
```
After running, the result obtained is:
```bash
{'res': {'input_path': '欢迎使用飞桨', 'result': ['huan1', 'ying2', 'shi3', 'yong4', 'fei1', 'jiang3']}}
```
The meanings of the fields in the result are as follows:
- `input_path`: The input text.
- `result`: The pinyin sequence converted from the input text.

Related methods, parameters, and explanations are as follows:
* `create_model` instantiates the text-to-pinyin model, with specific explanations as follows:
<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Type</th>
<th>Options</th>
<th>Default Value</th>
</tr>
</thead>
<tr>
<td><code>model_name</code></td>
<td>The name of the model</td>
<td><code>str</code></td>
<td><code>G2PWModel</code></td>
<td><code>G2PWModel</code></td>
</tr>
<tr>
<td><code>model_dir</code></td>
<td>The storage path of the model</td>
<td><code>str</code></td>
<td>None</td>
<td>None</td>
</tr>
</table>

* The `model_name` must be specified. After specifying `model_name`, the built-in model parameters of PaddleX are used by default. If `model_dir` is also specified, the user-defined model is used, as sketched below.
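
For illustration, a minimal sketch of both loading modes; the local directory path is only a hypothetical placeholder, not a path shipped with PaddleX.

```python
from paddlex import create_model

# Default: use the built-in G2PWModel weights that PaddleX manages automatically.
model = create_model(model_name="G2PWModel")

# Custom weights: point model_dir at a local copy of the model files (placeholder path).
custom_model = create_model(model_name="G2PWModel", model_dir="./my_g2pw_model")
```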

* The `predict()` method of the text-to-pinyin model is called for inference and prediction. The parameters of the `predict()` method are `input` and `batch_size`, with specific explanations as follows:

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Type</th>
<th>Options</th>
<th>Default Value</th>
</tr>
</thead>
<tr>
<td><code>input</code></td>
<td>Data to be predicted</td>
<td><code>str</code></td>
<td>
<ul>
<li><b>Input text</b>, such as: <code>欢迎使用飞桨</code></li>
</ul>
</td>
<td>None</td>
</tr>
<tr>
<td><code>batch_size</code></td>
<td>Batch size</td>
<td><code>int</code></td>
<td>Currently only supports 1</td>
<td>1</td>
</tr>
</table>
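
Since `batch_size` currently only supports 1, several sentences can be converted by calling `predict()` once per sentence, as in this minimal sketch:

```python
from paddlex import create_model

model = create_model(model_name="G2PWModel")
for text in ["欢迎使用飞桨", "文本转拼音模块"]:
    # batch_size must stay at 1 for now; predict() yields one result per input.
    for res in model.predict(input=text, batch_size=1):
        res.print()
```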

* The prediction result for each sample is a corresponding `Result` object, which supports printing and saving as a `json` file:

<table>
<thead>
<tr>
<th>Method</th>
<th>Description</th>
<th>Parameter</th>
<th>Parameter Type</th>
<th>Parameter Description</th>
<th>Default Value</th>
</tr>
</thead>
<tr>
<td rowspan="3"><code>print()</code></td>
<td rowspan="3">Print the result to the terminal</td>
<td><code>format_json</code></td>
<td><code>bool</code></td>
<td>Whether to format the output content with <code>JSON</code> indentation</td>
<td><code>True</code></td>
</tr>
<tr>
<td><code>indent</code></td>
<td><code>int</code></td>
<td>Specify the indentation level to beautify the output <code>JSON</code> data, making it more readable. This is only effective when <code>format_json</code> is <code>True</code></td>
<td>4</td>
</tr>
<tr>
<td><code>ensure_ascii</code></td>
<td><code>bool</code></td>
<td>Control whether to escape non-<code>ASCII</code> characters to <code>Unicode</code>. When set to <code>True</code>, all non-<code>ASCII</code> characters will be escaped; <code>False</code> retains the original characters. This is only effective when <code>format_json</code> is <code>True</code></td>
<td><code>False</code></td>
</tr>
<tr>
<td rowspan="3"><code>save_to_json()</code></td>
<td rowspan="3">Save the result as a file in <code>json</code> format</td>
<td><code>save_path</code></td>
<td><code>str</code></td>
<td>The file path for saving. When it is a directory, the saved file name will match the input file name</td>
<td>None</td>
</tr>
<tr>
<td><code>indent</code></td>
<td><code>int</code></td>
<td>Specify the indentation level to beautify the output <code>JSON</code> data, making it more readable. This is only effective when <code>format_json</code> is <code>True</code></td>
<td>4</td>
</tr>
<tr>
<td><code>ensure_ascii</code></td>
<td><code>bool</code></td>
<td>Control whether to escape non-<code>ASCII</code> characters to <code>Unicode</code>. When set to <code>True</code>, all non-<code>ASCII</code> characters will be escaped; <code>False</code> retains the original characters. This is only effective when <code>format_json</code> is <code>True</code></td>
<td><code>False</code></td>
</tr>
</table>
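
For example, a short sketch combining the two methods with their optional arguments; the output path is arbitrary:

```python
from paddlex import create_model

model = create_model(model_name="G2PWModel")
for res in model.predict(input="欢迎使用飞桨", batch_size=1):
    # Print with a 2-space indent and keep the Chinese characters unescaped.
    res.print(format_json=True, indent=2, ensure_ascii=False)
    # Save to an explicit json file; passing a directory instead reuses the input name.
    res.save_to_json(save_path="./output/res.json", indent=2, ensure_ascii=False)
```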

* Additionally, the prediction results can also be obtained through attributes, as follows:

<table>
<thead>
<tr>
<th>Attribute</th>
<th>Description</th>
</tr>
</thead>
<tr>
<td rowspan="1"><code>json</code></td>
<td rowspan="1">Get the prediction result in <code>json</code> format</td>
</tr>

</table>
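
A minimal sketch of reading the result through the `json` attribute, assuming it mirrors the printed structure shown earlier:

```python
from paddlex import create_model

model = create_model(model_name="G2PWModel")
for res in model.predict(input="欢迎使用飞桨", batch_size=1):
    # Assumed layout: {'res': {'input_path': ..., 'result': [...]}}
    print(res.json["res"]["result"])  # e.g. ['huan1', 'ying2', 'shi3', 'yong4', 'fei1', 'jiang3']
```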

For more information on using PaddleX's single-model inference APIs, please refer to the [PaddleX Single-Model Python Script Usage Instructions](../../instructions/model_python_API.md).
11 changes: 11 additions & 0 deletions paddlex/configs/modules/text_to_pinyin/G2PWModel.yaml
@@ -0,0 +1,11 @@
Global:
  model: G2PWModel
  mode: predict # only support predict
  device: gpu:0
  output: "output"

Predict:
  batch_size: 1
  input: "欢迎使用飞桨"
Collaborator
It is recommended to use a txt file path here and put the text in a txt file, so as to stay consistent with the other models.

  kernel_option:
    run_mode: paddle
1 change: 1 addition & 0 deletions paddlex/inference/common/batch_sampler/__init__.py
@@ -18,3 +18,4 @@
from .ts_batch_sampler import TSBatchSampler
from .det_3d_batch_sampler import Det3DBatchSampler
from .audio_batch_sampler import AudioBatchSampler
from .text_batch_sampler import TextBatchSampler