add text to pinyin model as tts frontend #3506
Open: zxcd wants to merge 6 commits into PaddlePaddle:develop from zxcd:fastspeech2
182 changes: 182 additions & 0 deletions
docs/module_usage/tutorials/speech_modules/text_to_pinyin.en.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,182 @@ | ||
--- | ||
comments: true | ||
--- | ||
|
||
# Tutorial for Text To Pinyin Module | ||
|
||
## I. Overview | ||
Text to Pinyin is commonly used in the frontend of a TTS system. It converts input Chinese text into a pinyin sequence with tones, providing the pronunciation basis for the downstream acoustic model and audio generation. | ||
|
||
## II. Supported Model List | ||
|
||
<table> | ||
<tr> | ||
<th >Model</th> | ||
<th >Download link</th> | ||
<th >Model size</th> | ||
<th >Introduction</th> | ||
</tr> | ||
<tr> | ||
<td>G2PWModel</td> | ||
<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/G2PWModel_1.1.tar">G2PWModel</a></td> | ||
<td>606M</td> | ||
<td rowspan="5"> g2pW is an open-source text to pinyin model commonly used in the frontend of TTS. It converts input Chinese text into a pinyin sequence with tones, providing the pronunciation basis for the downstream acoustic model and audio generation.</td> | ||
</tr> | ||
</table> | ||
|
||
## III. Quick Integration | ||
Before quick integration, install the PaddleX wheel package; for the installation method, refer to the [PaddleX Local Installation Tutorial](../../../installation/installation.en.md). Once the wheel package is installed, a few lines of code are enough to run inference with the text to pinyin module. You can switch freely between the models under this module, and you can also integrate the module's model inference into your own project. | ||
|
||
|
||
```python | ||
from paddlex import create_model | ||
model = create_model(model_name="G2PWModel") | ||
output = model.predict(input="欢迎使用飞桨", batch_size=1) | ||
for res in output: | ||
    res.print() | ||
    res.save_to_json(save_path="./output/res.json") | ||
``` | ||
|
||
After running, the result obtained is: | ||
|
||
```bash | ||
{'res': {'input_path': '欢迎使用飞桨', 'result': ['huan1', 'ying2', 'shi3', 'yong4', 'fei1', 'jiang3']}} | ||
``` | ||
|
||
The meanings of the fields in the result are as follows: | ||
- `input_path`: The input text (in the example above, the text string itself). | ||
- `result`: The pinyin sequence converted from the input text. | ||
|
||
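Each entry in `result` in the sample output above is a pinyin syllable followed by a tone digit, and downstream code often needs the two parts separately. The following is a minimal sketch, assuming every token ends in a single ASCII digit as in the sample output (the source does not say whether other token shapes can occur). `split_tone` is a hypothetical helper, not part of PaddleX.

```python
def split_tone(token):
    """Split a token such as 'huan1' into ('huan', 1).

    Assumes the tone is a single trailing ASCII digit, as in the sample
    output. Tokens without a trailing digit are returned with tone None.
    """
    if token and token[-1] in "0123456789":
        return token[:-1], int(token[-1])
    return token, None


# The 'result' list copied from the sample output above.
result = ['huan1', 'ying2', 'shi3', 'yong4', 'fei1', 'jiang3']
pairs = [split_tone(t) for t in result]
print(pairs)
```

Returning `None` for tokens without a digit keeps the helper from raising on punctuation or other unexpected tokens, at the cost of the caller having to check for it.
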
Related methods, parameters, and explanations are as follows: | ||
* `create_model` instantiates the text to pinyin model, with specific explanations as follows: | ||
<table> | ||
<thead> | ||
<tr> | ||
<th>Parameter</th> | ||
<th>Description</th> | ||
<th>Type</th> | ||
<th>Options</th> | ||
<th>Default Value</th> | ||
</tr> | ||
</thead> | ||
<tr> | ||
<td><code>model_name</code></td> | ||
<td>The name of the model</td> | ||
<td><code>str</code></td> | ||
<td><code>G2PWModel</code></td> | ||
<td><code>G2PWModel</code></td> | ||
</tr> | ||
<tr> | ||
<td><code>model_dir</code></td> | ||
<td>The storage path of the model</td> | ||
<td><code>str</code></td> | ||
<td>None</td> | ||
<td>None</td> | ||
</tr> | ||
</table> | ||
|
||
* `model_name` must be specified. Once it is specified, the built-in model parameters of PaddleX are used by default. If `model_dir` is also specified, the user-defined model is used instead. | ||
|
||
* The `predict()` method of the text to pinyin model is called for inference and prediction. The parameters of the `predict()` method are `input` and `batch_size`, with specific explanations as follows: | ||
|
||
<table> | ||
<thead> | ||
<tr> | ||
<th>Parameter</th> | ||
<th>Description</th> | ||
<th>Type</th> | ||
<th>Options</th> | ||
<th>Default Value</th> | ||
</tr> | ||
</thead> | ||
<tr> | ||
<td><code>input</code></td> | ||
<td>Data to be predicted</td> | ||
<td><code>str</code></td> | ||
<td> | ||
<ul> | ||
<li><b>Input text</b>, such as: <code>欢迎使用飞桨</code></li> | ||
</ul> | ||
</td> | ||
<td>None</td> | ||
</tr> | ||
<tr> | ||
<td><code>batch_size</code></td> | ||
<td>Batch size</td> | ||
<td><code>int</code></td> | ||
<td>Currently only supports 1</td> | ||
<td>1</td> | ||
</tr> | ||
</table> | ||
|
||
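Because `batch_size` currently only supports 1, several sentences would presumably be processed by calling `predict()` once per sentence. The following is a minimal sketch of such a loop. It takes the predict callable as an argument so that it can be run without PaddleX installed; `predict_each` and the stand-in `fake_predict` are hypothetical names, and in practice you would pass `model.predict`.

```python
def predict_each(predict_fn, texts):
    """Call predict_fn once per text with batch_size=1 and collect the results."""
    results = []
    for text in texts:
        # predict() is iterated over in the example above, so its return
        # value is treated as an iterable of per-sample results here.
        results.extend(predict_fn(input=text, batch_size=1))
    return results


def fake_predict(input, batch_size=1):
    """Stand-in for model.predict, used only to make this sketch runnable."""
    yield {"input_path": input, "result": []}


outputs = predict_each(fake_predict, ["欢迎使用飞桨", "你好"])
print(len(outputs))
```
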
* The prediction result for each sample is a corresponding Result object, which supports printing and saving as a `json` file: | ||
|
||
<table> | ||
<thead> | ||
<tr> | ||
<th>Method</th> | ||
<th>Description</th> | ||
<th>Parameter</th> | ||
<th>Parameter Type</th> | ||
<th>Parameter Description</th> | ||
<th>Default Value</th> | ||
</tr> | ||
</thead> | ||
<tr> | ||
<td rowspan="3"><code>print()</code></td> | ||
<td rowspan="3">Print the result to the terminal</td> | ||
<td><code>format_json</code></td> | ||
<td><code>bool</code></td> | ||
<td>Whether to format the output content with <code>JSON</code> indentation</td> | ||
<td><code>True</code></td> | ||
</tr> | ||
<tr> | ||
<td><code>indent</code></td> | ||
<td><code>int</code></td> | ||
<td>Specify the indentation level to beautify the output <code>JSON</code> data, making it more readable. This is only effective when <code>format_json</code> is <code>True</code></td> | ||
<td>4</td> | ||
</tr> | ||
<tr> | ||
<td><code>ensure_ascii</code></td> | ||
<td><code>bool</code></td> | ||
<td>Control whether to escape non-<code>ASCII</code> characters to <code>Unicode</code>. When set to <code>True</code>, all non-<code>ASCII</code> characters will be escaped; <code>False</code> retains the original characters. This is only effective when <code>format_json</code> is <code>True</code></td> | ||
<td><code>False</code></td> | ||
</tr> | ||
<tr> | ||
<td rowspan="3"><code>save_to_json()</code></td> | ||
<td rowspan="3">Save the result as a file in <code>json</code> format</td> | ||
<td><code>save_path</code></td> | ||
<td><code>str</code></td> | ||
<td>The file path for saving. When it is a directory, the saved file name will match the input file name</td> | ||
<td>None</td> | ||
</tr> | ||
<tr> | ||
<td><code>indent</code></td> | ||
<td><code>int</code></td> | ||
<td>Specify the indentation level to beautify the output <code>JSON</code> data, making it more readable. This is only effective when <code>format_json</code> is <code>True</code></td> | ||
<td>4</td> | ||
</tr> | ||
<tr> | ||
<td><code>ensure_ascii</code></td> | ||
<td><code>bool</code></td> | ||
<td>Control whether to escape non-<code>ASCII</code> characters to <code>Unicode</code>. When set to <code>True</code>, all non-<code>ASCII</code> characters will be escaped; <code>False</code> retains the original characters. This is only effective when <code>format_json</code> is <code>True</code></td> | ||
<td><code>False</code></td> | ||
</tr> | ||
</table> | ||
|
||
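The `indent` and `ensure_ascii` parameters in the table above have the same names and meanings as the options of Python's standard `json.dumps`. The sketch below calls `json.dumps` directly on a dict modeled on the sample output to show their effect. Whether PaddleX forwards these parameters to `json.dumps` internally is an assumption; the source does not state it.

```python
import json

# A dict modeled on the sample output (result list shortened).
data = {"res": {"input_path": "欢迎使用飞桨", "result": ["huan1", "ying2"]}}

# ensure_ascii=True escapes every non-ASCII character as a \uXXXX sequence.
escaped = json.dumps(data, ensure_ascii=True)

# ensure_ascii=False keeps the original characters; indent=4 pretty-prints.
readable = json.dumps(data, indent=4, ensure_ascii=False)

print(escaped)
print(readable)
```

For Chinese input, `ensure_ascii=False` is usually the more readable choice, which matches the default listed in the table.
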
* The prediction results can also be obtained through attributes, as follows: | ||
|
||
<table> | ||
<thead> | ||
<tr> | ||
<th>Attribute</th> | ||
<th>Description</th> | ||
</tr> | ||
</thead> | ||
<tr> | ||
<td rowspan="1"><code>json</code></td> | ||
<td rowspan="1">Get the prediction result in <code>json</code> format</td> | ||
</tr> | ||
</table> | ||
|
||
For more information on using PaddleX's single-model inference APIs, please refer to the [PaddleX Single-Model Python Script Usage Instructions](../../instructions/model_python_API.en.md). |
179 changes: 179 additions & 0 deletions
docs/module_usage/tutorials/speech_modules/text_to_pinyin.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,179 @@ | ||
--- | ||
comments: true | ||
--- | ||
|
||
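# Tutorial for Text To Pinyin Module | ||
|
||
## I. Overview | ||
Text to Pinyin is commonly used in the frontend of speech synthesis. It converts input Chinese text into a pinyin sequence with tones, providing the pronunciation basis for the downstream acoustic model and audio generation. | ||
|
||
## II. Supported Model List | ||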
|
||
<table> | ||
<tr> | ||
<th >Model</th> | ||
<th >Model download link</th> | ||
<th >Model size</th> | ||
<th >Introduction</th> | ||
</tr> | ||
<tr> | ||
<td>G2PWModel</td> | ||
<td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/G2PWModel_1.1.tar">G2PWModel</a></td> | ||
<td>606M</td> | ||
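<td rowspan="5"> g2pW is an open-source text to pinyin model commonly used in the frontend of speech synthesis. It converts input Chinese text into a pinyin sequence with tones, providing the pronunciation basis for the downstream acoustic model and audio generation.</td> | ||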
</tr> | ||
</table> | ||
|
||
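## III. Quick Integration | ||
Before quick integration, install the PaddleX wheel package; for the installation method, refer to the [PaddleX Local Installation Tutorial](../../../installation/installation.md). Once the wheel package is installed, a few lines of code are enough to run inference with the text to pinyin module. You can switch freely between the models under this module, and you can also integrate the module's model inference into your own project. | ||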
|
||
```python | ||
from paddlex import create_model | ||
model = create_model(model_name="G2PWModel") | ||
output = model.predict(input="欢迎使用飞桨", batch_size=1) | ||
for res in output: | ||
    res.print() | ||
    res.save_to_json(save_path="./output/res.json") | ||
``` | ||
After running, the result obtained is: | ||
```bash | ||
{'res': {'input_path': '欢迎使用飞桨', 'result': ['huan1', 'ying2', 'shi3', 'yong4', 'fei1', 'jiang3']}} | ||
``` | ||
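The meanings of the fields in the result are as follows: | ||
- `input_path`: The input text | ||
- `result`: The pinyin sequence converted from the input text | ||
|
||
Related methods, parameters, and explanations are as follows: | ||
* `create_model` instantiates the text to pinyin model, with specific explanations as follows: | ||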
<table> | ||
<thead> | ||
<tr> | ||
<th>Parameter</th> | ||
<th>Description</th> | ||
<th>Type</th> | ||
<th>Options</th> | ||
<th>Default Value</th> | ||
</tr> | ||
</thead> | ||
<tr> | ||
<td><code>model_name</code></td> | ||
<td>The name of the model</td> | ||
<td><code>str</code></td> | ||
<td><code>G2PWModel</code></td> | ||
<td><code>G2PWModel</code></td> | ||
</tr> | ||
<tr> | ||
<td><code>model_dir</code></td> | ||
<td>The storage path of the model</td> | ||
<td><code>str</code></td> | ||
<td>None</td> | ||
<td>None</td> | ||
</tr> | ||
</table> | ||
|
||
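* `model_name` must be specified. Once it is specified, the built-in model parameters of PaddleX are used by default. If `model_dir` is also specified, the user-defined model is used instead. | ||
|
||
* The `predict()` method of the text to pinyin model is called for inference and prediction. The parameters of the `predict()` method are `input` and `batch_size`, with specific explanations as follows: | ||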
|
||
<table> | ||
<thead> | ||
<tr> | ||
<th>Parameter</th> | ||
<th>Description</th> | ||
<th>Type</th> | ||
<th>Options</th> | ||
<th>Default Value</th> | ||
</tr> | ||
</thead> | ||
<tr> | ||
<td><code>input</code></td> | ||
<td>Data to be predicted</td> | ||
<td><code>str</code></td> | ||
<td> | ||
<ul> | ||
<li><b>Input text</b>, such as: <code>欢迎使用飞桨</code></li> | ||
</ul> | ||
</td> | ||
<td>None</td> | ||
</tr> | ||
<tr> | ||
<td><code>batch_size</code></td> | ||
<td>Batch size</td> | ||
<td><code>int</code></td> | ||
<td>Currently only supports 1</td> | ||
<td>1</td> | ||
</tr> | ||
</table> | ||
|
||
* The prediction result for each sample is a corresponding Result object, which supports saving as a `json` file: | ||
|
||
<table> | ||
<thead> | ||
<tr> | ||
<th>Method</th> | ||
<th>Description</th> | ||
<th>Parameter</th> | ||
<th>Parameter Type</th> | ||
<th>Parameter Description</th> | ||
<th>Default Value</th> | ||
</tr> | ||
</thead> | ||
<tr> | ||
<td rowspan = "3"><code>print()</code></td> | ||
<td rowspan = "3">Print the result to the terminal</td> | ||
<td><code>format_json</code></td> | ||
<td><code>bool</code></td> | ||
<td>Whether to format the output content with <code>JSON</code> indentation</td> | ||
<td><code>True</code></td> | ||
</tr> | ||
<tr> | ||
<td><code>indent</code></td> | ||
<td><code>int</code></td> | ||
<td>Specify the indentation level to beautify the output <code>JSON</code> data, making it more readable. This is only effective when <code>format_json</code> is <code>True</code></td> | ||
<td>4</td> | ||
</tr> | ||
<tr> | ||
<td><code>ensure_ascii</code></td> | ||
<td><code>bool</code></td> | ||
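<td>Control whether to escape non-<code>ASCII</code> characters to <code>Unicode</code>. When set to <code>True</code>, all non-<code>ASCII</code> characters will be escaped; <code>False</code> retains the original characters. This is only effective when <code>format_json</code> is <code>True</code></td> | ||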
<td><code>False</code></td> | ||
</tr> | ||
<tr> | ||
<td rowspan = "3"><code>save_to_json()</code></td> | ||
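<td rowspan = "3">Save the result as a file in <code>json</code> format</td> | ||
<td><code>save_path</code></td> | ||
<td><code>str</code></td> | ||
<td>The file path for saving. When it is a directory, the saved file is named after the input file</td> | ||
<td>None</td> | ||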
</tr> | ||
<tr> | ||
<td><code>indent</code></td> | ||
<td><code>int</code></td> | ||
<td>Specify the indentation level to beautify the output <code>JSON</code> data, making it more readable. This is only effective when <code>format_json</code> is <code>True</code></td> | ||
<td>4</td> | ||
</tr> | ||
<tr> | ||
<td><code>ensure_ascii</code></td> | ||
<td><code>bool</code></td> | ||
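<td>Control whether to escape non-<code>ASCII</code> characters to <code>Unicode</code>. When set to <code>True</code>, all non-<code>ASCII</code> characters will be escaped; <code>False</code> retains the original characters. This is only effective when <code>format_json</code> is <code>True</code></td> | ||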
<td><code>False</code></td> | ||
</tr> | ||
</table> | ||
|
||
* The prediction results can also be obtained through attributes, as follows: | ||
|
||
<table> | ||
<thead> | ||
<tr> | ||
<th>Attribute</th> | ||
<th>Description</th> | ||
</tr> | ||
</thead> | ||
<tr> | ||
<td rowspan = "1"><code>json</code></td> | ||
<td rowspan = "1">Get the prediction result in <code>json</code> format</td> | ||
</tr> | ||
|
||
</table> | ||
|
||
For more information on using PaddleX's single-model inference APIs, refer to the [PaddleX Single-Model Python Script Usage Instructions](../../instructions/model_python_API.md). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
Global: | ||
  model: G2PWModel | ||
  mode: predict # only support predict | ||
  device: gpu:0 | ||
  output: "output" | ||
|
||
Predict: | ||
  batch_size: 1 | ||
  input: "欢迎使用飞桨" | ||
  kernel_option: | ||
    run_mode: paddle |
I suggest using a txt file path here, with the text placed in the txt file. That keeps it consistent with the other models.
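A minimal sketch of what this suggestion could look like on the caller side: the text lives in a UTF-8 `.txt` file and is read line by line before being handed to the model. `read_texts` and the file name `input.txt` are hypothetical; how the module itself would accept a file path is not specified in this PR.

```python
import os
import tempfile


def read_texts(path):
    """Return the non-empty lines of a UTF-8 text file, stripped of whitespace."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Write a small sample file so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "input.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("欢迎使用飞桨\n\n你好\n")

texts = read_texts(sample_path)
print(texts)
```

Each entry of `texts` could then be passed as the `input` of a separate prediction call.
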