 [](http://arxiv.org/abs/2505.13000)
 [](https://dualcodec.github.io/)
 [](https://pypi.org/project/dualcodec/)
-
-
+[](https://github.com/jiaqili3/dualcodec)
+[](https://github.com/open-mmlab/Amphion/blob/main/models/codec/dualcodec/README.md)
 [](https://colab.research.google.com/drive/1VvUhsDffLdY5TdNuaqlLnYzIoXhvI8MK#scrollTo=Lsos3BK4J-4E)

 ## About
@@ -125,11 +125,11 @@ This will launch an app that allows you to upload a wav file and get the output
 ## DualCodec-based TTS models
 Models available:
 - DualCodec-VALLE: A super fast 12.5Hz VALL-E TTS model based on DualCodec.
-- DualCodec-Voicebox: A flow matching decoder for DualCodec 12.5Hz's semantic codes.
+- DualCodec-Voicebox: A flow-matching decoder for DualCodec 12.5Hz's semantic codes (it can serve as the second stage of a TTS system; on its own it is not a complete TTS model).

 To continue, first install other necessary components for training:
 ```bash
-pip install "dualcodec[train]"
+pip install "dualcodec[tts]"
 ```
 Alternatively, if you want to install from source,
 ```bash
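To make the division of labor between the two models concrete, below is a minimal sketch of how a two-stage pipeline built from them could be wired together: a VALL-E-style stage predicts DualCodec 12.5Hz semantic codes from text, and a Voicebox-style flow-matching stage decodes those codes into audio. It is illustrative only; the class and method names (`ValleStage`, `VoiceboxStage`, `generate`, `decode`) are hypothetical placeholders, not the actual `dualcodec` API.

```python
class ValleStage:
    """Placeholder for DualCodec-VALLE: text (+ speaker prompt) -> 12.5Hz semantic codes."""
    def generate(self, text, prompt_wav):
        raise NotImplementedError("swap in the real DualCodec-VALLE model here")


class VoiceboxStage:
    """Placeholder for DualCodec-Voicebox: semantic codes (+ prompt) -> waveform."""
    def decode(self, semantic_codes, prompt_wav):
        raise NotImplementedError("swap in the real DualCodec-Voicebox decoder here")


def synthesize(text, prompt_wav, valle: ValleStage, voicebox: VoiceboxStage):
    # Stage 1: an autoregressive VALL-E-style model predicts DualCodec
    # semantic tokens from the input text, conditioned on the prompt.
    semantic_codes = valle.generate(text, prompt_wav)
    # Stage 2: the flow-matching decoder maps those semantic tokens back to
    # audio. By itself this stage is not a TTS system, since it consumes
    # semantic codes rather than text.
    return voicebox.decode(semantic_codes, prompt_wav)
```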
@@ -170,7 +170,11 @@ pip install -U wandb protobuf transformers
 ```bash
 pip install "dualcodec[tts]"
 ```
-2. Clone this repository and `cd` to the project root folder (the folder that contains this readme).
+2. Clone this repository and `cd` to the project root folder (the folder that contains this readme):
+```bash
+git clone https://github.com/jiaqili3/DualCodec.git
+cd DualCodec
+```

 3. To run example training on example Emilia German data:
 ```bash
@@ -228,4 +232,20 @@ data.segment_speech.segment_length=24000
 booktitle = {Proceedings of Interspeech 2025},
 year = {2025}
 }
+```
+If you use this with the Amphion toolkit, please consider citing:
+```bibtex
+@article{amphion2,
+  title   = {Overview of the Amphion Toolkit (v0.2)},
+  author  = {Jiaqi Li and Xueyao Zhang and Yuancheng Wang and Haorui He and Chaoren Wang and Li Wang and Huan Liao and Junyi Ao and Zeyu Xie and Yiqiao Huang and Junan Zhang and Zhizheng Wu},
+  journal = {arXiv preprint arXiv:2501.15442},
+  year    = {2025}
+}
+
+@inproceedings{amphion,
+  title     = {Amphion: An Open-Source Audio, Music and Speech Generation Toolkit},
+  author    = {Xueyao Zhang and Liumeng Xue and Yicheng Gu and Yuancheng Wang and Jiaqi Li and Haorui He and Chaoren Wang and Ting Song and Xi Chen and Zihao Fang and Haopeng Chen and Junan Zhang and Tze Ying Tang and Lexiao Zou and Mingxuan Wang and Jun Han and Kai Chen and Haizhou Li and Zhizheng Wu},
+  booktitle = {{IEEE} Spoken Language Technology Workshop, {SLT} 2024},
+  year      = {2024}
+}
 ```