Commit c0a7cb4

update amphion citation and link in dualcodec readme. remove dependency of descript-audio-codec which can cause dependency conflict
1 parent c532b04 commit c0a7cb4

7 files changed: 86 additions and 14 deletions

models/codec/dualcodec/README.md

Lines changed: 25 additions & 5 deletions
@@ -5,8 +5,8 @@
 [![arXiv](https://img.shields.io/badge/arXiv-2505.13000-brightgreen.svg?style=flat-square)](http://arxiv.org/abs/2505.13000)
 [![githubio](https://img.shields.io/badge/GitHub.io-Demo_Page-blue?logo=Github&style=flat-square)](https://dualcodec.github.io/)
 [![PyPI](https://img.shields.io/pypi/v/dualcodec?color=blue&label=PyPI&logo=PyPI&style=flat-square)](https://pypi.org/project/dualcodec/)
-![GitHub](https://img.shields.io/badge/Github-Dev_Release-pink?logo=Github&style=flat-square)
-![Amphion](https://img.shields.io/badge/Amphion-Stable_Release-blue?style=flat-square)
+[![GitHub](https://img.shields.io/badge/Github-Dev_Release-pink?logo=Github&style=flat-square)](https://github.com/jiaqili3/dualcodec)
+[![Amphion](https://img.shields.io/badge/Amphion-Stable_Release-blue?style=flat-square)](https://github.com/open-mmlab/Amphion/blob/main/models/codec/dualcodec/README.md)
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1VvUhsDffLdY5TdNuaqlLnYzIoXhvI8MK#scrollTo=Lsos3BK4J-4E)
 
 ## About
@@ -125,11 +125,11 @@ This will launch an app that allows you to upload a wav file and get the output
 ## DualCodec-based TTS models
 Models available:
 - DualCodec-VALLE: A super fast 12.5Hz VALL-E TTS model based on DualCodec.
-- DualCodec-Voicebox: A flow matching decoder for DualCodec 12.5Hz's semantic codes.
+- DualCodec-Voicebox: A flow matching decoder for DualCodec 12.5Hz's semantic codes. (this can be used as the second stage of tts). The component alone is not a TTS.
 
 To continue, first install other necessary components for training:
 ```bash
-pip install "dualcodec[train]"
+pip install "dualcodec[tts]"
 ```
 Alternatively, if you want to install from source,
 ```bash
@@ -170,7 +170,11 @@ pip install -U wandb protobuf transformers
 ```bash
 pip install "dualcodec[tts]"
 ```
-2. Clone this repository and `cd` to the project root folder (the folder that contains this readme).
+2. Clone this repository and `cd` to the project root folder (the folder that contains this readme):
+```bash
+git clone https://github.com/jiaqili3/DualCodec.git
+cd DualCodec
+```
 
 3. To run example training on example Emilia German data:
 ```bash
@@ -228,4 +232,20 @@ data.segment_speech.segment_length=24000
 booktitle = {Proceedings of Interspeech 2025},
 year = {2025}
 }
+```
+If you use this with Amphion toolkit, please consider citing:
+```bibtex
+@article{amphion2,
+title = {Overview of the Amphion Toolkit (v0.2)},
+author = {Jiaqi Li and Xueyao Zhang and Yuancheng Wang and Haorui He and Chaoren Wang and Li Wang and Huan Liao and Junyi Ao and Zeyu Xie and Yiqiao Huang and Junan Zhang and Zhizheng Wu},
+year = {2025},
+journal = {arXiv preprint arXiv:2501.15442},
+}
+
+@inproceedings{amphion,
+author={Xueyao Zhang and Liumeng Xue and Yicheng Gu and Yuancheng Wang and Jiaqi Li and Haorui He and Chaoren Wang and Ting Song and Xi Chen and Zihao Fang and Haopeng Chen and Junan Zhang and Tze Ying Tang and Lexiao Zou and Mingxuan Wang and Jun Han and Kai Chen and Haizhou Li and Zhizheng Wu},
+title={Amphion: An Open-Source Audio, Music and Speech Generation Toolkit},
+booktitle={{IEEE} Spoken Language Technology Workshop, {SLT} 2024},
+year={2024}
+}
 ```

models/codec/dualcodec/dualcodec/model_codec/dac_layers.py

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+import numpy as np
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from einops import rearrange
+from torch.nn.utils import weight_norm
+
+
+def WNConv1d(*args, **kwargs):
+    return weight_norm(nn.Conv1d(*args, **kwargs))
+
+
+def WNConvTranspose1d(*args, **kwargs):
+    return weight_norm(nn.ConvTranspose1d(*args, **kwargs))
+
+
+# Scripting this brings model speed up 1.4x
+@torch.jit.script
+def snake(x, alpha):
+    shape = x.shape
+    x = x.reshape(shape[0], shape[1], -1)
+    x = x + (alpha + 1e-9).reciprocal() * torch.sin(alpha * x).pow(2)
+    x = x.reshape(shape)
+    return x
+
+
+class Snake1d(nn.Module):
+    def __init__(self, channels):
+        super().__init__()
+        self.alpha = nn.Parameter(torch.ones(1, channels, 1))
+
+    def forward(self, x):
+        return snake(x, self.alpha)
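
The `snake` function in this hunk computes `x + sin^2(alpha * x) / (alpha + 1e-9)` elementwise, with `alpha` a learnable per-channel parameter of shape `(1, channels, 1)` initialized to ones. The following is a minimal stdlib-only sketch of the same formula on plain floats, useful for sanity-checking values without PyTorch. The name `snake_scalar` and its `eps` argument are hypothetical and not part of the commit.

```python
import math


def snake_scalar(x: float, alpha: float = 1.0, eps: float = 1e-9) -> float:
    """Scalar form of the formula used by `snake` in the diff above.

    Assumption: a single scalar `alpha` stands in for the per-channel
    learnable tensor used by `Snake1d`.
    """
    # eps keeps the division finite when alpha is exactly zero.
    return x + math.sin(alpha * x) ** 2 / (alpha + eps)


if __name__ == "__main__":
    # Print a few sample points; with alpha > 0 the output is never below x.
    for x in (-1.0, 0.0, 0.5, math.pi / 2):
        print(x, snake_scalar(x))
```

With `alpha = 1`, the added term is `sin^2(x)`, so the function is the identity plus a non-negative periodic bump that vanishes at multiples of pi.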

models/codec/dualcodec/dualcodec/model_codec/dac_model.py

Lines changed: 3 additions & 3 deletions
@@ -15,9 +15,9 @@
 from audiotools.ml import BaseModel
 from torch import nn
 
-from dac.nn.layers import Snake1d
-from dac.nn.layers import WNConv1d
-from dac.nn.layers import WNConvTranspose1d
+from .dac_layers import Snake1d
+from .dac_layers import WNConv1d
+from .dac_layers import WNConvTranspose1d
 from .dac_quantize import ResidualVectorQuantize
 from easydict import EasyDict as edict
 import torch.nn.functional as F

models/codec/dualcodec/dualcodec/model_codec/dac_quantize.py

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
     from torch.nn.utils import weight_norm
 except:
     from torch.nn.utils.parameterizations import weight_norm
-from dac.nn.layers import WNConv1d
+from .dac_layers import WNConv1d
 
 
 class VectorQuantize(nn.Module):

models/codec/dualcodec/dualcodec/model_codec/dualcodec_model.py

Lines changed: 3 additions & 3 deletions
@@ -17,9 +17,9 @@
 from torch import nn
 
 # from .base import CodecMixin
-from dac.nn.layers import Snake1d
-from dac.nn.layers import WNConv1d
-from dac.nn.layers import WNConvTranspose1d
+from .dac_layers import Snake1d
+from .dac_layers import WNConv1d
+from .dac_layers import WNConvTranspose1d
 from .dac_quantize import ResidualVectorQuantize
 from easydict import EasyDict as edict
 import torch.nn.functional as F

models/codec/dualcodec/pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -1,10 +1,10 @@
 [project]
 name = "dualcodec"
-version = "0.3.7"
+version = "0.4.0"
 description = "The DualCodec neural audio codec."
 dependencies = [
     "transformers>=4.30.0",
-    "descript-audio-codec",
+    "descript-audiotools>=0.7.2",
     "huggingface_hub[cli]",
     "easydict",
     "torch",

models/codec/dualcodec/setup.py

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
from setuptools import setup
2+
3+
setup(
4+
name="dualcodec",
5+
packages=["dualcodec"],
6+
install_requires=[
7+
"transformers>=4.30.0",
8+
"descript-audiotools>=0.7.2",
9+
"huggingface_hub[cli]",
10+
"easydict",
11+
"torch",
12+
"torchaudio",
13+
"hydra-core",
14+
"einops",
15+
"safetensors",
16+
"cached_path",
17+
],
18+
python_requires=">=3.9",
19+
)
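
Since this commit swaps `descript-audio-codec` for `descript-audiotools>=0.7.2` in the declared dependencies, one way to confirm the change in an environment is to inspect the installed package metadata. The sketch below uses only the standard library (`importlib.metadata`). The helper names `requirement_names` and `declared_requirements` are hypothetical, and the name parsing is a simplified regex, not a full requirement-string parser.

```python
import re
from importlib import metadata


def requirement_names(requirements):
    """Return normalized project names parsed from requirement strings."""
    names = set()
    for req in requirements or []:
        # The project name is the leading run of name characters, before
        # any extras ("[cli]"), version specifier (">=0.7.2"), or marker.
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
        if match:
            # Normalize: lowercase, and runs of "-", "_", "." become "-".
            names.add(re.sub(r"[-_.]+", "-", match.group(1)).lower())
    return names


def declared_requirements(dist_name):
    """Requirement strings of an installed distribution, or None if absent."""
    try:
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    reqs = declared_requirements("dualcodec")
    if reqs is None:
        print("dualcodec is not installed in this environment")
    else:
        names = requirement_names(reqs)
        print("descript-audio-codec declared:", "descript-audio-codec" in names)
        print("descript-audiotools declared:", "descript-audiotools" in names)
```

This only reports what the installed distribution declares; it does not tell you whether `descript-audio-codec` happens to be present in the environment for another reason.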
