Commit d6ef266

Update README.md
1 parent 338ab38 commit d6ef266

1 file changed: +68 -2 lines changed

README.md

Lines changed: 68 additions & 2 deletions
@@ -1,2 +1,68 @@
1-
# f5-tts-swift
2-
Implementation of F5-TTS in Swift using MLX
1+
2+
# F5 TTS for Swift (WIP)
3+
4+
Implementation of [F5-TTS](https://arxiv.org/abs/2410.06885) in Swift, using the [MLX Swift](https://github.com/ml-explore/mlx-swift) framework.
5+
6+
You can listen to a [sample here](https://s3.amazonaws.com/lucasnewman.datasets/f5tts/sample.wav) that was generated in ~11 seconds on an M3 Max MacBook Pro.
7+
8+
See the [Python repository](https://github.com/lucasnewman/f5-tts-mlx) for additional details on the model architecture.
9+
This repository is based on the original PyTorch implementation available [here](https://github.com/SWivid/F5-TTS).
10+
11+
12+
## Installation
13+
14+
The `F5TTS` Swift package can be built and run from Xcode or SwiftPM.
15+
16+
A pretrained model is available [on Hugging Face](https://hf.co/lucasnewman/f5-tts-mlx).
17+
18+
19+
## Usage
20+
21+
```swift
22+
import Vocos
23+
import F5TTS
import MLX  // provides MLXArray, used for the reference audio below
24+
25+
let f5tts = try await F5TTS.fromPretrained(repoId: "lucasnewman/f5-tts-mlx")
26+
let vocos = try await Vocos.fromPretrained(repoId: "lucasnewman/vocos-mel-24khz-mlx") // if decoding to audio output
27+
28+
let inputAudio = MLXArray(...)
29+
30+
let (outputAudio, _) = f5tts.sample(
31+
cond: inputAudio,
32+
text: ["This is the caption for the reference audio and generation text."],
33+
duration: ...,
34+
vocoder: vocos.decode) { progress in
35+
print("Progress: \(Int(progress * 100))%")
36+
}
37+
```
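
The usage example leaves `duration` elided, and this README does not say whether `sample` expects seconds or frames. If a frame count is needed, a conversion along the following lines may help. This is only a sketch: the 24 kHz sample rate is inferred from the `vocos-mel-24khz-mlx` vocoder name, the hop length of 256 samples is an assumption, and `melFrameCount` is a hypothetical helper that is not part of the `F5TTS` package.

```swift
import Foundation

// Assumed constants (not stated in this README):
// 24 kHz audio, suggested by the "vocos-mel-24khz" vocoder name,
// and a hop length of 256 samples per mel frame.
let assumedSampleRate = 24_000.0
let assumedHopLength = 256.0

/// Hypothetical helper: converts a length in seconds to a whole number
/// of mel frames, rounding up so the audio is never cut short.
func melFrameCount(seconds: Double) -> Int {
    let framesPerSecond = assumedSampleRate / assumedHopLength
    return Int((seconds * framesPerSecond).rounded(.up))
}

// Frame count for a 10-second clip under the assumptions above.
let frames = melFrameCount(seconds: 10.0)
print("Frames for 10 s: \(frames)")
```

Check the package source for the actual sample rate, hop length, and the unit `duration` expects before relying on these values.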
38+
39+
## Appreciation
40+
41+
[Yushen Chen](https://github.com/SWivid) for the original PyTorch implementation of F5 TTS and pretrained model.
42+
43+
[Phil Wang](https://github.com/lucidrains) for the E2 TTS implementation that this model is based on.
44+
45+
## Citations
46+
47+
```bibtex
48+
@article{chen-etal-2024-f5tts,
49+
title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
50+
author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
51+
journal={arXiv preprint arXiv:2410.06885},
52+
year={2024},
53+
}
54+
```
55+
56+
```bibtex
57+
@inproceedings{Eskimez2024E2TE,
58+
title = {E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS},
59+
author = {Sefik Emre Eskimez and Xiaofei Wang and Manthan Thakker and Canrun Li and Chung-Hsien Tsai and Zhen Xiao and Hemin Yang and Zirun Zhu and Min Tang and Xu Tan and Yanqing Liu and Sheng Zhao and Naoyuki Kanda},
60+
year = {2024},
61+
url = {https://api.semanticscholar.org/CorpusID:270738197}
62+
}
63+
```
64+
65+
## License
66+
67+
The code in this repository is released under the MIT license as found in the
68+
[LICENSE](LICENSE) file.
