[Feature Request] 2 modes for optimized results / better quality: singing or speech

**Is your feature request related to a problem? Please describe.**
I mainly use RVC to voice different characters. Most of the time it works well enough, but in some cases, such as screams, breaths, laughs, or vocal fry, the algorithm glitches and can't follow the input well, which makes the output sound really weird.

**Describe the solution you'd like**
I'm aware that some settings under the hood could be tweaked to get better results, but these settings aren't exposed to the user. It would be great to have presets to select for inference and training that optimize the quality of the results for speech or for singing, for example: male speech, female speech, children's speech, male singing, female singing, etc. Presets like these could cover the vocal range of each character more accurately.
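To make the idea concrete, here is a rough sketch of what such presets could look like. Everything in it is an assumption on my side: the `VoicePreset` class, the preset names, the `clamp_f0` helper, and especially the pitch bounds are illustrative placeholders, not existing RVC settings or tuned values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoicePreset:
    """Hypothetical preset: a name plus the pitch (f0) range it targets."""
    name: str
    f0_min_hz: float
    f0_max_hz: float


# Illustrative values only; real presets would need tuning per use case.
PRESETS = {
    "male_speech": VoicePreset("male_speech", 60.0, 300.0),
    "female_speech": VoicePreset("female_speech", 120.0, 500.0),
    "children_speech": VoicePreset("children_speech", 200.0, 600.0),
    "male_singing": VoicePreset("male_singing", 60.0, 700.0),
    "female_singing": VoicePreset("female_singing", 120.0, 1100.0),
}


def pick_preset(name: str) -> VoicePreset:
    """Look up a preset by name, with a readable error for unknown names."""
    try:
        return PRESETS[name]
    except KeyError:
        known = ", ".join(sorted(PRESETS))
        raise ValueError(f"unknown preset {name!r}; expected one of: {known}")


def clamp_f0(f0_hz: list[float], preset: VoicePreset) -> list[float]:
    """Clamp voiced frames to the preset range; 0.0 marks unvoiced frames."""
    return [
        0.0 if f == 0.0 else min(max(f, preset.f0_min_hz), preset.f0_max_hz)
        for f in f0_hz
    ]
```

The point is only that the user would pick one name, such as `pick_preset("female_singing")`, and the tool would apply the matching range instead of one fixed range for every voice.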

**Describe alternatives you've considered**
So far, I've found that checkpoint fusion helps a tiny bit to extend the vocal range, but the voice is no longer faithful to the original.
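For context, this is how I understand checkpoint fusion, as a minimal sketch: I am assuming it boils down to a weighted average of matching parameters from two models. The `fuse_checkpoints` function and the plain dict-of-lists layout are hypothetical stand-ins for real checkpoint tensors, not the actual RVC code.

```python
def fuse_checkpoints(weights_a, weights_b, alpha):
    """Blend two checkpoints: alpha * A + (1 - alpha) * B, per parameter.

    weights_a / weights_b map parameter names to flat lists of floats
    (a simplified stand-in for real tensors).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if weights_a.keys() != weights_b.keys():
        raise ValueError("checkpoints must contain the same parameter names")

    fused = {}
    for key in weights_a:
        a, b = weights_a[key], weights_b[key]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {key!r}")
        # Every value is pulled toward the other model.
        fused[key] = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
    return fused
```

If this is roughly what happens, it would explain the loss of fidelity: with any `alpha` below 1, every parameter of the original voice moves toward the second model, not only the ones related to range.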

**Additional context**
If that's not possible, could we have a pre-trained or separate breath/scream/laugh model that focuses only on those sounds? We could then blend this "voice noises" (breaths, etc.) model with the speech model of the same character.

[Feature Request] 2 modes for optimized results/ better quality : singing or speech #76