fish-speech-1.5

xxl 2d0fff1d74 first commit		2025-01-02 10:25:56 +08:00
.gitattributes	Add .gitattributes	2025-01-02 10:15:15 +08:00
README.md	first commit	2025-01-02 10:25:56 +08:00
config.json	first commit	2025-01-02 10:25:56 +08:00
configuration.json	first commit	2025-01-02 10:25:56 +08:00
firefly-gan-vq-fsq-8x1024-21hz-generator.pth	first commit	2025-01-02 10:25:56 +08:00
model.pth	first commit	2025-01-02 10:25:56 +08:00
special_tokens.json	first commit	2025-01-02 10:25:56 +08:00
tokenizer.tiktoken	first commit	2025-01-02 10:25:56 +08:00

Fish Speech V1.5

Fish Speech V1.5 is a leading text-to-speech (TTS) model trained on more than 1 million hours of audio data in multiple languages.

Supported languages:

English (en) >300k hours
Chinese (zh) >300k hours
Japanese (ja) >100k hours
German (de) ~20k hours
French (fr) ~20k hours
Spanish (es) ~20k hours
Korean (ko) ~20k hours
Arabic (ar) ~20k hours
Russian (ru) ~20k hours
Dutch (nl) <10k hours
Italian (it) <10k hours
Polish (pl) <10k hours
Portuguese (pt) <10k hours
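
As a rough illustration, the language list above can be captured in a small lookup table, for example to validate a language code before sending text to the model. This is a minimal sketch: `SUPPORTED_LANGUAGES`, `is_supported`, and `training_data_note` are hypothetical names chosen here and are not part of the Fish Speech codebase. The hour figures are copied verbatim from the list.

```python
# Hypothetical helper, not part of Fish Speech. The table only
# transcribes the language list from this README.
SUPPORTED_LANGUAGES = {
    "en": ("English", ">300k"),
    "zh": ("Chinese", ">300k"),
    "ja": ("Japanese", ">100k"),
    "de": ("German", "~20k"),
    "fr": ("French", "~20k"),
    "es": ("Spanish", "~20k"),
    "ko": ("Korean", "~20k"),
    "ar": ("Arabic", "~20k"),
    "ru": ("Russian", "~20k"),
    "nl": ("Dutch", "<10k"),
    "it": ("Italian", "<10k"),
    "pl": ("Polish", "<10k"),
    "pt": ("Portuguese", "<10k"),
}


def is_supported(code: str) -> bool:
    """Return True if the language code appears in the README's list."""
    return code.strip().lower() in SUPPORTED_LANGUAGES


def training_data_note(code: str) -> str:
    """Describe the approximate training data listed for a language code."""
    key = code.strip().lower()
    name, hours = SUPPORTED_LANGUAGES[key]  # raises KeyError if unlisted
    return f"{name} ({key}): {hours} hours"


if __name__ == "__main__":
    for code in ("en", "ja", "sv"):
        if is_supported(code):
            print(training_data_note(code))
        else:
            print(f"{code}: not in the supported-language list")
```

The hour figures are coarse buckets rather than exact counts, so the sketch stores them as strings instead of numbers.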

Please refer to the Fish Speech GitHub repository for more information.
A demo is available at Fish Audio.

Citation

If you find this repository useful, please consider citing this work:

@misc{fish-speech-v1.4,
      title={Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis}, 
      author={Shijia Liao and Yuxuan Wang and Tianyu Li and Yifan Cheng and Ruoyi Zhang and Rongzhi Zhou and Yijin Xing},
      year={2024},
      eprint={2411.01156},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2411.01156}, 
}

License

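This model is licensed under the CC BY-NC-SA 4.0 license (Creative Commons Attribution-NonCommercial-ShareAlike 4.0), which restricts commercial use and requires derivatives to be shared under the same terms.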