LLaSA
https://github.com/zhenye234/LLaSA_training (zhenye234/LLaSA_training)
https://huggingface.co/blog/srinivasbilla/llasa-tts (The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about...)
https://arxiv.org/abs/2502.04128 (Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis)
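Simply attaching xcodec2 to Llama 3.2 is enough to get voice cloning and TTS with near-SOTA performance.
The quality looks good, but the license is cc-by-nc-4.0, so it basically cannot be used for anything other than research (commercial use is ruled out). Keep this in mind before using it.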
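One way to picture "Llama plus xcodec2": the codec turns audio into a sequence of integer codes, each code is spelled as a special token in the language model's vocabulary, and the model then predicts speech tokens the same way it predicts text. The sketch below only illustrates that round trip between codec codes and token strings. The `<|s_N|>` spelling and both helper names are assumptions made for illustration, not something taken from this note; check the model card for the exact format.

```python
import re

# Assumed spelling of a speech token: "<|s_123|>" stands for codec code 123.
# This format is an illustrative assumption, not confirmed by the note above.
_SPEECH_TOKEN = re.compile(r"<\|s_(\d+)\|>")


def ids_to_speech_tokens(codec_ids):
    """Spell integer codec codes as special-token strings (hypothetical helper)."""
    return [f"<|s_{i}|>" for i in codec_ids]


def speech_tokens_to_ids(text):
    """Recover integer codec codes from generated text, ignoring everything else."""
    return [int(m) for m in _SPEECH_TOKEN.findall(text)]


# Round trip: codes -> token strings -> codes.
codes = [17, 4096, 65535]
as_text = "".join(ids_to_speech_tokens(codes))
assert speech_tokens_to_ids(as_text) == codes
```

In a real pipeline the recovered codes would be handed to the codec's decoder to produce a waveform. That step is omitted here because it depends on the xcodec2 package and model weights.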
https://huggingface.co/HKUSTAudio/Llasa-3B (HKUSTAudio/Llasa-3B)
https://huggingface.co/spaces/srinivasbilla/llasa-3b-tts (Demo)
https://huggingface.co/HKUSTAudio/Llasa-8B (HKUSTAudio/Llasa-8B)
https://huggingface.co/spaces/srinivasbilla/llasa-8b-tts (Demo)