Speech Processing
Speech Recognition
Speech Synthesis
Speech Engineering and Acoustic Engineering
Speech-to-Speech
Multimodal Live Streaming
ESPnet
https://github.com/espnet/espnet
End-to-End Speech Processing Toolkit
ESPnet2
https://kan-bayashi.github.io/asj-espnet2-tutorial/
Introduction to the World of Voice AI Agents and Retell AI
https://speakerdeck.com/rkaga/introduction-to-the-world-of-voice-ai-agents-and-retell-ai?slide=24
Voice AI Market Map
https://x.com/kubotamas/status/1856458329889091715
How Voice AI will change the world
https://elevenlabs.io/blog/babbage-the-economist
speech-gateway
https://github.com/uezo/speech-gateway
Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models
https://arxiv.org/abs/2501.02832
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
https://arxiv.org/abs/2501.06282
Session 72: An Attempt at Self-Supervised Learning Using One Million Hours of Speech Data in 4,000 Languages
https://www.youtube.com/watch?v=ESaitFF1iTs
Introduction to Audio Processing
https://speakerdeck.com/hotwatermorning/oteiochu-li-ru-men-hoisutiensiyawozuo-rou
DMOSpeech 2: Reinforcement Learning for Duration Prediction in Metric-Optimized Speech Synthesis
https://arxiv.org/abs/2507.14988