Speech Processing - yuyan

Speech Processing

ESPnet

End-to-End Speech Processing Toolkit

ESPnet2

The World of Voice AI Agents and an Introduction to Retell AI

Voice AI Market Map

How Voice AI will change the world

speech-gateway

Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Episode 72: An Attempt at Self-Supervised Learning Using Speech Data Covering 4,000 Languages and One Million Hours

https://www.youtube.com/watch?v=ESaitFF1iTs

Introduction to Audio Processing

DMOSpeech 2: Reinforcement Learning for Duration Prediction in Metric-Optimized Speech Synthesis

Building a multi-agent voice assistant with Amazon Nova Sonic and Amazon Bedrock AgentCore

sherpa-onnx