Speech Synthesis - yuyan


Notes: international conferences on speech synthesis

・INTERSPEECH 2022

https://www.interspeech2022.org/

・ICASSP 2022

https://ieeexplore.ieee.org/xpl/conhome/9745891/proceeding

VALL-E

https://valle-demo.github.io/

[Featured paper] Sinusoidal Frequency Estimation by Gradient Descent

https://qiita.com/xiao_ming/items/37303e1a5a8c26ff5088

coefont

https://coefont.cloud/editor/37f205d1-5fe2-4ae1-94f3-78f337c31c80

Company develops software that uses cutting-edge AI voice conversion to turn anyone's voice, in real time, into the voice of another person

https://twitter.com/nikkeibusiness/status/1633973473537318915?s=20

Speech synthesis engine

https://twitter.com/hikari_prituber/status/1631611534471479297?s=20

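AI that generates realistic, human-like synthetic speech: it even reproduces fillers such as "eh," "ah," and "uh-huh," and was trained on YouTube videos and podcasts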

https://twitter.com/itmedia_news/status/1637591041535787008?s=20

https://twitter.com/forthshinji/status/1632231951670317056?s=20

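Next week at the NLC technical meeting, I'll present "A method for collecting in-the-wild sentence data toward voice quality control with free-form descriptions."

The idea: let users specify the voice quality of synthetic speech in natural language (Japanese, in the paper), and build a ridiculously huge dataset to make that possible. (I started collecting around June last year and it's finally done.)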

https://twitter.com/kajikent/status/1637707767669743617?s=20

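AssemblyAI has released a speech recognition AI with astonishing accuracy.

OpenAI's Whisper was already outrageously accurate, but as the chart at the end of the video shows, this model beats Whisper's recognition accuracy in every category.

With accuracy this high, adoption of voice user interfaces (VUIs) is likely to accelerate sharply.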
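Accuracy comparisons like the AssemblyAI vs. Whisper one are usually reported as word error rate (WER), or as character error rate (CER) for languages such as Japanese that do not separate words with spaces; which metric the chart in the video uses is not stated here. The following is a minimal, dependency-free sketch of both, assuming plain Levenshtein alignment and no text normalization (real benchmarks typically normalize punctuation, casing, and number formats first). `edit_distance`, `cer`, and `wer` are illustrative names, not an existing library API.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to hyp[:j]
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion (ref token missing in hyp)
                curr[j - 1] + 1,         # insertion (extra token in hyp)
                prev[j - 1] + (r != h),  # substitution, free when tokens match
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance over characters / reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    """Word error rate, assuming whitespace-delimited words."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp.split()) / len(ref_words)
```

For example, `cer("音声認識", "音声人式")` is 0.5 (two substituted characters out of four). Because errors are counted against the reference length, insertions can push either rate above 1.0.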

Giving ChatGPT a voice (ESPnet)

https://tech.isid.co.jp/entry/chatgpt_text_to_speech

coeiroink

https://coeiroink.com/

https://colab.research.google.com/drive/1BqaB-Zv5RuaQp-OW0effsFVGCYwvaJ4R?usp=sharing#scrollTo=pwNtwuqoKytj

Fine-tuning Whisper so it can recognize technical terms

https://medium.com/axinc/whisperをfine-tuningして専門用語を認識可能にする-3744e2779c71

A method for fine-tuning OpenAI's speech recognition model "Whisper" with Hugging Face has been published

https://dev.classmethod.jp/articles/whisper-fine-tuning-by-huggingface/

A program for fine-tuning OpenAI's Whisper on your own speech data

https://qiita.com/toshiouchi/items/a6c439ea0271bedb2e2f

How can I further improve speech recognition (transcription) accuracy?

https://faq.gijiroku.ai/hc/ja/articles/4410245832601-音声認識-文字起こし-の精度をより向上するにはどうしたら良いですか-

High-quality neural text-to-speech technology that runs fast on a CPU

https://www.keihanna-fair.jp/2021/exhibition/wp-content/uploads/2021/10/panel.pdf

text2speech speech2text

https://deepgram.com/

Amazon Polly

Web Speech API

Whisper

AWS Transcribe

Google Speech-to-Text

Parler-TTS

https://huggingface.co/spaces/freddyaboulton/parler-tts-streaming-webrtc

#13: Talking about recent TTS (from API services to building your own voice models)

https://listen.style/p/aiengineeringnow/nivntyfu

voicevox

https://voicevox.hiroshiba.jp/
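VOICEVOX includes a local HTTP engine, and synthesis through it is a two-step call: `POST /audio_query` turns text into an editable AudioQuery JSON (readings, accents, pitch, speed), and `POST /synthesis` renders that query to WAV. The sketch below uses only the Python standard library and assumes the engine is already running locally on its default port 50021. The speaker ID `1` is a placeholder, since available IDs depend on the installed voices (the engine lists them at `GET /speakers`), and the helper function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a VOICEVOX engine is running locally on its default port.
BASE_URL = "http://127.0.0.1:50021"


def audio_query_url(text: str, speaker: int, base_url: str = BASE_URL) -> str:
    """Build the /audio_query URL; text and speaker go in the query string."""
    return f"{base_url}/audio_query?{urllib.parse.urlencode({'text': text, 'speaker': speaker})}"


def synthesis_url(speaker: int, base_url: str = BASE_URL) -> str:
    """Build the /synthesis URL; the AudioQuery itself is sent as the JSON body."""
    return f"{base_url}/synthesis?{urllib.parse.urlencode({'speaker': speaker})}"


def synthesize(text: str, speaker: int = 1, base_url: str = BASE_URL) -> bytes:
    """Return WAV bytes for `text` (requires a running engine)."""
    # Step 1: text -> AudioQuery JSON (phoneme timings, pitch, speedScale, ...)
    req = urllib.request.Request(audio_query_url(text, speaker, base_url), method="POST")
    with urllib.request.urlopen(req) as resp:
        audio_query = json.load(resp)

    # Step 2: AudioQuery -> WAV. The query could be edited here before sending.
    req = urllib.request.Request(
        synthesis_url(speaker, base_url),
        data=json.dumps(audio_query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

With the engine running, `with open("out.wav", "wb") as f: f.write(synthesize("こんにちは"))` would save the result. Keeping the two steps separate means fields of the AudioQuery, such as `speedScale`, can be adjusted before synthesis.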

Aivis Project

https://github.com/Aivis-Project

https://aivis-project.com/

にじボイス (NijiVoice)

https://nijivoice.com/

speech-audio-proccessing

https://github.com/takamichi-lab/speech-audio-proccessing