J-Moshi - NISHIO Hirokazu's Scrapbox (Auto-translated from Japanese)

J-Moshi

atsumoto_ohashi Japanese real-time spoken dialogue model J-Moshi is now available!

Based on @kyutai_labs' Moshi, it "speaks" and "listens" at the same time like a human.

This is the first such model available for Japanese.

At 7B parameters the model is lightweight, so please try it!

We will be presenting it at #NLP2025.

https://nu-dialogue.github.io/j-moshi/

takahiroanno The backchannels (aizuchi), fillers, and interruptions are amazingly natural! And this is done with a small 7B model!

If we run this on a local machine, we might be able to organize tasks by voice without relying on Advanced Voice.

I'm not sure when this changed, but Advanced Voice now disconnects after about 5 minutes of silence, so I can no longer leave it running while I work on my computer and speak to it whenever I think of something.

nu-dialogue/j-moshi: J-Moshi: A Japanese Full-duplex Spoken Dialogue System

J-Moshi is in the prototype stage, and its responses may be unnatural. In addition, since most of J-Moshi's training data is chat dialogues, it cannot generate responses according to the user's instructions.

Hmm, I wonder how much of an impact this has.

https://x.com/akkikiki/status/1882913953749287288?s=46&t=gkSZtjGEtUZPO0JCzBxCBw

A story about converting it to work on a Mac

2025-01-25

INSTALLING

move as fast as one can

Still in Japanese and free of charge at this time.

Looks like a kukancho.

Not yet able to do useful tasks in conversation

This is what "it cannot generate responses according to the user's instructions" means.

---

This page is auto-translated from /nishio/J-Moshi using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to share my thoughts with non-Japanese readers.