H3
Attention is all you need... but how much of it do you need?
Announcing H3 - a new generative language model that outperforms GPT-Neo-2.7B with only *2* attention layers! Accepted as a *spotlight* at #ICLR2023! 📣 w/ @tri_dao // Podcast #2: Hungry Hungry Hippos (H3)
Stanford researchers just released a new architecture that:
- Beats Transformers at ~1B param scale
- Admits *much* longer context than Transformers
Is H3 the Transformer-killer? More below!
https://gyazo.com/155fd3e1ffcf0f23398da93493668975
https://gyazo.com/321936b2896e7002a1826924ece5f497
https://gyazo.com/ebc7e45c83665711c41b60e011c61aaa
https://gyazo.com/b08332cd5d578780625a2c67a22c4e42
Hungry Hungry Hippos, aka "H3", functions like a linear RNN, or a long convolution.
The key idea: due to the fast Fourier transform, an H3 layer:
- can be computed in n*log(n) time, with n the context length
- unlike Transformers, which require n^2!
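A minimal NumPy sketch of the linear-RNN vs. long-convolution equivalence above (not the actual H3 layer; the scalar state and the parameter names a, b, c are illustrative assumptions): the same linear state-space map computed once as a step-by-step recurrence and once as a long convolution with its unrolled impulse response, evaluated via the FFT in O(n log n).
```python
import numpy as np

def ssm_recurrence(u, a, b, c):
    # Linear-RNN view: x_t = a*x_{t-1} + b*u_t, y_t = c*x_t  (n sequential steps)
    x, ys = 0.0, []
    for u_t in u:
        x = a * x + b * u_t
        ys.append(c * x)
    return np.array(ys)

def ssm_fft_conv(u, a, b, c):
    # Long-convolution view: y = k * u with kernel k_j = c * a**j * b,
    # computed via the FFT in O(n log n) instead of O(n^2) direct convolution.
    n = len(u)
    k = c * (a ** np.arange(n)) * b        # unrolled impulse response
    K = np.fft.rfft(k, 2 * n)              # zero-pad to avoid circular wrap-around
    U = np.fft.rfft(u, 2 * n)
    return np.fft.irfft(K * U, 2 * n)[:n]  # keep the causal part

u = np.random.randn(8)
print(np.allclose(ssm_recurrence(u, 0.9, 1.0, 0.5),
                  ssm_fft_conv(u, 0.9, 1.0, 0.5)))  # True: both views give the same output
```
(In the paper, an H3 layer stacks a shift SSM and a diagonal SSM with multiplicative gating so it can do attention-like recall and comparison; the n log n cost comes from exactly this FFT trick.)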
Whereas the Transformer's compute cost is $O(n^2)$, H3 cuts it down to $O(n \log_2 n)$.
In other words, for a 1,000-token input, a Transformer's compute grows to the order of a million operations, while H3 needs only on the order of ten thousand. That is a massive reduction. ChatGPT can only take about 4,000 input tokens, but an H3-based model might be able to handle tens or even hundreds of thousands of tokens. (うみゆき@AI研究)
Just as the Transformer turned out to be a revolutionary technology, this could become a hugely important foundational line of research.wogikaze.icon
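A quick back-of-the-envelope check of those orders of magnitude, just plugging numbers into the two cost models above (constants ignored):
```python
import math

# Assumed cost models from the complexity claims above: n**2 vs. n*log2(n).
for n in (1_000, 4_000, 100_000):
    print(f"n = {n:>7}:  attention ~ {n**2:.1e}   FFT-based H3 layer ~ {n * math.log2(n):.1e}")
```
At n = 1,000 this gives roughly 10^6 versus 10^4, which is where the "million vs. ten thousand" figures come from.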
Or rather, I hope it does.