Future LLM Architectures
Large Language Model
Mamba
[Introduction to Mamba] An explanation of an architecture that could surpass the Transformer (includes original training and inference code)
https://qiita.com/peony_snow/items/649ecb307cd3b5c10aa7
RetNet paper summary
https://zenn.dev/tk1/articles/04c924c6f7ac8b
RWKV
https://developers.agirobots.com/jp/rwkv/
Hyena: A new machine learning model that goes beyond the Transformer, toward next-generation LLMs
https://recruit.gmo.jp/engineer/jisedai/blog/hyena/
[Paper notes] Hungry Hungry Hippos: Towards Language Modeling with State Space Models
https://yuiga.dev/blog/posts/hungry_hungry_hippos_towards_language_modeling_with_state_space_models/
xLSTM: Extended Long Short-Term Memory
https://arxiv.org/abs/2405.04517
PKSHA's LLM adopting Microsoft's new method runs about 3x faster than conventional models
https://xtech.nikkei.com/atcl/nxt/news/24/00506/
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
https://arxiv.org/abs/2405.21060
Tutorial: Mamba, Vision Mamba (Vim)
https://speakerdeck.com/hf149/tiyutoriaru-mamba-vision-mamba-vim
Model Card for Zamba2-7B
https://huggingface.co/Zyphra/Zamba2-7B
2024 in Post-Transformers Architectures (State Space Models, RWKV) LS Live @ NeurIPS
https://www.latent.space/p/2024-post-transformers