Transformer
https://isobe324649.hatenablog.com/entry/2023/03/20/215000
https://gyazo.com/76f4f83f061c152b478c35ea07cd4cf7
Left: encoder
A structure that is good at understanding the source text of the translation
Later put to use in BERT
https://github.com/facebookresearch/xformers#transformers-key-concepts
You'll find the key repository boundaries in this illustration: a Transformer is generally made of a collection of attention mechanisms, embeddings to encode some positional information, feed-forward blocks, and a residual path (typically referred to as pre- or post- layer norm).
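The components listed above (attention, feed-forward block, residual path with layer norm) can be put together in a minimal NumPy sketch. This is a simplified single-head block without learned positional embeddings or multi-head splitting; the function and weight names are illustrative, not from any library, and the post-layer-norm arrangement is just one of the two variants the quote mentions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n, n) similarity scores
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)              # softmax over the keys
    return w @ V

def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def transformer_block(x, W_q, W_k, W_v, W1, W2):
    """Post-layer-norm variant: sublayer -> add residual -> normalize."""
    attn = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
    x = layer_norm(x + attn)                        # residual + layer norm
    ff = np.maximum(0.0, x @ W1) @ W2               # feed-forward with ReLU
    return layer_norm(x + ff)                       # second residual + norm
```

In the pre-layer-norm variant, `layer_norm` would instead be applied to the input of each sublayer before the residual addition.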
June 08, 2022: Transformerの最前線 〜 畳込みニューラルネットワークの先へ 〜 (The frontier of Transformers: beyond convolutional neural networks) - Speaker Deck
via https://twitter.com/biomedicalhacks/status/1542636599502024705
https://arxiv.org/abs/2207.09238
The original Transformer paper's main text is hard to follow
This paper sets out to write everything down so that it can be understood
Assumes familiarity with MLPs
https://overcast.fm/+MhOrrh3D8
The basic architecture comes in three patterns:
encoder-decoder Transformer
encoder-only Transformer
decoder-only Transformer
GPT is this one
Words like "decoder" are used for historical reasons; it isn't actually decoding anything
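The practical difference between the three patterns shows up in the attention mask: encoder-only models (BERT-style) attend bidirectionally, while decoder-only models (GPT-style) mask out future positions. A minimal sketch of that distinction, with illustrative names and single-head attention assumed:

```python
import numpy as np

def attention_weights(Q, K, causal=False):
    """Softmax attention weights over the keys.

    causal=False: every position may attend to every other
                  (encoder-only, BERT-style).
    causal=True:  position i may only attend to positions <= i
                  (decoder-only, GPT-style).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if causal:
        n = scores.shape[0]
        # Keep the lower triangle (incl. diagonal); mask the future with -inf.
        keep = np.tril(np.ones((n, n), dtype=bool))
        scores = np.where(keep, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)
```

With `causal=True` every entry above the diagonal becomes exactly zero, which is what lets a decoder-only model be trained on next-token prediction.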
https://www.youtube.com/watch?v=50XvMaWhiTY&list=PLhDAH9aTfnxKXf__soUoAEOrbLAOnVHCP&index=29
https://www.youtube.com/watch?v=FFoLqib6u-0&list=PLhDAH9aTfnxKXf__soUoAEOrbLAOnVHCP&index=30
https://www.youtube.com/watch?v=n1QYofU3_hY&list=PLhDAH9aTfnxKXf__soUoAEOrbLAOnVHCP&index=40
Scaling law
Google holds a patent on it
US10452978B2 - Attention-based sequence transduction neural networks - Google Patents
https://twitter.com/vintersn0w/status/1598543121012641792
https://twitter.com/h_okumura/status/1598542406973988864