Transformer
A report that the Transformer, using neither RNNs nor CNNs and consisting only of attention mechanisms, performs well on translation tasks. From the abstract: "We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely."
"Attention Is All You Need", Vaswani et al., 2017
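The core of the architecture is scaled dot-product attention, $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^T / \sqrt{d_k})V$, from the paper. Below is a minimal NumPy sketch of that single formula, not the paper's full multi-head implementation; the function name and toy shapes are my own for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Similarity of each query to each key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax (shifted by the max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted sum of the values
    return weights @ V

# Toy example (hypothetical shapes): 3 queries attending over 4 key/value pairs
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```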