Transformer
A report that the Transformer, using neither RNNs nor CNNs and consisting only of attention mechanisms, performs well on translation tasks. From the abstract: "We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely."
"Attention Is All You Need", Vaswani et al., 2017
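The core of the architecture is scaled dot-product attention, $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^T / \sqrt{d_k})V$, from the paper. Below is a minimal NumPy sketch of that single formula, not the paper's full multi-head implementation; the function name and toy shapes are my own for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Similarity of each query to each key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax (shifted by the max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted sum of the values
    return weights @ V

# Toy example (hypothetical shapes): 3 queries attending over 4 key/value pairs
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```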