Attention mechanism - 西尾泰和の外部脳

Attention mechanism

Attention

Generalized form as of 2018

$ \mathrm{Attention}(query, Keys, Values) = \mathrm{Normalize}(F(query, Keys)) \cdot Values

There is a query, and Keys, which is a bundle of multiple keys

There is a function F that takes query and Keys as arguments and returns the attention strength for each key

That result is normalized in some way so that it sums to 1, which gives the attention weights (usually softmax, but see hard attention mechanism)

Values is then averaged, weighted by those attention weights
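The steps above can be sketched in plain Python. This is only a minimal sketch: the names `attention`, `score_fn`, and `neg_sq_dist` are made up for illustration, and the distance-based scoring function is an arbitrary stand-in for F, not something from any of the papers.

```python
import math

def softmax(scores):
    """Normalize scores so that they are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values, score_fn, normalize=softmax):
    """Generalized attention: Normalize(F(query, Keys)) . Values."""
    scores = score_fn(query, keys)   # F: one score per key
    weights = normalize(scores)      # attention weights, summing to 1
    dim = len(values[0])
    # Weighted average of the value vectors
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Hypothetical F for the demo: negative squared distance to each key
def neg_sq_dist(query, keys):
    return [-sum((q - k) ** 2 for q, k in zip(query, key)) for key in keys]

keys = [[0.0, 0.0], [1.0, 1.0]]
values = [[10.0], [20.0]]
out = attention([0.0, 0.0], keys, values, neg_sq_dist)
print(out)  # closer to 10.0 than to 20.0, because the query matches the first key
```

Swapping `normalize` for a function that puts all weight on the arg-max key would give a hard-attention variant under the same interface.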

Diagram

https://gyazo.com/211618e709ff284a379c5c2f502934da

F does not know the number of keys. $ F(query, Key) does not depend on the shape of Key.

I am not sure how best to express this in mathematical language.

There is a function f that takes one query and one key, and F(query, Keys) is [f(query, key) for key in Keys]
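One way to make the "F does not know the number of keys" point concrete is to build F from a pairwise f. A minimal sketch; `make_F` and `dot` are hypothetical names, and the dot product is just one possible choice of f.

```python
def make_F(f):
    """Lift a pairwise score f(query, key) to F(query, Keys)."""
    def F(query, keys):
        # f never sees how many keys there are; it only sees one at a time
        return [f(query, key) for key in keys]
    return F

def dot(query, key):
    """One possible f: the inner product of query and key."""
    return sum(q * k for q, k in zip(query, key))

F = make_F(dot)
query = [1.0, 2.0]
scores_2 = F(query, [[1.0, 0.0], [0.0, 1.0]])              # two keys
scores_3 = F(query, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # three keys
print(scores_2)  # [1.0, 2.0]
print(scores_3)  # [1.0, 2.0, 3.0]
```

The same F works for any number of keys, and the scores for the shared keys do not change when a key is added.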

2014: additive attention, 1409.0473 Neural Machine Translation by Jointly Learning to Align and Translate

F := Feed-Forward Network (FFN)

$ Attention(query, Key, Value) = Softmax(FFN(concat(query, Key))) \cdot Value
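A rough sketch of the formula above in plain Python. The FFN here is assumed to be a single tanh hidden layer followed by a dot with a vector `v`, and the weights `W` and `v` are made-up constants; in a real model they would be learned. The function names are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ffn_score(query, key, W, v):
    """Score one (query, key) pair: v . tanh(W @ concat(query, key))."""
    x = query + key  # list concatenation plays the role of concat(query, key)
    hidden = [math.tanh(sum(w * xj for w, xj in zip(row, x))) for row in W]
    return sum(vi * hi for vi, hi in zip(v, hidden))

def additive_attention(query, keys, values, W, v):
    scores = [ffn_score(query, key, W, v) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * val[i] for w, val in zip(weights, values)) for i in range(dim)]

# Made-up parameters: 2 hidden units, input size 4 (2 for query + 2 for key)
W = [[0.5, -0.5, 1.0, 0.0],
     [0.0, 1.0, -1.0, 0.5]]
v = [1.0, -1.0]

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]  # one-hot values, so the output equals the weights
out = additive_attention(query, keys, values, W, v)
print(out)
```

Because the values are one-hot, the printed vector is the attention weights themselves, which makes it easy to check that they sum to 1.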

By letting the decoder have an attention mechanism, we relieve the encoder from the burden of having to encode all information in the source sentence into a fixed-length vector. With this new approach the information can be spread throughout the sequence of annotations, which can be selectively retrieved by the decoder accordingly.

The hidden state of an RNN is a fixed-length vector, and packing the data of a whole sentence into it and keeping it there is a burden

An attention mechanism can pull information out of data of arbitrary length, so it reduces that burden

https://gyazo.com/dab69f04c581681e9c3c543b92633ef5

2015: dot-product attention, 1508.04025 Effective Approaches to Attention-based Neural Machine Translation

A simplification: just taking the inner product of query and key is good enough

$ Attention(query, Key, Value) = Softmax(query \cdot Key) \cdot Value

Of course, depending on the paper, this inner product may be written as a matrix product
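A minimal sketch of dot-product attention in plain Python, with no learned parameters in the scoring step. The function name and the toy data are made up for illustration. Computing one inner product per key is the same thing as the matrix-vector product of Keys (one key per row) with query, which is why papers often write it as a matrix product.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_product_attention(query, keys, values):
    # One inner product per key; equivalent to the matrix-vector product Keys @ query
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * val[i] for w, val in zip(weights, values)) for i in range(dim)]

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[1.0], [2.0], [3.0]]
out = dot_product_attention(query, keys, values)
print(out)  # close to [2.0]: keys 1 and 3 get equal weight, so the average is symmetric around 2
```

Compared with the additive version, the only thing that changed is how the scores are computed; the softmax and the weighted average are the same.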