ELECTRA - 🍊miyamonz🍊

ELECTRA

https://arxiv.org/abs/2003.10555

Submitted on 23 Mar 2020

https://ai.googleblog.com/2020/03/more-efficient-nlp-model-pre-training.html

https://gyazo.com/2a998c00e4f8c4f89a9b76e8fed4409a

The x-axis shows the amount of compute used to train the model (measured in FLOPs) and the y-axis shows the dev GLUE score. ELECTRA learns much more efficiently than existing pre-trained NLP models. Note that current best models on GLUE such as T5 (11B) do not fit on this plot because they use much more compute than others (around 10x more than RoBERTa).

https://github.com/google-research/electra

Efficiently Learning an Encoder that Classifies Token Replacements Accurately

In other words: learn an encoder efficiently by having it accurately classify token replacements.

Loose translation of the Google AI Blog post

https://webbigdata.jp/ai/post-5215

Existing pre-training methods for NLP come in two kinds: language models (LM) and masked language models (Masked LM).

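Masked language model (Masked LM)

Because it is bidirectional it has an advantage over a language model, but it has a drawback: only the masked tokens are predicted, so it cannot use all of the input text for learning.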

Note (miyamonz)

The "language model" here presumably means the kind that predicts the next word in one direction only, left to right.
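The Masked LM drawback described above can be sketched in a few lines. This is a toy illustration, not BERT's actual preprocessing: `mask_tokens` is a hypothetical helper, the 15% masking rate is an assumed typical value, and real implementations also sometimes keep or randomly replace the selected tokens instead of always writing `[MASK]`.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hypothetical helper: mask a fixed fraction of positions.

    Returns (masked_input, targets). targets[i] is the original token
    at masked positions and None everywhere else (no loss there).
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_prob))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    targets = [t if i in positions else None for i, t in enumerate(tokens)]
    return masked, targets

tokens = "the chef cooked the meal for the guests last night".split()
masked, targets = mask_tokens(tokens)

# Only positions with a target contribute to the training loss.
supervised = sum(t is not None for t in targets)
print(masked)
print(f"{supervised}/{len(tokens)} positions contribute to the loss")
```

With 10 tokens and a 15% rate, only 2 positions carry a training signal; the other 8 are read as context but never predicted.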

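ELECTRA uses a new method called RTD (replaced token detection) that takes the best of both approaches and makes efficient learning possible with less data.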
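A minimal sketch of how RTD training labels could be built, to show why every position gives a learning signal. Assumptions: `corrupt` is a hypothetical helper, and replacements are drawn uniformly from a tiny made-up vocabulary, whereas ELECTRA samples them from a small generator network trained as a masked LM. The 15% replacement rate is also an assumed value.

```python
import random

def corrupt(tokens, vocab, replace_prob=0.15, seed=0):
    """Hypothetical helper: replace some tokens, label every position.

    Returns (corrupted, labels) where labels[i] is 1 if the token at
    position i differs from the original and 0 otherwise.
    """
    rng = random.Random(seed)
    n_replace = max(1, round(len(tokens) * replace_prob))
    positions = rng.sample(range(len(tokens)), n_replace)
    corrupted = list(tokens)
    for i in positions:
        # Stand-in for the generator: a uniform draw from the vocabulary.
        corrupted[i] = rng.choice(vocab)
    # If the sampled token happens to equal the original, it counts as original.
    labels = [int(c != t) for c, t in zip(corrupted, tokens)]
    return corrupted, labels

tokens = "the chef cooked the meal".split()
vocab = ["the", "chef", "cooked", "ate", "meal", "a"]  # made-up toy vocabulary
corrupted, labels = corrupt(tokens, vocab)

print(corrupted)
print(labels)
# The discriminator gets a binary target at every position, not just a subset.
print(f"{len(labels)}/{len(tokens)} positions contribute to the loss")
```

Unlike the Masked LM case, the label list has the same length as the input, so the discriminator is trained on all tokens.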

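For example, ELECTRA matches the performance of RoBERTa and XLNet on the GLUE natural language understanding benchmark with less than 1/4 of their compute, and achieves state-of-the-art results on the SQuAD question answering benchmark.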