Trying data2vec fine-tuning on GLUE QNLI

Working through "Finetuning data2vec-text on GLUE"

Refer to the README under examples/roberta

Preparing the data

1. Download the data from GLUE website

The data is extracted to glue_data/QNLI

2. Preprocess GLUE task data

download bpe encoder.json, vocabulary and fairseq dictionary: wget

BPE encode: python -m examples.roberta.multiprocessing_bpe_encoder --workers 60 ...

Run fairseq preprocessing: fairseq-preprocess --workers 60 ...

This produces QNLI-bin (at the same level as glue_data)

3. Fine-tuning on GLUE task (this seems to be the part of the data2vec README I wanted to work through)

code:shell
 $ python fairseq_cli/hydra_train.py -m \
     --config-dir examples/roberta/config/finetuning \
     --config-name qnli \
     task.data=$PWD/QNLI-bin \
     checkpoint.restore_file=examples/data2vec/nlp_base.pt

Specify task.data as an absolute path

With a relative path, it tried to look for QNLI-bin/input0/dict.txt inside hydra's output directory
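
The path problem can be caught before launching the run. A minimal sketch, assuming QNLI-bin sits in the current directory; `check_data_dir` is a hypothetical helper name of mine, not something fairseq provides. It confirms that input0/dict.txt exists and prints the directory as an absolute path.

```shell
# check_data_dir is a hypothetical helper, not part of fairseq.
# It prints the absolute path of a binarized data directory, or fails
# if the dictionary file the training run looks for is missing.
check_data_dir() {
    dir="$1"
    if [ ! -f "$dir/input0/dict.txt" ]; then
        echo "missing: $dir/input0/dict.txt" >&2
        return 1
    fi
    # cd + pwd yields an absolute path without depending on realpath
    (cd "$dir" && pwd)
}

# Usage (assumes QNLI-bin is in the current directory):
#   DATA_DIR="$(check_data_dir QNLI-bin)" && echo "task.data=$DATA_DIR"
```

The printed value can be passed as task.data in place of $PWD/QNLI-bin. Either way the path no longer depends on the directory the training process runs in.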

A GPU seems to be required

AssertionError: Torch not compiled with CUDA enabled
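
This error means the installed PyTorch build has no CUDA support. A quick check before starting a run, as a sketch: `has_cuda` is a hypothetical helper name, and the only PyTorch call used is `torch.cuda.is_available()`. It assumes `python` is the same interpreter used for training.

```shell
# has_cuda is a hypothetical helper name, not part of fairseq or PyTorch.
# It succeeds only if PyTorch can be imported and reports a usable CUDA device.
has_cuda() {
    python -c 'import sys, torch; sys.exit(0 if torch.cuda.is_available() else 1)' 2>/dev/null
}

if has_cuda; then
    echo "CUDA is available"
else
    echo "CUDA is not available (or PyTorch is not importable)"
fi
```

If the second message is printed, the fine-tuning command above will probably fail with the same AssertionError.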

Check the accuracy