Feed the livedoor news corpus to run_glue.py, the text-classification example script in transformers
Result: reproduced (or so I'd claim) 🙌
Accuracy is about 3% lower, which I attribute to a different train/validation/test split
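Since the split was done independently, it won't match other reported setups exactly. The line counts below suggest a roughly 6:2:2 split; a minimal sketch of a stratified split at those ratios (the ratios, seed, and label names here are illustrative assumptions, not the exact procedure used):

```python
import random

def stratified_split(items, ratios=(0.6, 0.2, 0.2), seed=42):
    """Split (text, label) pairs into train/val/test, keeping per-label proportions."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in items:
        by_label.setdefault(label, []).append((text, label))
    train, val, test = [], [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test

# Toy data: two hypothetical livedoor categories, 10 docs each
data = [(f"doc{i}", "it-life-hack") for i in range(10)] + \
       [(f"doc{i}", "movie-enter") for i in range(10)]
train, val, test = stratified_split(data)
print(len(train), len(val), len(test))  # → 12 4 4
```

Because the shuffle happens per label, each split keeps the same class balance as the full corpus, which matters when comparing accuracy across differently split datasets.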
code:shell
$ pwd
/.../transformers/examples/pytorch/text-classification
$ # Fetch the livedoor news corpus, following 『BERTによる自然言語処理入門』
$ python preprocess.py text livedoor_news_corpus
code:ipynb
!wc -l transformers/examples/pytorch/text-classification/livedoor_news_corpus/*.json
1474 transformers/examples/pytorch/text-classification/livedoor_news_corpus/test.json
4420 transformers/examples/pytorch/text-classification/livedoor_news_corpus/train.json
1473 transformers/examples/pytorch/text-classification/livedoor_news_corpus/val.json
7367 total
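When given custom files, run_glue.py expects JSON Lines: one JSON object per line, with a label column and the remaining column(s) used as input sentence(s). A minimal sketch of writing and validating such a file (the key names `sentence` and `label` are an assumption about what preprocess.py emits):

```python
import json
import os
import tempfile

# Illustrative rows in the assumed schema: one input column plus "label"
rows = [
    {"sentence": "スマホの新機能を試してみた", "label": "it-life-hack"},
    {"sentence": "話題の映画のレビュー", "label": "movie-enter"},
]

path = os.path.join(tempfile.mkdtemp(), "train.json")
with open(path, "w", encoding="utf-8") as f:
    for row in rows:
        # ensure_ascii=False keeps the Japanese text readable in the file
        f.write(json.dumps(row, ensure_ascii=False) + "\n")

# Read back and validate: every line is a JSON object with both keys
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
assert all({"sentence", "label"} <= row.keys() for row in loaded)
print(len(loaded))  # → 2
```

This is why `wc -l` above is a quick sanity check: in JSON Lines, the line count equals the number of examples.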
code:train.ipynb
!cd transformers/examples/pytorch/text-classification/ && python run_glue.py \
--model_name_or_path cl-tohoku/bert-base-japanese-whole-word-masking \
--train_file livedoor_news_corpus/train.json \
--validation_file livedoor_news_corpus/val.json \
--test_file livedoor_news_corpus/test.json \
--do_train \
--do_eval \
--do_predict \
--data_seed 42 \
--max_seq_length 128 \
--per_device_train_batch_size 32 \
--learning_rate 1e-5 \
--num_train_epochs 5 \
--output_dir /tmp/livedoor/
learning_rate and num_train_epochs follow Chapter 6 (text classification) of the same book. Training finished in under 20 minutes.
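`--do_predict` writes the test-set predictions under `--output_dir` as a tab-separated file with `index` and `prediction` columns (the filename, likely `predict_results_None.txt` when no GLUE task name is given, may vary by transformers version). A minimal sketch of tallying predicted labels from that format, run here on an inline sample string:

```python
import csv
import io
from collections import Counter

# Sample contents in the assumed format: a header line, then index<TAB>prediction
sample = "index\tprediction\n0\tit-life-hack\n1\tmovie-enter\n2\tit-life-hack\n"

reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
counts = Counter(row["prediction"] for row in reader)
print(counts.most_common())  # → [('it-life-hack', 2), ('movie-enter', 1)]
```

To process the real file, replace `io.StringIO(sample)` with `open("/tmp/livedoor/predict_results_None.txt")`; the label distribution is a quick check that predictions aren't collapsing onto one class.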
code:/tmp/livedoor/README.md
It achieves the following results on the evaluation set:
- Loss: 0.4653
- Accuracy: 0.8513