Passing your own CSV or JSON files to run_glue.py, the text classification example in transformers

Looking at what happens inside the main function of run_glue.py

https://github.com/huggingface/transformers/blob/v4.17.0/examples/pytorch/text-classification/run_glue.py#L241-L252

For CSV/JSON files, this script will use as labels the column called 'label' and as pair of sentences the sentences in columns called 'sentence1' and 'sentence2' if such column exists or the first two columns not named label if at least two columns are provided.

CSV/JSON format 1: sentence1, sentence2, label
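To make format 1 concrete, here is a minimal sketch that writes a tiny CSV whose header row is sentence1, sentence2, label, then reads the header back. The temporary directory, the file name train.csv, and the example rows are made up for illustration; only the column names come from the comment quoted above.

```python
import csv
import os
import tempfile

# Hypothetical example rows for sentence-pair classification (format 1).
rows = [
    {"sentence1": "A man is playing a guitar.", "sentence2": "A person plays music.", "label": 1},
    {"sentence1": "A cat sleeps on the sofa.", "sentence2": "The dog is running outside.", "label": 0},
]

out_dir = tempfile.mkdtemp()
train_path = os.path.join(out_dir, "train.csv")  # hypothetical file name

# newline="" is what the csv module docs recommend when writing CSV files.
with open(train_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["sentence1", "sentence2", "label"])
    writer.writeheader()  # the header row is where the column names come from
    writer.writerows(rows)

# Read the file back to check the header and the row count.
with open(train_path, newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    header = reader.fieldnames
    loaded = list(reader)

print(header)
```

The csv module reads every value back as a string, so the label shows up as "1" here; the datasets CSV loader infers column types on its own.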

If the CSVs/JSONs contain only one non-label column, the script does single sentence classification on this single column.

CSV/JSON format 2: sentence, label
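For format 2 as JSON, a minimal sketch assuming the JSON Lines layout (one JSON object per line), which the datasets JSON loader can read. The file name and rows are hypothetical. The file is named train.json rather than train.jsonl because, as far as I can tell, run_glue.py only accepts the csv and json extensions.

```python
import json
import os
import tempfile

# Hypothetical single-sentence examples (format 2): one non-label column plus "label".
records = [
    {"sentence": "This movie was wonderful.", "label": 1},
    {"sentence": "I want my money back.", "label": 0},
]

out_dir = tempfile.mkdtemp()
train_path = os.path.join(out_dir, "train.json")  # hypothetical file name

# JSON Lines: one JSON object per line; ensure_ascii=False keeps non-ASCII text readable.
with open(train_path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Read it back line by line to check the layout.
with open(train_path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(loaded[0])
```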

Reading the implementation, I found that the accepted layout is somewhat more flexible than these two formats.

Implementation: https://github.com/huggingface/transformers/blob/v4.17.0/examples/pytorch/text-classification/run_glue.py#L261-L287

It calls load_dataset from datasets to read the JSON or CSV files

https://github.com/huggingface/transformers/blob/v4.17.0/examples/pytorch/text-classification/run_glue.py#L288-L289

See more about loading any type of standard or custom dataset at https://huggingface.co/docs/datasets/loading_datasets.html .

Loading JSON with datasets.load_dataset
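As I read the linked lines, the loading step boils down to picking a loader name from the train file's extension and handing load_dataset a dict that maps split names to file paths. Below is a stdlib-only re-creation of that selection logic. The helper name pick_loader and its signature are my own invention, not part of the script.

```python
from typing import Dict, Optional, Tuple


def pick_loader(
    train_file: str,
    validation_file: str,
    test_file: Optional[str] = None,
) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: choose the datasets loader name and build data_files."""
    train_ext = train_file.split(".")[-1]
    # Map split names to local paths, the shape load_dataset's data_files accepts.
    data_files = {"train": train_file, "validation": validation_file}
    if test_file is not None:
        data_files["test"] = test_file
    # One loader reads every split, so all files must share the train file's extension.
    for split, path in data_files.items():
        if path.split(".")[-1] != train_ext:
            raise ValueError(f"{split} file should have the same extension (csv or json) as the train file")
    # Anything that does not end in .csv goes to the json loader.
    builder = "csv" if train_file.endswith(".csv") else "json"
    return builder, data_files


builder, data_files = pick_loader("train.csv", "dev.csv")
print(builder, data_files)
# With datasets installed, the next step would be something like:
# from datasets import load_dataset
# raw_datasets = load_dataset(builder, data_files=data_files)
```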

The script then preprocesses the loaded data (raw_datasets) so that it behaves as described in the comment quoted above

https://github.com/huggingface/transformers/blob/v4.17.0/examples/pytorch/text-classification/run_glue.py#L339-L351

Getting the column names (this is what handles columns named something other than sentence1 and sentence2)
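The column-name fallback can be sketched as a small pure function. This is my re-creation of the branch that runs when no GLUE task name is given; the function name pick_sentence_keys is hypothetical, and in the real script the column list comes from raw_datasets["train"].column_names.

```python
from typing import List, Optional, Tuple


def pick_sentence_keys(column_names: List[str]) -> Tuple[str, Optional[str]]:
    """Hypothetical re-creation of the column-name fallback for custom CSV/JSON files."""
    # Every column except "label" is a candidate text column.
    non_label = [name for name in column_names if name != "label"]
    if "sentence1" in non_label and "sentence2" in non_label:
        return "sentence1", "sentence2"  # format 1: explicit pair columns win
    if len(non_label) >= 2:
        return non_label[0], non_label[1]  # otherwise the first two non-label columns
    return non_label[0], None  # a single text column: single-sentence classification


print(pick_sentence_keys(["sentence1", "sentence2", "label"]))
print(pick_sentence_keys(["label", "premise", "hypothesis"]))
print(pick_sentence_keys(["text", "label"]))
```

One consequence of this logic: an extra column such as id counts as a text column too, so a file with columns id, text, label would be treated as a sentence pair (id, text). Dropping unused columns before training avoids that.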

https://github.com/huggingface/transformers/blob/v4.17.0/examples/pytorch/text-classification/run_glue.py#L394-L412

Using the column names obtained above, it defines a function inside main and uses it to transform raw_datasets
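The idea behind the nested function is that it closes over sentence1_key and sentence2_key, so the same code handles both the pair and the single-sentence case. Below is a self-contained sketch under stated assumptions: the factory make_preprocess_function and toy_tokenize are hypothetical stand-ins (the real script defines the function directly inside main and calls a Hugging Face tokenizer with padding, max_length and truncation), and the batch is a plain dict of lists like the ones a batched map passes in.

```python
from typing import Callable, Dict, List, Optional


def make_preprocess_function(
    tokenize: Callable[..., Dict[str, List]],
    sentence1_key: str,
    sentence2_key: Optional[str],
    label_to_id: Optional[Dict] = None,
) -> Callable[[Dict[str, List]], Dict[str, List]]:
    """Hypothetical factory: returns a batched preprocessing function closing over the keys."""

    def preprocess_function(examples: Dict[str, List]) -> Dict[str, List]:
        # Pass one text column (single sentence) or two (sentence pair) to the tokenizer.
        if sentence2_key is None:
            args = (examples[sentence1_key],)
        else:
            args = (examples[sentence1_key], examples[sentence2_key])
        result = tokenize(*args)
        # Optionally map raw labels to integer IDs; -1 is passed through unchanged.
        if label_to_id is not None and "label" in examples:
            result["label"] = [label_to_id[lab] if lab != -1 else -1 for lab in examples["label"]]
        return result

    return preprocess_function


def toy_tokenize(texts, pair_texts=None):
    """Hypothetical stand-in for a real tokenizer: whitespace splitting only."""
    if pair_texts is None:
        return {"input_ids": [t.split() for t in texts]}
    return {"input_ids": [a.split() + ["[SEP]"] + b.split() for a, b in zip(texts, pair_texts)]}


batch = {"text": ["good film", "bad plot"], "label": ["pos", "neg"]}
preprocess = make_preprocess_function(toy_tokenize, "text", None, {"neg": 0, "pos": 1})
print(preprocess(batch))
```

With datasets installed, something like raw_datasets.map(preprocess, batched=True) would call this on dict-of-lists batches shaped like the one above.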