trl - nikkie-memos

trl

Transformer Reinforcement Learning

The library is built on top of the transformers library and thus allows you to use any model architecture available there.

The SFTTrainer is a light wrapper around the transformers Trainer to easily fine-tune language models or adapters on a custom dataset.

On to the Trainer in #transformers

The notes below are to be moved to a separate page

It calls Trainer's inner_training_loop and returns its result

It uses functools.partial
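A minimal sketch of that pattern, assuming the partial simply pre-binds the batch size (as far as I can tell, transformers does roughly `functools.partial(function, batch_size=...)` when auto_find_batch_size is off). `bind_batch_size`, `inner_training_loop` and `train` here are hypothetical stand-ins, not the library's code.

```python
import functools


def bind_batch_size(function, starting_batch_size=8):
    # Hypothetical helper: pre-fill the batch_size keyword argument.
    return functools.partial(function, batch_size=starting_batch_size)


def inner_training_loop(batch_size, args=None):
    # Stand-in for the real training loop: just report what it was called with.
    return {"batch_size": batch_size, "args": args}


def train(args=None, batch_size=8):
    # Build the loop with batch_size already bound ...
    loop = bind_batch_size(inner_training_loop, batch_size)
    # ... then call it with the remaining arguments and return its result as-is.
    return loop(args=args)


print(train(args="my-args", batch_size=4))
```

The point of the partial is that the caller of `loop` no longer needs to know the batch size; whoever built the partial decided it.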

How is the loss computed? tr_loss looks like a good place to start

training_step is called
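A toy version of how tr_loss seems to work: each training_step returns the loss for one batch, tr_loss sums them, and the logged value is that sum divided by the number of steps. This assumes plain floats instead of tensors and skips backward(), gradient accumulation and the optimizer; `toy_training_step` and `run_steps` are hypothetical.

```python
def toy_training_step(batch):
    # Hypothetical step: squared error between a prediction and a target.
    # The real training_step also runs backward() before returning the loss.
    return (batch["pred"] - batch["target"]) ** 2


def run_steps(batches):
    tr_loss = 0.0  # running sum of per-step losses, like Trainer's tr_loss
    for batch in batches:
        tr_loss += toy_training_step(batch)
    # Average over the steps, mirroring how the logged "loss" divides
    # the accumulated tr_loss by the number of steps since the last log.
    return tr_loss / len(batches)


batches = [{"pred": 1.0, "target": 0.0}, {"pred": 3.0, "target": 1.0}]
print(run_steps(batches))  # 2.5
```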

compute_loss is here!

It returns a tensor

How the loss is computed by Trainer. By default, all models return the loss in the first element.

outputs = model(**inputs)
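A sketch of the loss-extraction logic around `outputs = model(**inputs)`: a plain forward pass (not generate()), after which the loss is read from the "loss" key if the outputs are dict-like, otherwise from the first element. `toy_model` is hypothetical, and the dict handling follows my reading of the transformers source rather than a verified copy.

```python
from collections import OrderedDict


def toy_model(input_ids, labels=None):
    # Hypothetical model: the "logits" are just the inputs as floats.
    # The loss is only computed when labels are passed, and it comes first.
    logits = [float(t) for t in input_ids]
    if labels is None:
        return OrderedDict(logits=logits)
    loss = sum((p - t) ** 2 for p, t in zip(logits, labels)) / len(labels)
    return OrderedDict(loss=loss, logits=logits)


def compute_loss(model, inputs, return_outputs=False):
    outputs = model(**inputs)  # a single forward pass, not generate()
    if isinstance(outputs, dict):
        if "loss" not in outputs:
            raise ValueError("model did not return a loss; are labels in inputs?")
        loss = outputs["loss"]
    else:
        loss = outputs[0]  # tuple outputs: the loss is the first element
    return (loss, outputs) if return_outputs else loss


print(compute_loss(toy_model, {"input_ids": [1, 2], "labels": [1, 4]}))  # 2.0
```

Note that without labels in `inputs` there is no loss at all, which is why where the labels come from matters.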

So it's not generate(), huh

IMO: could this be where the Collator comes in!?
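If the guess is right, the collator is what puts labels into `inputs`. Here is a minimal sketch of a causal-LM style collator, assuming right padding and -100 as the ignored label value (PyTorch's cross-entropy default); transformers' DataCollatorForLanguageModeling with mlm=False behaves roughly like this, but `collate_for_causal_lm` is hypothetical. Labels are not shifted here; as far as I know, Hugging Face causal LM models shift them internally.

```python
IGNORE_INDEX = -100  # label value the loss ignores


def collate_for_causal_lm(examples, pad_id=0):
    # Hypothetical collator: right-pad every sequence to the longest one.
    max_len = max(len(ex["input_ids"]) for ex in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in examples:
        ids = list(ex["input_ids"])
        n_pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_id] * n_pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * n_pad)
        # Labels are a copy of the inputs; padded positions are masked out.
        batch["labels"].append(ids + [IGNORE_INDEX] * n_pad)
    return batch


print(collate_for_causal_lm([{"input_ids": [5, 6, 7]}, {"input_ids": [8]}]))
```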

The return value of label_smoother?
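As far as I can tell from the transformers source, a label_smoother only exists when label_smoothing_factor is non-zero; compute_loss then takes the labels out of `inputs`, runs the model, and returns label_smoother(outputs, labels). That value is a scalar blending the usual negative log-likelihood with a term against the uniform distribution: loss = (1 - ε) · NLL + ε · (mean of -log p over the vocabulary). The sketch below computes that blend in pure Python; `log_softmax` and `label_smoothed_loss` are hypothetical helpers, labels are not shifted, and -100 marks ignored positions.

```python
import math

IGNORE_INDEX = -100


def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]


def label_smoothed_loss(logits, labels, epsilon=0.1):
    nll_total, smooth_total, active = 0.0, 0.0, 0
    for row, target in zip(logits, labels):
        if target == IGNORE_INDEX:
            continue  # ignored (e.g. padded) positions contribute nothing
        log_probs = log_softmax(row)
        nll_total += -log_probs[target]  # usual cross-entropy term
        smooth_total += -sum(log_probs) / len(log_probs)  # uniform-target term
        active += 1
    nll, smooth = nll_total / active, smooth_total / active
    return (1 - epsilon) * nll + epsilon * smooth


print(label_smoothed_loss([[2.0, 0.0]], [0], epsilon=0.0))  # plain cross-entropy
print(label_smoothed_loss([[2.0, 0.0]], [0], epsilon=0.1))  # slightly larger
```

With epsilon=0 this reduces to ordinary cross-entropy; a positive epsilon penalizes overconfident predictions.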