Supervised Fine-tuning Trainer
Supervised fine-tuning (or SFT for short) is a crucial step in RLHF
(the docs also link to a complete example script)
Quickstart
code:これだけ.py
from datasets import load_dataset
from trl import SFTTrainer
dataset = load_dataset("imdb", split="train")
trainer = SFTTrainer(
"facebook/opt-350m",
train_dataset=dataset,
dataset_text_field="text",
max_seq_length=512,
)
trainer.train()
Load the imdb dataset and specify its text field.
SFTTrainer can also take a model ID (a string) instead of a model object.
It apparently calls AutoModelForCausalLM.from_pretrained internally.
Make sure to pass a correct value for max_seq_length as the default value will be set to min(tokenizer.model_max_length, 1024).
IMO: don't we need to pass the tokenizer in from outside?
Advanced usage
Train on completions only
You can use the DataCollatorForCompletionOnlyLM to train your model on the completions only: tokens before (and including) the response template are masked out of the loss.
Set packing=False (this collator only works without packing).
Create the data collator by passing in the tokenizer.
To instantiate that collator for instruction data, pass a response template and the tokenizer.
IMO: is this what instruction tuning means?
Make sure the pad_token_id is different from the eos_token_id; if they are the same, the model may not properly predict EOS (End of Sentence) tokens during generation.
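A self-contained sketch of how I think the pieces fit together (the model name, the "### Answer:" template, and the toy dataset are all placeholders, not from the docs verbatim):
code:completion_only_sketch.py
from datasets import Dataset
from transformers import AutoTokenizer
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

# Tiny in-memory dataset whose "text" column already follows a prompt/response layout
dataset = Dataset.from_dict({
    "text": [
        "### Question\nWhat is 2 + 2?\n### Answer:\n4",
        "### Question\nName a prime number.\n### Answer:\n7",
    ]
})

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

# Tokens up to and including the response template are masked from the loss,
# so the model is only trained to produce the answer part
collator = DataCollatorForCompletionOnlyLM("### Answer:", tokenizer=tokenizer)

trainer = SFTTrainer(
    "facebook/opt-350m",
    train_dataset=dataset,
    dataset_text_field="text",
    data_collator=collator,
    packing=False,
    max_seq_length=512,
)
trainer.train()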
Using token_ids directly for response_template
Watch out: the same string ("### Assistant:") can be tokenized differently depending on the surrounding context.
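The docs work around this by encoding the template together with a bit of context and slicing off the context tokens; a sketch of that trick (the exact slice offset is tokenizer-specific, so treat [2:] as an assumption):
code:response_template_ids.py
from transformers import AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

# Encode the template with leading context so it is tokenized the same way as it
# appears inside the full training texts, then drop the context tokens.
response_template_with_context = "\n### Assistant:"
response_template_ids = tokenizer.encode(
    response_template_with_context, add_special_tokens=False
)[2:]  # how many tokens to drop depends on the tokenizer

collator = DataCollatorForCompletionOnlyLM(response_template_ids, tokenizer=tokenizer)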
Add Special Tokens for Chat Format
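If I read the docs right, this is handled by TRL's setup_chat_format helper, which adds the chat special tokens, sets a chat template, and resizes the embeddings; a minimal sketch (model choice is just an example):
code:chat_format.py
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import setup_chat_format

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

# Adds chat special tokens to the tokenizer and resizes the model's token embeddings
model, tokenizer = setup_chat_format(model, tokenizer)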
Dataset format support
conversational format
has a messages field
each message has a role and a content
instruction format
has prompt and completion fields
If your dataset uses one of the above formats, you can directly pass it to the trainer without pre-processing.
apply_chat_template?
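Hypothetical records to illustrate the two formats (field values are made up); presumably the trainer applies the tokenizer's chat template to the conversational form internally:
code:dataset_formats.py
# Conversational format: a "messages" column, each message with role and content
conversational_example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4"},
    ]
}

# Instruction format: separate "prompt" and "completion" columns
instruction_example = {
    "prompt": "What is 2 + 2?",
    "completion": "4",
}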
Format your input prompts
About the formatting_func argument
You can pass a function.
Its argument seems to be a batch of rows rather than a single example (the doc's sample code indexes the i-th element).
code:formatting_func output format
### Question
{question}
### Answer:
{answer}
IMO: does this assume instruction tuning?
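A sketch of a formatting function in that shape, assuming the dataset has question and answer columns (the column names and toy data are illustrative):
code:formatting_func_sketch.py
from datasets import Dataset
from trl import SFTTrainer

dataset = Dataset.from_dict({
    "question": ["What is 2 + 2?", "Name a prime number."],
    "answer": ["4", "7"],
})

def formatting_prompts_func(example):
    # example is a batch (dict of lists), so build one formatted string per row
    output_texts = []
    for i in range(len(example["question"])):
        output_texts.append(
            f"### Question\n{example['question'][i]}\n### Answer:\n{example['answer'][i]}"
        )
    return output_texts

trainer = SFTTrainer(
    "facebook/opt-350m",
    train_dataset=dataset,
    formatting_func=formatting_prompts_func,
    max_seq_length=512,
)
trainer.train()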
Packing dataset ( ConstantLengthDataset )
I initially read this as few-shot support.
packing=True
multiple short examples are packed in the same input sequence to increase training efficiency.
(Ah, so it means packing examples together!)
There is also an eval_packing argument.
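Enabling packing looks roughly like this, as I understand it; short examples get concatenated into constant-length sequences via ConstantLengthDataset:
code:packing.py
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")

trainer = SFTTrainer(
    "facebook/opt-350m",
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    packing=True,  # multiple short examples are packed into one input sequence
)
trainer.train()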
Control over the pretrained model
The model_init_kwargs argument
You can directly pass the kwargs of the from_pretrained() method to the SFTTrainer
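For example (torch_dtype here is just one of the kwargs that from_pretrained accepts; the dataset is the quickstart one):
code:model_init_kwargs.py
import torch
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")

trainer = SFTTrainer(
    "facebook/opt-350m",
    train_dataset=dataset,
    dataset_text_field="text",
    # Forwarded to AutoModelForCausalLM.from_pretrained under the hood
    model_init_kwargs={"torch_dtype": torch.bfloat16},
)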
Training adapters
Using Flash Attention and Flash Attention 2
Flash-Attention 1
Flash Attention-2
Best practices
By default, SFTTrainer always pads sequences to the max_seq_length argument of the SFTTrainer.
For training adapters in 8bit, you might need to tweak the arguments of the prepare_model_for_kbit_training method from PEFT; hence we advise users to use the prepare_in_int8_kwargs field, or to create the PeftModel outside the SFTTrainer and pass it in.
For more memory-efficient training with adapters, you can load the base model in 8bit: simply add the load_in_8bit argument when creating the SFTTrainer, or create the base model in 8bit outside the trainer and pass it in (see the sketch after this list).
If you create a model outside the trainer, make sure not to pass the trainer any additional keyword arguments that relate to the from_pretrained() method.
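A sketch of the "create the 8bit base model outside and pass it in" route, combined with a LoRA adapter; the model name and LoRA settings are illustrative assumptions, not values from the docs:
code:adapters_8bit_sketch.py
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")

# Create the base model in 8bit outside the trainer (requires bitsandbytes)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    load_in_8bit=True,
    device_map="auto",
)

# Illustrative LoRA settings; tune r / alpha / dropout for your setup
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Since the model is created outside, no from_pretrained kwargs are passed here
trainer = SFTTrainer(
    model,
    train_dataset=dataset,
    dataset_text_field="text",
    peft_config=peft_config,
)
trainer.train()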
Datasets
The SFTTrainer also supports datasets.IterableDataset in addition to regular map-style datasets.
This is useful if you are using large corpora that you do not want to save all to disk.
The data will be tokenized and processed on the fly, even when packing is enabled.
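A sketch of streaming a large corpus; with an IterableDataset the trainer can't infer the epoch length, so max_steps has to be set (the output_dir and step count are placeholders):
code:streaming_sketch.py
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# streaming=True yields a datasets.IterableDataset; nothing is saved to disk up front
dataset = load_dataset("imdb", split="train", streaming=True)

trainer = SFTTrainer(
    "facebook/opt-350m",
    args=TrainingArguments(output_dir="sft-imdb", max_steps=500),
    train_dataset=dataset,
    dataset_text_field="text",
    packing=True,  # tokenization and packing happen on the fly
)
trainer.train()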
In the SFTTrainer, we support pre-tokenized datasets if they are datasets.Dataset or datasets.IterableDataset.
In other words, if such a dataset has a column of input_ids, no further processing (tokenization or packing) will be done, and the dataset will be used as-is.
This can be useful if you have pretokenized your dataset outside of this script and want to re-use it directly.
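So something like this should go through unchanged (the token ids below are arbitrary placeholders):
code:pretokenized_sketch.py
from datasets import Dataset
from trl import SFTTrainer

# Because the dataset already has an input_ids column, SFTTrainer should skip
# tokenization and packing and use it as-is
pretokenized = Dataset.from_dict({
    "input_ids": [
        [2, 100, 101, 102, 103],
        [2, 200, 201, 202],
    ]
})

trainer = SFTTrainer(
    "facebook/opt-350m",
    train_dataset=pretokenized,
)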