Fine-tuning
Large Language Model
CS324 Large Language Model
Local LLM
RLHF / DPO
A discussion comparing fine-tuning tools such as Unsloth, Axolotl, Llama Factory, and Transformers/PEFT, and the characteristics of each
https://buttondown.com/ainews/archive/ainews-not-much-happened-today-4979/
Scale AI
Notes on Unsloth
https://www.nogawanogawa.com/entry/unsloth
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
https://arxiv.org/abs/2501.17161