Scaling Instruction-Finetuned Language Models
https://arxiv.org/abs/2210.11416
https://jmlr.org/papers/v25/23-0870.html
https://jmlr.org/papers/volume25/23-0870/23-0870.pdf
In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data.
Model families instruction-finetuned in the paper: PaLM, T5, U-PaLM
We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B.
👉Flan T5
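Since the Flan-T5 checkpoints are publicly released, they are easy to try directly. Below is a minimal sketch assuming the Hugging Face hub IDs google/flan-t5-* and the transformers library; the hub IDs and loading code are my assumption for illustration, not something described in the paper.

```python
# Minimal sketch: load a released Flan-T5 checkpoint and run one prompt.
# Assumes the Hugging Face hub ID "google/flan-t5-base" (small/base/large/xl/xxl
# variants are also published under the same naming scheme).
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

prompt = "Answer the following question. What is the boiling point of water in Celsius?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```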
Figure 3: finetuning data formats
instruction with exemplars / without exemplars
without chain-of-thought / with chain-of-thought
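The four combinations in Figure 3 (with/without exemplars × with/without chain-of-thought) can be pictured as simple prompt/target templates. The sketch below is my own illustration of those formats, not code or data from the paper; the function name and template wording are hypothetical.

```python
# Sketch of the four Figure-3 formats: the input may carry few-shot exemplars,
# and the target may carry a chain-of-thought rationale before the final answer.
def build_example(question, answer, rationale=None, exemplars=None):
    """Return (input_text, target_text) in one of the four formats."""
    parts = []
    if exemplars:  # "with exemplars": prepend solved examples to the input
        for ex_q, ex_a in exemplars:
            parts.append(f"Q: {ex_q}\nA: {ex_a}")
    parts.append(f"Q: {question}\nA:")
    input_text = "\n\n".join(parts)
    # "with chain-of-thought": the target includes the reasoning, then the answer
    target_text = f"{rationale} The answer is {answer}." if rationale else answer
    return input_text, target_text


# Zero-shot, direct answer
print(build_example("What is 2 + 2?", "4"))
# Few-shot with chain-of-thought
print(build_example(
    "What is 3 + 5?", "8",
    rationale="3 plus 5 equals 8.",
    exemplars=[("What is 2 + 2?", "2 plus 2 equals 4. The answer is 4.")],
))
```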
Positioned as a follow-up to Finetuned Language Models Are Zero-Shot Learners (referenced in Section 1, Introduction).