Calibrate Before Use: Improving Few-shot Performance of Language Models
https://proceedings.mlr.press/v139/zhao21c.html
https://arxiv.org/abs/2102.09690
The authors demonstrate that this instability arises from the bias of language models towards predicting certain answers.
Figure 2: There is high variance in GPT-3's accuracy as the prompt's training examples, as well as the permutation of those examples, are changed.
Builds on Language Models are Few-Shot Learners (GPT-3), but points out that these models are biased.
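The paper's proposed fix, contextual calibration, estimates the model's answer bias by feeding it a content-free input (e.g. "N/A") and then rescales predicted class probabilities by the inverse of that bias. A minimal sketch of the rescaling step (function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def contextual_calibration(p_cf, p_test):
    """Rescale class probabilities by the bias measured on a content-free input.

    p_cf:   probabilities the model assigns to each label for a content-free
            prompt such as "N/A" (the estimated bias).
    p_test: probabilities the model assigns for a real test input.
    Applies W = diag(1 / p_cf), b = 0, then renormalizes.
    """
    p_cf = np.asarray(p_cf, dtype=float)
    p_test = np.asarray(p_test, dtype=float)
    q = p_test / p_cf          # equivalent to diag(1/p_cf) @ p_test
    return q / q.sum()         # renormalize to a probability distribution

# If the model is biased towards the first label (p_cf = [0.7, 0.3]) and a
# test input reproduces exactly that distribution, calibration yields a
# uniform output, i.e. "no evidence beyond the bias":
print(contextual_calibration([0.7, 0.3], [0.7, 0.3]))  # → [0.5 0.5]
```

The idea is that an uninformative input should, after calibration, receive a uniform prediction; any deviation on a real input then reflects the content rather than the prompt-induced bias.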