ALiBi
https://arxiv.org/abs/2108.12409
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
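
The "linear biases" in the title can be illustrated with a short sketch. It assumes NumPy, a causal (decoder-only) setting, and a power-of-two number of heads, for which the paper describes the per-head slopes as a geometric sequence starting at 2^(-8/n). The function names `alibi_slopes` and `alibi_bias` are hypothetical, chosen for this note only, and are not taken from the authors' code.

```python
import numpy as np

def alibi_slopes(num_heads):
    # Assumes num_heads is a power of two. The first slope and the common
    # ratio are both 2 ** (-8 / num_heads), e.g. 1/2, 1/4, ..., 1/256 for 8 heads.
    ratio = 2.0 ** (-8.0 / num_heads)
    return np.array([ratio ** (h + 1) for h in range(num_heads)])

def alibi_bias(num_heads, seq_len):
    # distance[i, j] = j - i, which is <= 0 for keys at or before query i.
    pos = np.arange(seq_len)
    distance = pos[None, :] - pos[:, None]
    # Per-head penalty that grows linearly with the query-key distance.
    bias = alibi_slopes(num_heads)[:, None, None] * distance[None, :, :]
    # Causal mask: keys after the query (j > i) are set to -inf.
    return np.where(distance[None, :, :] > 0, -np.inf, bias)

# Result has shape (num_heads, seq_len, seq_len).
bias = alibi_bias(num_heads=8, seq_len=4)
```

The resulting array would be added to the query-key scores before the softmax, in place of positional embeddings on the inputs. Because the bias depends only on the distance between positions, it can be built for any `seq_len` at inference time.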