LoRA: Low-Rank Adaptation of Large Language Models

We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. (Abstractより)

（論文要約スライドを見かけた覚えあり）

https://github.com/microsoft/LoRA

Figure 1

増分重みを2つの低ランク行列A,Bの積とする

元の重みが d * d

低ランク行列2つを導入 (2dr < d**2)

A: d * r

B: r * d

Figure 2

Table 5 (7.1)

rankは小さくともLoRAを適用する層の種類を増やしたほうがよいとのこと

Table 6 (7.2)

rankは2から8

得られた重みを元の重みに予め足しておけば、推論時にはオーバーヘッドが生じない（TODO 詳しく理解）

merge_and_unload??