LoRA - 基素基

LoRA

A technique for making fine-tuning fast

2106.09685 LoRA: Low-Rank Adaptation of Large Language Models

2021/6/17

author Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen

https://gyazo.com/a013cf0216b9499e50c5ef7c984ddad9

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at this https URL.

An important paradigm in natural language processing consists of large-scale pre-training on general-domain data followed by adaptation to particular tasks or domains.

Problem

As pre-trained models grow larger, full fine-tuning, which retrains all model parameters, becomes less feasible.

Taking GPT-3 175B as an example, deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive.

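We propose Low-Rank Adaptation (LoRA), which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.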
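The idea of freezing the weights and injecting rank decomposition matrices can be sketched in a few lines of plain Python. This is only an illustration, not the authors' package: the toy sizes, the helper names `matvec` and `lora_forward`, and the `alpha / r` scaling with Gaussian-initialized `A` and zero-initialized `B` are assumptions based on my reading of the paper, not something stated on this page.

```python
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W0, A, B, x, alpha, r):
    """Compute h = W0 x + (alpha / r) * B (A x).

    W0 is the frozen pre-trained weight; only A and B would be trained.
    """
    base = matvec(W0, x)                 # frozen path
    update = matvec(B, matvec(A, x))     # low-rank path through rank r
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

random.seed(0)
d, k, r, alpha = 4, 3, 2, 2.0            # hypothetical toy sizes

# Frozen pre-trained weight, shape d x k.
W0 = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]
# Trainable A, shape r x k, Gaussian init (assumed from the paper).
A = [[random.gauss(0, 1) for _ in range(k)] for _ in range(r)]
# Trainable B, shape d x r, zero init so the update starts at zero.
B = [[0.0] * r for _ in range(d)]

x = [1.0, -2.0, 0.5]
h = lora_forward(W0, A, B, x, alpha, r)
```

Because `B` starts at zero, the adapted layer initially produces the same output as the frozen layer, and training only has to move the `r * (d + k)` entries of `A` and `B`.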

Compared with GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by a factor of 10,000 and the GPU memory requirement by a factor of 3.
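A back-of-the-envelope count shows where a reduction of roughly this size can come from. The numbers below are assumptions taken from my reading of the paper rather than from this page: a hidden size of 12288 and 96 layers for GPT-3 175B, rank `r = 4`, and LoRA applied only to the query and value projection matrices. The function name `lora_param_count` is hypothetical.

```python
def lora_param_count(d_model, n_layers, r, n_adapted_matrices):
    """Count trainable LoRA parameters for square d_model x d_model weights."""
    # Each adapted matrix gets B (d_model x r) and A (r x d_model).
    per_matrix = 2 * d_model * r
    return n_layers * n_adapted_matrices * per_matrix

full = 175_000_000_000  # parameters updated by full fine-tuning
lora = lora_param_count(d_model=12288, n_layers=96, r=4, n_adapted_matrices=2)
ratio = full / lora

print(lora)          # 18874368
print(round(ratio))  # 9272
```

Under these assumptions about 19M parameters are trained instead of 175B, which is on the order of the 10,000-fold reduction quoted above.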

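On RoBERTa, DeBERTa, GPT-2, and GPT-3, LoRA performs on par with or better than full fine-tuning in model quality despite having fewer trainable parameters, and it offers higher training throughput and, unlike adapters, no additional inference latency.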
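The "no additional inference latency" point follows from the fact that the low-rank update can be folded into the frozen weight once training is done. The sketch below checks this numerically on toy data. The sizes, the random values standing in for trained `A` and `B`, the `alpha / r` scaling, and the helper names are all assumptions for illustration.

```python
import math
import random

def matmul(X, Y):
    """Multiply matrices X (n x m) and Y (m x p), both lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def merge_lora(W0, A, B, alpha, r):
    """Return W = W0 + (alpha / r) * B A as a single dense matrix."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * u for w, u in zip(w_row, u_row)]
            for w_row, u_row in zip(W0, BA)]

random.seed(1)
d, k, r, alpha = 4, 3, 2, 2.0   # hypothetical toy sizes
W0 = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]
A = [[random.gauss(0, 1) for _ in range(k)] for _ in range(r)]
B = [[random.gauss(0, 1) for _ in range(r)] for _ in range(d)]  # stands in for trained values
x = [0.3, -1.2, 2.0]

# Unmerged: two paths evaluated at inference time.
unmerged = [b + (alpha / r) * u
            for b, u in zip(matvec(W0, x), matvec(B, matvec(A, x)))]
# Merged: one ordinary matrix-vector product, same cost as the original layer.
merged = matvec(merge_lora(W0, A, B, alpha, r), x)

same = all(math.isclose(p, q, rel_tol=1e-9, abs_tol=1e-9)
           for p, q in zip(unmerged, merged))
```

After merging, the layer has the same shape and cost as the original one, whereas an adapter module adds extra layers that must be evaluated on every forward pass.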

We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on why LoRA is effective.

We release a package that facilitates integrating LoRA with PyTorch models, and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at this https URL.

Generating original characters with image-generation AI

/work4ai/LoRA