Merge Large Language Models with MergeKit
https://mlabonne.github.io/blog/posts/2024-01-08_Merge_LLMs_with_mergekit%20copy.html
Hugging Face version: https://huggingface.co/blog/mlabonne/merge-models
npaka's Japanese write-up: 「mergekit を使用してLLMをマージする」 (Merging LLMs with mergekit)
It merges the two different models Marcoroni-7B-v3 and Mistral-7B-Merge-14-v0.1 using SLERP.
#mlabonne/llm-course #arcee-ai/mergekit
The article introduces the concept of merging LLMs and covers four different methods:
SLERP
Spherical Linear Interpolation (SLERP) is a method used to smoothly interpolate between two vectors.
SLERP is currently the most popular merging method, but it is limited to combining only two models at a time.
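For reference, the standard SLERP formula for two (normalized) weight vectors v0 and v1 with interpolation factor t in [0, 1] is:

$$\operatorname{slerp}(v_0, v_1; t) = \frac{\sin\bigl((1-t)\,\Omega\bigr)}{\sin \Omega}\, v_0 + \frac{\sin(t\,\Omega)}{\sin \Omega}\, v_1, \qquad \cos \Omega = \frac{v_0 \cdot v_1}{\lVert v_0 \rVert\, \lVert v_1 \rVert}$$

At t = 0 this returns the first model's weights and at t = 1 the second's; intermediate values follow the arc on the hypersphere rather than the straight line used by plain linear interpolation.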
TIES
TIES-Merging: Resolving Interference When Merging Models
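As a rough sketch of how a TIES merge is declared in mergekit's YAML config (the model names below are placeholders and the density/weight values are illustrative, not taken from the article):

```yaml
# Sketch of a mergekit TIES config (placeholder model names, illustrative values)
models:
  - model: mistralai/Mistral-7B-v0.1     # base model, no parameters needed
  - model: some-org/fine-tune-a          # placeholder fine-tune
    parameters:
      density: 0.5                       # keep 50% of each task vector
      weight: 0.5
  - model: some-org/fine-tune-b          # placeholder fine-tune
    parameters:
      density: 0.5
      weight: 0.3
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  normalize: true
dtype: float16
```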
DARE
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
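mergekit exposes DARE as the dare_ties and dare_linear merge methods. A sketch along the same lines, again with placeholder model names and illustrative values:

```yaml
# Sketch of a mergekit DARE config (placeholder model names, illustrative values)
models:
  - model: mistralai/Mistral-7B-v0.1     # base model
  - model: some-org/fine-tune-a          # placeholder fine-tune
    parameters:
      density: 0.53                      # fraction of delta weights kept after random dropping
      weight: 0.4
  - model: some-org/fine-tune-b          # placeholder fine-tune
    parameters:
      density: 0.53
      weight: 0.3
merge_method: dare_ties                  # DARE combined with TIES-style sign election
base_model: mistralai/Mistral-7B-v0.1
dtype: bfloat16
```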
passthrough
The passthrough method concatenates layers from different LLMs, which can produce models with an exotic number of parameters (e.g., a 9B model from two 7B models).
These models are often referred to as “frankenmerges” or “Frankenstein models” by the community.
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
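A passthrough frankenmerge is declared with slices and layer_range entries; stacking all 32 layers of one 7B model on the last 8 layers of another gives roughly the 9B scale mentioned above. A sketch with placeholder model names:

```yaml
# Sketch of a mergekit passthrough (frankenmerge) config (placeholder model names)
slices:
  - sources:
      - model: some-org/model-a          # placeholder 7B model, all 32 layers
        layer_range: [0, 32]
  - sources:
      - model: some-org/model-b          # placeholder 7B model, last 8 layers stacked on top
        layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```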
Source code: https://colab.research.google.com/drive/1_JS7JKJAQozD48-LhYdegcuuZ2ddgXfr?usp=sharing
The config specifies the merge_method; the examples use ties, slerp, and passthrough.
In the slerp example, the parameters for the self-attention and MLP layers use different combinations of OpenPipe/mistral-ft-optimized-1218 and mlabonne/NeuralHermes-2.5-Mistral-7B, while the other layers are a 50/50 mixture of the two models.
https://huggingface.co/mlabonne/NeuralPipe-7B-slerp
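Based on that description, the slerp config behind NeuralPipe-7B-slerp presumably looks something like the sketch below: the t parameter is filtered by tensor name, with different interpolation schedules for self_attn and mlp weights and a default of 0.5 for everything else. The schedule values here are illustrative, not copied from the article.

```yaml
# Sketch of a mergekit SLERP config in the spirit of NeuralPipe-7B-slerp
# (the interpolation schedule values are illustrative)
slices:
  - sources:
      - model: OpenPipe/mistral-ft-optimized-1218
        layer_range: [0, 32]
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: OpenPipe/mistral-ft-optimized-1218
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # per-layer schedule for attention weights
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]   # complementary schedule for MLP weights
    - value: 0.5                     # 50/50 mixture for all other tensors
dtype: bfloat16
```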
The article then uses mergekit to create its own model, Marcoro14-7B-slerp, again with merge_method: slerp, from these two models:
AIDC-ai-business/Marcoroni-7B-v3 (base)
https://huggingface.co/AIDC-ai-business/Marcoroni-7B-v3 (now returns 404, so the merge cannot be reproduced)
EmbeddedLLM/Mistral-7B-Merge-14-v0.1
https://huggingface.co/EmbeddedLLM/Mistral-7B-Merge-14-v0.1
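The Marcoro14-7B-slerp merge presumably follows the same pattern, with Marcoroni-7B-v3 as the base model and Mistral-7B-Merge-14-v0.1 as the second model; a sketch, reusing the same illustrative t schedule as above:

```yaml
# Sketch of a Marcoro14-7B-slerp style config (t schedule values are illustrative)
slices:
  - sources:
      - model: AIDC-ai-business/Marcoroni-7B-v3
        layer_range: [0, 32]
      - model: EmbeddedLLM/Mistral-7B-Merge-14-v0.1
        layer_range: [0, 32]
merge_method: slerp
base_model: AIDC-ai-business/Marcoroni-7B-v3
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```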
collection: https://huggingface.co/collections/osanseviero/model-merging-65097893623330a3a51ead66 (saved to read later)