Merge Large Language Models with MergeKit
This time, we will take two different models, Marcoroni-7B-v3 and Mistral-7B-Merge-14-v0.1, and merge them with SLERP.
In this article, we introduce the concept of merging LLMs and cover four different methods: SLERP, TIES, DARE, and passthrough.
SLERP
Spherical Linear Interpolation (SLERP) is a method used to smoothly interpolate between two vectors.
SLERP is currently the most popular merging method, but it is limited to combining only two models at a time.
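Concretely, the two weight vectors are first normalized to unit length and the angle θ between them is computed; the merged weights are then taken along the arc between them. In standard notation (ours, not quoted from this article):

$$\mathrm{SLERP}(v_0, v_1; t) = \frac{\sin\big((1-t)\,\theta\big)}{\sin\theta}\, v_0 + \frac{\sin(t\,\theta)}{\sin\theta}\, v_1$$

where t ∈ [0, 1] is the interpolation factor: t = 0 returns the first model's weights and t = 1 the second's.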
TIES
Unlike SLERP, TIES-Merging can combine more than two models at once: it trims the mostly redundant fine-tuned parameter changes, resolves sign conflicts between models, and averages only the parameters that agree on the dominant sign.
DARE
DARE takes a similar approach but randomly drops a fraction of the fine-tuned parameter deltas and rescales the remaining ones, which reduces interference when merging several models.
passthrough
By concatenating layers from different LLMs, it can produce models with an exotic number of parameters (e.g., 9B with two 7B parameter models).
These models are often referred to as “frankenmerges” or “Frankenstein models” by the community.
Each of the example configurations below specifies the merging algorithm with the merge_method key.
ties
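As a sketch of what a TIES configuration can look like in mergekit's YAML format (the models, density, and weight values below are illustrative, not taken from this section):

```yaml
models:
  - model: mistralai/Mistral-7B-v0.1
    # base model: no parameters needed
  - model: OpenPipe/mistral-ft-optimized-1218
    parameters:
      density: 0.5   # keep 50% of this model's fine-tuned deltas
      weight: 0.5
  - model: mlabonne/NeuralHermes-2.5-Mistral-7B
    parameters:
      density: 0.5
      weight: 0.3
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  normalize: true    # rescale so the weights sum to 1
dtype: float16
```

Note that, unlike slerp, this method accepts an arbitrary number of models in the models list.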
slerp
The parameters for the self-attention and MLP layers will use different combinations of OpenPipe/mistral-ft-optimized-1218 and mlabonne/NeuralHermes-2.5-Mistral-7B. The other layers are a 50/50 mixture of the two models.
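That description corresponds to a configuration along the following lines; the exact interpolation values are a sketch of a typical setup, not quoted from this section:

```yaml
slices:
  - sources:
      - model: OpenPipe/mistral-ft-optimized-1218
        layer_range: [0, 32]
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: OpenPipe/mistral-ft-optimized-1218
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # interpolation factors for the attention layers
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]   # a different gradient for the MLP layers
    - value: 0.5                     # all other tensors: 50/50 mixture
dtype: bfloat16
```

The t values interpolate between the two models layer by layer, with 0 taking the base model's weights and 1 taking the other model's.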
passthrough
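A passthrough (frankenmerge) configuration stacks layer ranges instead of interpolating them. A minimal sketch, reusing the same two models as above for illustration:

```yaml
slices:
  - sources:
      - model: OpenPipe/mistral-ft-optimized-1218
        layer_range: [0, 32]    # keep all 32 layers of the first model
  - sources:
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [24, 32]   # append the last 8 layers of the second model
merge_method: passthrough
dtype: bfloat16
```

The result has 40 layers and roughly 9B parameters, which is the kind of exotic size mentioned earlier.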
Finally, we will use mergekit to create our own model, Marcoro14-7B-slerp. The merge uses the slerp method, with AIDC-ai-business/Marcoroni-7B-v3 as the base model and EmbeddedLLM/Mistral-7B-Merge-14-v0.1 as the second model.
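Putting this together, the configuration for Marcoro14-7B-slerp would look roughly like the following; the layer ranges and interpolation values are a sketch of a typical slerp setup rather than a quotation from this section:

```yaml
slices:
  - sources:
      - model: AIDC-ai-business/Marcoroni-7B-v3
        layer_range: [0, 32]
      - model: EmbeddedLLM/Mistral-7B-Merge-14-v0.1
        layer_range: [0, 32]
merge_method: slerp
base_model: AIDC-ai-business/Marcoroni-7B-v3
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5   # default: 50/50 mixture for all remaining tensors
dtype: bfloat16
```

Assuming mergekit is installed, a file like this is run with its mergekit-yaml command, for example `mergekit-yaml config.yaml ./Marcoro14-7B-slerp --copy-tokenizer`.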