Merge Large Language Models with MergeKit
https://mlabonne.github.io/blog/posts/2024-01-08_Merge_LLMs_with_mergekit%20copy.html
Hugging Face version: https://huggingface.co/blog/mlabonne/merge-models
npaka's Japanese write-up: 「mergekit を使用してLLMをマージする」 (Merging LLMs with mergekit)
It merges the two different models Marcoroni-7B-v3 and Mistral-7B-Merge-14-v0.1 using SLERP.
#mlabonne/llm-course #arcee-ai/mergekit
The article introduces the concept of merging LLMs and covers four different methods:
SLERP
Spherical Linear Interpolation (SLERP) is a method used to smoothly interpolate between two vectors.
SLERP is currently the most popular merging method, but it is limited to combining only two models at a time.
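For reference, the standard SLERP formula for two (normalized) weight vectors v0 and v1 with interpolation factor t in [0, 1] is:

$$\operatorname{slerp}(v_0, v_1; t) = \frac{\sin\bigl((1-t)\,\Omega\bigr)}{\sin \Omega}\, v_0 + \frac{\sin(t\,\Omega)}{\sin \Omega}\, v_1, \qquad \cos \Omega = \frac{v_0 \cdot v_1}{\lVert v_0 \rVert\, \lVert v_1 \rVert}$$

At t = 0 this returns the first model's weights and at t = 1 the second's; intermediate values follow the arc on the hypersphere rather than the straight line used by plain linear interpolation.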
TIES
TIES-Merging: Resolving Interference When Merging Models
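As a rough sketch of how a TIES merge is declared in mergekit's YAML config (the model names below are placeholders and the density/weight values are illustrative, not taken from the article):

```yaml
# Sketch of a mergekit TIES config (placeholder model names, illustrative values)
models:
  - model: mistralai/Mistral-7B-v0.1     # base model, no parameters needed
  - model: some-org/fine-tune-a          # placeholder fine-tune
    parameters:
      density: 0.5                       # keep 50% of each task vector
      weight: 0.5
  - model: some-org/fine-tune-b          # placeholder fine-tune
    parameters:
      density: 0.5
      weight: 0.3
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  normalize: true
dtype: float16
```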
DARE
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
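mergekit exposes DARE as the dare_ties and dare_linear merge methods. A sketch along the same lines, again with placeholder model names and illustrative values:

```yaml
# Sketch of a mergekit DARE config (placeholder model names, illustrative values)
models:
  - model: mistralai/Mistral-7B-v0.1     # base model
  - model: some-org/fine-tune-a          # placeholder fine-tune
    parameters:
      density: 0.53                      # fraction of delta weights kept after random dropping
      weight: 0.4
  - model: some-org/fine-tune-b          # placeholder fine-tune
    parameters:
      density: 0.53
      weight: 0.3
merge_method: dare_ties                  # DARE combined with TIES-style sign election
base_model: mistralai/Mistral-7B-v0.1
dtype: bfloat16
```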
passthrough
The passthrough method concatenates layers from different LLMs, which can produce models with an exotic number of parameters (e.g., a 9B model from two 7B models).
These models are often referred to as “frankenmerges” or “Frankenstein models” by the community.
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
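A passthrough frankenmerge is declared with slices and layer_range entries; stacking all 32 layers of one 7B model on the last 8 layers of another gives roughly the 9B scale mentioned above. A sketch with placeholder model names:

```yaml
# Sketch of a mergekit passthrough (frankenmerge) config (placeholder model names)
slices:
  - sources:
      - model: some-org/model-a          # placeholder 7B model, all 32 layers
        layer_range: [0, 32]
  - sources:
      - model: some-org/model-b          # placeholder 7B model, last 8 layers stacked on top
        layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```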
Source code: https://colab.research.google.com/drive/1_JS7JKJAQozD48-LhYdegcuuZ2ddgXfr?usp=sharing
The config specifies the merge_method; the examples use ties, slerp, and passthrough.
In the slerp example, the parameters for the self-attention and MLP layers use different combinations of OpenPipe/mistral-ft-optimized-1218 and mlabonne/NeuralHermes-2.5-Mistral-7B, while the other layers are a 50/50 mixture of the two models.
https://huggingface.co/mlabonne/NeuralPipe-7B-slerp
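Based on that description, the slerp config behind NeuralPipe-7B-slerp presumably looks something like the sketch below: the t parameter is filtered by tensor name, with different interpolation schedules for self_attn and mlp weights and a default of 0.5 for everything else. The schedule values here are illustrative, not copied from the article.

```yaml
# Sketch of a mergekit SLERP config in the spirit of NeuralPipe-7B-slerp
# (the interpolation schedule values are illustrative)
slices:
  - sources:
      - model: OpenPipe/mistral-ft-optimized-1218
        layer_range: [0, 32]
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: OpenPipe/mistral-ft-optimized-1218
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # per-layer schedule for attention weights
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]   # complementary schedule for MLP weights
    - value: 0.5                     # 50/50 mixture for all other tensors
dtype: bfloat16
```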
The article then uses mergekit to create its own model, Marcoro14-7B-slerp, again with merge_method: slerp, from these two models:
AIDC-ai-business/Marcoroni-7B-v3 (base)
https://huggingface.co/AIDC-ai-business/Marcoroni-7B-v3 (now returns 404, so the merge cannot be reproduced)
EmbeddedLLM/Mistral-7B-Merge-14-v0.1
https://huggingface.co/EmbeddedLLM/Mistral-7B-Merge-14-v0.1
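The Marcoro14-7B-slerp merge presumably follows the same pattern, with Marcoroni-7B-v3 as the base model and Mistral-7B-Merge-14-v0.1 as the second model; a sketch, reusing the same illustrative t schedule as above:

```yaml
# Sketch of a Marcoro14-7B-slerp style config (t schedule values are illustrative)
slices:
  - sources:
      - model: AIDC-ai-business/Marcoroni-7B-v3
        layer_range: [0, 32]
      - model: EmbeddedLLM/Mistral-7B-Merge-14-v0.1
        layer_range: [0, 32]
merge_method: slerp
base_model: AIDC-ai-business/Marcoroni-7B-v3
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```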
collection: https://huggingface.co/collections/osanseviero/model-merging-65097893623330a3a51ead66 (saved to read later)