LLaMA - work4ai

LLaMA

https://www.itmedia.co.jp/news/articles/2302/25/news045.html

LLaMA is offered in four sizes with different parameter counts (7B, 13B, 33B, 65B).

According to the paper, LLaMA outperforms GPT-3 on most benchmarks even at the 13B size. The 65B model is said to be comparable to Chinchilla-70B from Google-affiliated DeepMind and to Google's PaLM 540B.

https://ai.facebook.com/blog/large-language-model-llama-meta-ai/

LLaMA (Large Language Model Meta AI), a state-of-the-art foundational large language model designed to help researchers advance their work in this subfield of AI.

Aimed at AI researchers digging through the literature? 基素.icon

Smaller, more performant models such as LLaMA enable others in the research community who don’t have access to large amounts of infrastructure to study these models, further democratizing access in this important, fast-changing field

The wording makes it sound like it could run even on a home PC 基素.icon

https://www.marktechpost.com/2023/02/25/meta-ai-unveils-llama-a-series-of-open-source-language-models-ranging-from-7b-to-65b-parameters/

The data sources used to train LLaMA include English CCNet, C4, GitHub, Wikipedia, Books, ArXiv, and Stack Exchange.

The data sources are narrowed down so that the model can be released as open source (under a non-commercial license)

table:llama

model VRAM GPU

7B 10GB 3060/3080/3090

13B 20GB 3090ti/4090

33B 30GB GV100

65B 40GB A6000/A100
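The table does not say which numeric precision its VRAM figures assume. As a rough cross-check, the memory needed for the weights alone can be estimated as parameters × bits per parameter / 8. The sketch below is only that estimate: the function name `weight_memory_gib` is made up for illustration, the sizes are the nominal ones (7B, 13B, 33B, 65B), and activations, KV cache, and framework overhead are ignored, so real usage will be higher.

```python
def weight_memory_gib(n_params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Hypothetical helper: ignores activations, KV cache, and runtime overhead.
    """
    n_bytes = n_params_billion * 1e9 * bits_per_param / 8
    return n_bytes / 2**30


if __name__ == "__main__":
    # Nominal LLaMA sizes; the bit widths are common choices, not taken from the table.
    for size in (7, 13, 33, 65):
        row = ", ".join(
            f"{bits}-bit: {weight_memory_gib(size, bits):6.1f} GiB"
            for bits in (16, 8, 4)
        )
        print(f"{size:>2}B  {row}")
```

Comparing the printed estimates with the table gives a hint about which precision a given row might correspond to, but that remains a guess.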

Expert commentary

逆瀬川(@gyakuse)

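Also, it probably can't handle Japanese very well (non-English data was excluded from C4, and Japanese is not among the 20 languages in the Wikipedia dataset that was used). Still, as is typical of Meta AI (FAIR) papers, the findings are written up in fine detail, which is great.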

逆瀬川(@gyakuse)

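It is a decoder-only Transformer architecture that, like PaLM, replaces ReLU with SwiGLU. Like GPT-NeoX, it also uses RoPE for positional embeddings. xformers is used to optimize training, and memory_efficient_attention will be familiar to people in the image generation community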
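To make the two architectural points concrete, here is a minimal pure-Python sketch of a SwiGLU feed-forward block and of RoPE applied to a single vector. It is not Meta's implementation: the helper names (`silu`, `matvec`, `swiglu_ffn`, `rope`), the weight names, and the consecutive-pair layout for RoPE are assumptions for illustration (some implementations pair dimensions differently), and real models use a tensor library rather than Python lists.

```python
import math


def silu(x: float) -> float:
    """SiLU (Swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Feed-forward block with SwiGLU: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def rope(x, position, base=10000.0):
    """Rotate each pair (x[i], x[i+1]) by the angle position * base**(-i/d)."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even dimension")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out


if __name__ == "__main__":
    q = [1.0, 0.0, 0.5, -0.5]
    k = [0.2, 0.3, -0.1, 0.4]
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    # Both pairs of positions are 2 apart, so the two scores should agree
    # up to floating-point error.
    print(dot(rope(q, 3), rope(k, 1)), dot(rope(q, 7), rope(k, 5)))
```

The property worth noticing is that the attention score between a rotated query and key depends only on the distance between their positions, which is what makes RoPE a relative position encoding.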

https://toukei-lab.com/llama An explanation of Meta's LLMs LLaMA and LLaMA2 and the derivative model Alpaca! | スタビジ

Available by applying through the Google Form or by downloading via torrent

https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform

It looks like you are not eligible to apply unless you have published a paper or a book 基素.icon

magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA

4chan

Motrix is a good choice for handling the torrent

llama-dl

https://github.com/facebookresearch/llama/pull/73/files

magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA
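The two magnet links on this page write their info hash differently: one as 40 hex characters, the other as 32 Base32 characters. A small sketch for normalizing either form to hex, so that links can be compared, might look like this. The function name `btih_hex` is made up for illustration, and only BitTorrent v1 `urn:btih:` hashes are handled.

```python
import base64
from urllib.parse import parse_qs, urlparse


def btih_hex(magnet: str) -> str:
    """Return the v1 info hash of a magnet URI as 40 lowercase hex characters."""
    query = parse_qs(urlparse(magnet).query)
    prefix = "urn:btih:"
    for xt in query.get("xt", []):
        if not xt.lower().startswith(prefix):
            continue
        value = xt[len(prefix):]
        if len(value) == 40:  # hex-encoded 20-byte hash
            return bytes.fromhex(value).hex()
        if len(value) == 32:  # Base32-encoded 20-byte hash
            return base64.b32decode(value.upper()).hex()
        raise ValueError(f"unexpected info hash length: {len(value)}")
    raise ValueError("no urn:btih info hash found")


if __name__ == "__main__":
    print(btih_hex("magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA"))
```

Once both links are in hex form, a plain string comparison shows whether they refer to the same torrent.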

LLaMa 2

llama-32k

✖Large Language Model Meta AI

#LLM

#Meta