A Survey of Techniques for Maximizing LLM Performance

#OpenAI_DevDay_(2023) breakout sessionの1つ

Join us for a comprehensive survey of techniques designed to unlock the full potential of Language Model Models (LLMs). Explore strategies such as fine-tuning, RAG (Retrieval-Augmented Generation), and prompt engineering to maximize LLM performance.

Colin Jarvis

ヨーロッパ担当

プロンプトエンジニアリングとRAG

John Allard

fine tuningチームリーダー

LLMを本番運用しているという話

そのための最適化（Optimizing）

最適化のone stop shotはない

Optimizing LLMs is hard

理由

Extracting signal from the noise is not easy

Performance can be abstract and difficult to measure

When to use what optimization

maximizing performance

takeaway

mental model of options

appreciation（正しい評価をする力。RAGかfine-tuneか）

まとめ（ReCap）

ref: https://youtu.be/ahnGLM-RC1Y?si=5CJnTbAYLkxGFzUa&t=2666

LLMの性能向上はプロンプトエンジニアリングから

低投資。反復する

性能向上が頭打ち

エラー分析

モデルに新しい知識を入れる必要がある: RAG

モデルが一貫性のない指示(?)に従っている、または厳密な出力構造に従う: fine-tune

03までが理論パート、04で応用（application）

導入（Maximizing LLM Performance）

Try something

evaluate

Try something else

01 Prompt engineering

最初にはいいが、特定の問題に対処するgreatな方法ではない

02 RAG

.@openAI has put in a lot of work into RAG (retrieval augmented generation). 　Accuracy has jumped from 45% to 98% with things like retaining, prompt engineering, and query expansion.

03 Fine-tuning

ベストプラクティス

プロンプトエンジニアリングとFSL（few-shot）からはじめよ

ベースラインを確立せよ

Start small, focus on quality

04 Application of theory

感想プロンプトエンジニアリングは出発点（ベースライン）にすぎないんだなあ

https://youtu.be/ahnGLM-RC1Y?si=wk6kwy_DjQk7I8MZ

GPTの開発者に最も重要なのはこのセッション。RAGとPromptingとFine tuningの役割を平易に解説している。　プロダクションレベルではレイテンシ問題に対処するためにGPT-4の出力でGPT-3.5をfine tuningする(いわゆる蒸留)のは結構使いそう。新しい3.5 turbo引くほど速いし。