A Survey of Large Language Models
https://arxiv.org/abs/2303.18223
Updated 13 times in 2023
https://github.com/rucaibox/llmsurvey
https://raw.githubusercontent.com/RUCAIBox/LLMSurvey/main/assets/LLMs-0623-final.png
Shaded entries are open models
(There are various other figures as well)
TABLE 1
Models derived from LLaMA
https://github.com/RUCAIBox/LLMSurvey/raw/main/assets/llama-0628-final.png
Fig 6: Ratios of various data sources in the pre-training data for existing LLMs
The ratios of web pages, conversations, papers, and code vary widely
Fig 4
text-davinci-003 is text-davinci-002 with RLHF applied
Fig 2: I agree with how it organizes things
5.1 Instruction Tuning
5.1.1 Formatted Instance Construction
Fig. 11
Formatting NLP Task Datasets
Formatting Daily Chat Data
InstructGPT
Human labelers
Formatting Synthetic Data
Self-Instruct: Aligning Language Models with Self-Generated Instructions
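As a rough illustration of "Formatting NLP Task Datasets" above: a labeled example from an existing dataset is wrapped with a natural-language task description, and optionally a few demonstrations, to form one instruction-formatted instance. The sketch below is only an assumption about what that can look like. The function name `format_instance`, the `Input:`/`Output:` template, and the `prompt`/`completion` keys are hypothetical and are not taken from the survey.

```python
from typing import Dict, Iterable, Tuple


def format_instance(
    task_description: str,
    example_input: str,
    example_output: str,
    demonstrations: Iterable[Tuple[str, str]] = (),
) -> Dict[str, str]:
    """Wrap one labeled example into an instruction-formatted instance.

    The template (task description, optional demonstrations, then the
    query) is a hypothetical layout chosen for illustration only.
    """
    parts = [task_description.strip()]
    # Optional few-shot demonstrations, each shown as an input/output pair.
    for demo_in, demo_out in demonstrations:
        parts.append(f"Input: {demo_in}\nOutput: {demo_out}")
    # The actual query: the output is left blank for the model to fill in.
    parts.append(f"Input: {example_input}\nOutput:")
    return {
        "prompt": "\n\n".join(parts),
        "completion": " " + example_output,
    }


if __name__ == "__main__":
    instance = format_instance(
        "Answer the question.",
        "What is the capital of France?",
        "Paris",
    )
    print(instance["prompt"])
    print(instance["completion"])
```

The same helper could be applied to daily chat data or synthetic data by swapping in a different task description. Only the source of the (input, output) pairs changes.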
7 Capacity and Evaluation
Table 14
Basic
Evaluating Language Generation ability
Language Modeling
The LAMBADA dataset: Word prediction requiring a broad discourse context
Conditional Text Generation
Code Synthesis
HumanEval
Evaluating Knowledge Utilization ability
Closed-Book QA
Open-Book QA
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
Knowledge Completion
Assessing The Factual Accuracy of Generated Text
Evaluating Complex Reasoning ability
Knowledge Reasoning
HellaSwag: Can a Machine Really Finish Your Sentence?
Symbolic Reasoning
CoinFlip (Chain-of-Thought Prompting Elicits Reasoning in Large Language Models)
Mathematical Reasoning
GSM8K
Advanced
Human Alignment
Honestness
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Helpfulness
Harmlessness
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
Interaction with External Environment
Household
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
Website Environment
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Open World
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Tool Manipulation
Search Engine
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Code Executor
Calculator
GSM8K (Training Verifiers to Solve Math Word Problems)
Model Interface
Gorilla: Large Language Model Connected with Massive APIs
Data Interface
TabFact: A Large-scale Dataset for Table-based Fact Verification
Table 16 has a comparison table of benchmark results
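A note on the Code Synthesis entry (HumanEval) listed above: results on HumanEval are commonly reported as pass@k, the probability that at least one of k sampled programs passes the unit tests. The notes do not spell out the metric, so the following is only a minimal sketch of the usual unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them pass. The function name `pass_at_k` and its argument checks are illustrative choices.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass the unit tests
    k: budget of samples considered (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets containing no passing sample;
    # it is 0 when n - c < k, which makes the estimate exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(pass_at_k(n=10, c=3, k=1))
    print(pass_at_k(n=10, c=3, k=5))
```

The benchmark-level score is then the mean of this value over all problems.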
Table 15 (7.3.2)
Benchmark
MMLU (Measuring Massive Multitask Language Understanding)
BIG-bench
HELM
Human
Chatbot Arena
Model
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
AlpacaEval
MT-Bench