The Technology Behind BLOOM Training
https://huggingface.co/blog/bloom-megatron-deepspeed
BF16Optimizer
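Training huge LLMs in FP16 is a no-no: the largest finite FP16 value is 65504, so large activations or gradients overflow to infinity and the loss can diverge.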
Divergence of the loss (cross-entropy)
https://huggingface.co/blog/assets/86_bloom_megatron_deepspeed/104b-lm-loss.png
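To make the FP16 range problem concrete, here is a minimal sketch that compares how FP16 and BF16 handle the same values, using only the Python standard library. The helper names `to_fp16` and `to_bf16` are hypothetical and are not part of Megatron-DeepSpeed. The BF16 conversion is emulated by truncating a float32 to its top 16 bits, which is an assumption made for simplicity; real hardware typically rounds to nearest instead.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision; values beyond the FP16 range become inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values that overflow FP16; hardware would produce inf.
        return float("inf") if x > 0 else float("-inf")


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding.

    Assumption: truncation (round toward zero) stands in for the rounding
    that real BF16 hardware performs.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


if __name__ == "__main__":
    for value in (70000.0, 1e-8, 1.001):
        print(f"{value!r:>10}  fp16={to_fp16(value)!r}  bf16={to_bf16(value)!r}")
```

In this sketch, 70000.0 overflows to `inf` in FP16 but stays finite in BF16, and 1e-8 underflows to zero in FP16 but not in BF16. The trade-off is visible with 1.001: BF16 keeps only 7 mantissa bits, so it loses the fractional part that FP16 still resolves. BF16 gives up precision in exchange for the same exponent range as FP32.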