RLHF - yuyan

RLHF

Large Language Model

DeepSpeed

https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-chat

Direct Preference Alignment

A Short Talk on RLHF/DPO (RLHF/DPO 小話)

https://akifumi-wachi-4.github.io/website/column/RLHF_DPO_column_v1.pdf