RLHF
Reinforcement Learning from Human Feedback