Reinforcement Learning from Human Feedback

Also known as: RLHF, Human Feedback Training

A training technique in which human evaluations of AI outputs are used to train a reward model. The reward model then supplies the reward signal for a reinforcement learning step that fine-tunes the AI system toward outputs better aligned with human preferences.
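As an illustration of how human evaluations can become a training signal for the reward model, the sketch below assumes the evaluations take the form of pairwise comparisons (one output preferred over another) and uses a logistic pairwise loss, a common choice that this entry does not itself specify. The function names and the scalar reward scores are hypothetical and stand in for the outputs of a real reward model.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Loss for one human comparison: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the human-preferred
    output higher than the rejected one, and large when it does not.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow
    # for large positive or negative margins.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def mean_preference_loss(pairs: list[tuple[float, float]]) -> float:
    """Average loss over (reward_chosen, reward_rejected) score pairs."""
    return sum(pairwise_preference_loss(c, r) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical reward scores for three human-labelled comparisons.
    scores = [(1.5, 0.2), (0.1, 0.4), (2.0, -1.0)]
    print(mean_preference_loss(scores))
```

Minimizing this loss over many comparisons pushes the reward model to rank outputs the way the human evaluators did. The trained reward model can then score new outputs during the reinforcement learning stage.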

RLHF, human feedback, reward model, preference learning, constitutional AI, RLAIF, DPO, direct preference optimization, PPO, proximal policy optimization, alignment training, human preference, value learning