Reinforcement Learning from Human Feedback
Also known as: RLHF, Human Feedback Training
A training technique that uses human evaluations of AI outputs to train a reward model, which then guides the AI system to produce outputs more aligned with human preferences.
Overview
Reinforcement Learning from Human Feedback (RLHF) is the primary technique used to align large language models with human preferences. It bridges the gap between a model's raw language generation capabilities and the kind of helpful, harmless, and honest responses that users expect.
The RLHF Process
- Supervised Fine-Tuning (SFT): Train the model on high-quality demonstration data
- Reward Model Training: Human raters compare pairs of model outputs, creating preference data used to train a reward model
- Policy Optimization: Use the reward model to guide further training of the language model with a reinforcement learning algorithm (typically Proximal Policy Optimization, PPO)
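The reward-model step above is commonly trained with a pairwise (Bradley-Terry) objective: given a human-preferred and a dispreferred response, the model is penalized unless it scores the preferred one higher. A minimal sketch of that loss, with illustrative scalar scores standing in for a real reward model's outputs:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(r_chosen, r_rejected):
    """Pairwise (Bradley-Terry) loss: -log sigmoid(r_chosen - r_rejected).

    Small when the reward model scores the human-preferred response
    higher than the rejected one; large when the ordering is wrong.
    """
    return -math.log(sigmoid(r_chosen - r_rejected))

# Hypothetical reward scores (a real reward model computes these from text).
loss_correct_order = preference_loss(r_chosen=2.0, r_rejected=-1.0)
loss_wrong_order = preference_loss(r_chosen=-1.0, r_rejected=2.0)
```

Minimizing this loss over many human-labeled comparison pairs is what turns raw preference data into a scalar reward signal the policy-optimization step can use.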
Why RLHF Matters
Pre-trained language models generate text by predicting the next most likely token — this doesn't inherently produce helpful, safe, or factual outputs. RLHF teaches models to prioritize helpfulness, truthfulness, and harmlessness based on human judgments of quality.
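To make the point concrete, here is a toy illustration of greedy next-token prediction over a hypothetical probability table: the model simply emits the most likely continuation, which optimizes for fluency, not for helpfulness, safety, or truth.

```python
# Hypothetical conditional distribution over next tokens, e.g. after
# the prompt "The capital of France is". A real model computes this
# distribution over its whole vocabulary at every step.
next_token_probs = {"Paris": 0.62, "London": 0.21, "the": 0.17}

def predict_next(probs):
    """Greedy decoding: return the single most likely next token."""
    return max(probs, key=probs.get)

predict_next(next_token_probs)  # "Paris"
```

Nothing in this objective distinguishes a helpful answer from a plausible-sounding wrong one; RLHF adds that distinction via human judgments.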
Challenges
- Reward Hacking: Models may find ways to achieve high reward scores without genuinely improving (for example, producing longer or more flattering answers that the reward model overrates)
- Annotator Disagreement: Human raters may have conflicting preferences
- Scale: High-quality human feedback is expensive and time-consuming to collect
- Bias: Human raters bring their own biases to the evaluation process
Alternatives and Evolutions
Newer approaches like Direct Preference Optimization (DPO) simplify the RLHF pipeline by eliminating the need for a separate reward model. Constitutional AI (CAI) reduces reliance on human feedback by using AI-generated critiques guided by a set of principles.
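DPO's simplification can be sketched directly: instead of training a reward model and running RL, it optimizes the policy on preference pairs using log-probabilities from the policy and a frozen reference model. A minimal, pure-Python sketch of the per-pair DPO loss (the log-probability values here are hypothetical; in practice they come from the language models):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss on one preference pair.

    margin = beta * [(log pi(chosen) - log pi_ref(chosen))
                     - (log pi(rejected) - log pi_ref(rejected))]
    loss   = -log sigmoid(margin)

    The loss falls as the policy raises the chosen response's probability
    relative to the reference model, and lowers the rejected one's.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy matches the reference exactly, the margin is zero
# and the loss is log(2), regardless of the absolute log-probs.
baseline = dpo_loss(-1.0, -2.0, -1.0, -2.0)
```

Note there is no reward model anywhere in this objective — the implicit reward is the beta-scaled log-probability ratio between policy and reference.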
Related Terms
AI Alignment
The research field focused on ensuring that AI systems' goals, behaviors, and values are compatible with human intentions and societal well-being throughout their operation.
Fine-Tuning
The process of further training a pre-trained AI model on a specialized dataset to adapt its behavior, knowledge, or output style for a specific domain or task.
Large Language Model
A type of AI model trained on vast amounts of text data that can understand, generate, and manipulate human language, typically based on the transformer architecture with billions of parameters.
Machine Learning
A subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed, using algorithms that identify patterns in data.