The Alignment Game

Train an AI by providing feedback on how it responds to ethical scenarios.

You'll see how your values gradually shape the AI's behavior through a process called Reinforcement Learning from Human Feedback (RLHF). Watch how the AI's responses evolve based on what you reward and what you correct.

[Interactive panel: the current scenario, your training history, and the feedback controls appear here.]

What's Happening?

As you provide feedback, you're essentially "training" this AI system to align with your values. In real-world AI development:

  • Thousands of human reviewers provide similar feedback
  • The AI learns to predict which responses humans will approve (the sketch below shows a toy version of this loop)
  • Whose values get embedded depends on who does the training
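For the curious, here is a minimal, hypothetical Python sketch of the feedback loop described above. It is not the code behind this game or real RLHF: the names (`responses`, `give_feedback`, `learning_rate`) are invented for illustration, and a real system trains a reward model over a neural network rather than keeping a lookup table of scores.

```python
# Toy illustration only: real RLHF fine-tunes a large model against a learned
# reward model. All names here are invented for this sketch.

# Candidate responses to one scenario, each with a learned preference score.
responses = {
    "refuse outright": 0.0,
    "comply fully": 0.0,
    "ask a clarifying question": 0.0,
}

def give_feedback(response: str, score: float, learning_rate: float = 0.5) -> None:
    """Nudge the stored preference toward the reviewer's score (+1 approve, -1 correct)."""
    responses[response] += learning_rate * (score - responses[response])

# One reviewer rewards the clarifying question and corrects full compliance...
give_feedback("ask a clarifying question", +1.0)
give_feedback("comply fully", -1.0)

# ...so the "trained" AI now favors what that reviewer approved of.
print(max(responses, key=responses.get))  # -> ask a clarifying question
```

Run it again with the scores swapped and the "trained" preference flips, which is the point of the exercise: the same learning rule embeds whichever values the reviewer rewards.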

Try this: Train the AI for a few scenarios, then imagine how someone with completely different values might train it differently.