The Alignment Game

Train an AI by providing feedback on its ethical responses.

You'll see how your values gradually shape the AI's behavior through a process called Reinforcement Learning from Human Feedback (RLHF). Watch how the AI's responses evolve based on what you reward and what you correct.

Current Scenario

Ethical Dilemma

AI's Current Response

Your Training

Provide Feedback:

Suggest Better Response (optional)

Training Status

Training History & Value Drift

What's Happening?

As you provide feedback, you're essentially "training" this AI system to align with your values. In real-world AI development:

Thousands of human reviewers provide similar feedback
A reward model learns to predict which responses humans will approve, and the AI is then tuned to produce responses that score well
But whose values get embedded depends on who does the training

Try this: Train the AI on a few scenarios, then imagine how someone with completely different values would train it.
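
To make the points above concrete, here is a minimal sketch of the approval-prediction step only, not a full RLHF pipeline. It assumes each response can be described by a few hand-scored value dimensions and fits a small logistic-regression model to approve/reject feedback. The feature names, the toy responses, and the two reviewers are all hypothetical and are not taken from this game or any real system.

```python
import math

# Hypothetical value dimensions used to describe each candidate response.
FEATURES = ["honesty", "caution", "autonomy"]

# Toy candidate responses, hand-scored on the dimensions above (assumed data).
RESPONSES = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predicted_approval(weights, response):
    """Probability that the reviewer behind `weights` approves `response`."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, response)))


def train_reward_model(responses, labels, lr=0.5, epochs=500):
    """Fit logistic-regression weights to approve (1) / reject (0) feedback."""
    weights = [0.0] * len(FEATURES)
    for _ in range(epochs):
        for x, y in zip(responses, labels):
            pred = predicted_approval(weights, x)
            # Stochastic gradient step on the log-loss for this one example.
            for i, xi in enumerate(x):
                weights[i] += lr * (y - pred) * xi
    return weights


# Reviewer A approves any response that shows caution;
# reviewer B approves any response that respects autonomy.
labels_a = [1 if r[1] > 0 else 0 for r in RESPONSES]
labels_b = [1 if r[2] > 0 else 0 for r in RESPONSES]

weights_a = train_reward_model(RESPONSES, labels_a)
weights_b = train_reward_model(RESPONSES, labels_b)

# A response neither model was trained on: mostly cautious, somewhat honest.
cautious_response = [0.5, 1.0, 0.0]
score_a = predicted_approval(weights_a, cautious_response)
score_b = predicted_approval(weights_b, cautious_response)

for name, w in (("A", weights_a), ("B", weights_b)):
    print(name, {f: round(v, 2) for f, v in zip(FEATURES, w)})
print("approval of cautious response:", round(score_a, 2), round(score_b, 2))
```

Both models see the same six responses and differ only in the labels they receive, yet they end up scoring the same new response in opposite ways. In this toy setting, that is the sense in which the embedded values depend on who does the training.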