What's Happening?
As you provide feedback, you're essentially "training" this AI system to align with your values. In real-world AI development:
- Thousands of human reviewers provide similar feedback
- The AI learns to predict which responses humans will approve
- But whose values get embedded depends on who does the training
Try this: Train the AI on a few scenarios, then imagine how someone with completely different values would train it.
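To make the idea concrete, here is a toy sketch in Python of how approval feedback can shape what a system prefers. Everything in it is an assumption made for illustration: the response styles ("cautious", "blunt"), the two hypothetical reviewers, and the simple counting approach. Real systems learn from far more feedback with far more complex models, so treat this only as a way to see the effect, not as how any actual AI is trained.

```python
from collections import defaultdict


def train_approval_model(feedback):
    """Learn the approval rate for each response style.

    `feedback` is a list of (style, approved) pairs, where `approved`
    is True if the reviewer liked the response.
    """
    counts = defaultdict(lambda: [0, 0])  # style -> [approvals, total]
    for style, approved in feedback:
        counts[style][0] += int(approved)
        counts[style][1] += 1
    return {style: ok / total for style, (ok, total) in counts.items()}


def preferred_style(model):
    """Return the style the model predicts is most likely to be approved."""
    return max(model, key=model.get)


# Hypothetical feedback from two reviewers with different values.
cautious_reviewer = [
    ("cautious", True), ("cautious", True),
    ("blunt", False), ("blunt", True),
]
blunt_reviewer = [
    ("cautious", False), ("cautious", True),
    ("blunt", True), ("blunt", True),
]

model_a = train_approval_model(cautious_reviewer)
model_b = train_approval_model(blunt_reviewer)

print(preferred_style(model_a))  # cautious
print(preferred_style(model_b))  # blunt
```

The training code is identical in both cases. Only the feedback differs, and that alone is enough to flip which style the model favors.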