# Why is this level of evaluation important?

An AI system that produces perfect responses is worthless if users do not use it. Once you deploy your AI system as a product (e.g., a chatbot or app), you must track a few critical user signals, like:

* Engagement: How many users are using the product?
* Retention: How likely are they to continue using it?

If users never engage, or stop interacting because they see no value, they are unlikely to change their behavior in ways that improve their life outcomes.

Like AI system evals, Level 2 evaluation is a continuous, iterative cycle, not a one-time exercise. We track user interaction metrics over time, and look for unexpected drop-off or intended improvements, for example when a promising new feature is released as part of an A/B test.

Product evaluations are critical for iterative improvement, but they can also be a matter of safety. Suppose you have an experimental new feature in your chatbot, but you're not sure how people will react. It might be risky to roll out this new feature to all users, all at once.
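To make these signals concrete, here is a minimal Python sketch of how engagement, retention, and a partial rollout might be computed. Everything in it is an assumption for illustration: the `EVENTS` log, the function names, and the choice of "users active on a given day" as the retention cohort are hypothetical, and a real product would read from its own analytics store and likely use its own definitions.

```python
import hashlib
from datetime import date, timedelta

# Hypothetical interaction log: (user_id, day the user was active).
EVENTS = [
    ("u1", date(2024, 1, 1)),
    ("u2", date(2024, 1, 1)),
    ("u3", date(2024, 1, 1)),
    ("u1", date(2024, 1, 2)),
    ("u1", date(2024, 1, 8)),
    ("u2", date(2024, 1, 8)),
]


def daily_active_users(events):
    """Engagement: number of distinct users active on each day."""
    users_by_day = {}
    for user_id, day in events:
        users_by_day.setdefault(day, set()).add(user_id)
    return {day: len(users) for day, users in users_by_day.items()}


def day_n_retention(events, cohort_day, n):
    """Retention: share of users active on cohort_day who are
    active again exactly n days later (one possible definition)."""
    cohort = {u for u, d in events if d == cohort_day}
    if not cohort:
        return 0.0
    target_day = cohort_day + timedelta(days=n)
    returned = {u for u, d in events if d == target_day and u in cohort}
    return len(returned) / len(cohort)


def in_rollout(user_id, feature, percent):
    """Deterministically assign a user to one of 100 buckets and
    enable the feature for the first `percent` buckets."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percent
```

With the sample log above, `daily_active_users(EVENTS)` reports three active users on January 1, and `day_n_retention(EVENTS, date(2024, 1, 1), 7)` is 2/3, since two of those three users return a week later. The `in_rollout` helper illustrates one way to expose an experimental feature to only a fraction of users: hashing the user ID keeps each user's assignment stable across sessions, so the same metrics can be compared between the exposed and unexposed groups before widening the release.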
