Overview
Does the overall product engage and retain users?
An AI system that produces perfect responses is worthless if no one uses it. Level 2 Evaluation moves beyond technical accuracy to measure the "digital traces" users leave behind. By tracking how users move from their first interaction to long-term habit formation, we can check whether the product actually delivers value in the real world.
Key Motivation
Technical performance (Level 1) does not guarantee user adoption. Level 2 evaluation is critical because:
Value Validation: If users stop interacting, they likely see no value, meaning the intervention cannot achieve its intended life outcomes.
Continuous Improvement: It transforms product development from opinion-driven to data-driven through iterative cycles and A/B testing.
Safety & Risk Management: Monitoring user signals allows for controlled rollouts of experimental features, so a change that triggers negative reactions is caught before it reaches your entire user base.
Core Concept: The User Funnel
To evaluate the product, we "instrument" the application to track users as they progress through four distinct stages. We prioritize "Time to Success" (solving the user's problem) over "Time on Device" to ensure we are optimizing for welfare rather than just addiction.
Acquisition: Bring users into the ecosystem. Key metrics: New User Count, Customer Acquisition Cost (CAC).
Activation: Ensure users find "First Value." Key metrics: Activation Rate, Time to Activate.
Engagement: Measure depth and frequency of use. Key metrics: Active Users (DAU/WAU), Interaction Depth.
Retention: Build long-term habits and commitment. Key metrics: Stickiness (DAU/MAU), Retention Rate.
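As a rough illustration of how a few of these funnel metrics could be derived from raw event data, here is a minimal Python sketch. It assumes a hypothetical event log of (user_id, event_name, day) tuples; the event names "signup" and "audio_advice_played", the 30-day window, and the choice to count any event as activity are illustrative assumptions, not definitions from this guide. Stickiness is computed here as average DAU divided by MAU, which is one common convention.

```python
from datetime import date, timedelta
from statistics import median

# Hypothetical event log: (user_id, event_name, day). All names are illustrative.
EVENTS = [
    ("u1", "signup", date(2024, 1, 1)),
    ("u1", "audio_advice_played", date(2024, 1, 1)),
    ("u1", "audio_advice_played", date(2024, 1, 2)),
    ("u2", "signup", date(2024, 1, 1)),
    ("u3", "signup", date(2024, 1, 2)),
    ("u3", "audio_advice_played", date(2024, 1, 3)),
]

def activation_rate(events, signup="signup", value_event="audio_advice_played"):
    """Share of signed-up users who reached the assumed 'First Value' event."""
    signed_up = {u for u, e, _ in events if e == signup}
    activated = {u for u, e, _ in events if e == value_event} & signed_up
    return len(activated) / len(signed_up) if signed_up else 0.0

def median_days_to_activate(events, signup="signup", value_event="audio_advice_played"):
    """Median days between a user's first signup and first value event."""
    first_signup, first_value = {}, {}
    for u, e, d in events:
        if e == signup:
            first_signup[u] = min(d, first_signup.get(u, d))
        elif e == value_event:
            first_value[u] = min(d, first_value.get(u, d))
    gaps = [(first_value[u] - first_signup[u]).days
            for u in first_signup if u in first_value]
    return median(gaps) if gaps else None

def stickiness(events, window_end, window_days=30):
    """Average daily active users divided by active users over the whole window."""
    start = window_end - timedelta(days=window_days - 1)
    daily = {}
    for u, _, d in events:
        if start <= d <= window_end:  # any event counts as activity (assumption)
            daily.setdefault(d, set()).add(u)
    if not daily:
        return 0.0
    mau = len(set().union(*daily.values()))
    avg_dau = sum(len(users) for users in daily.values()) / window_days
    return avg_dau / mau

print(activation_rate(EVENTS))
print(median_days_to_activate(EVENTS))
print(stickiness(EVENTS, window_end=date(2024, 1, 30)))
```

In practice an analytics tool would compute these figures from its own event store; the sketch only makes the definitions concrete so that teams can agree on what each metric counts.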
How to Evaluate
Level 2 evaluation is performed by integrating third-party analytics tools (e.g., Amplitude, Mixpanel) to capture real-time usage data, and then working through four steps:
Define & Instrument: Map your user journey and identify specific "events" (e.g., "audio advice played") that signal progress.
Analyze Trends: Use dashboards to identify friction points where users consistently drop off.
Experiment: Run A/B Tests to compare different versions of a feature. By randomly assigning users to "Version A" or "Version B," you can test whether the difference between the two designs is statistically significant and, if it is, which one better supports user goals.
Diagnose: If metrics are low, conduct a Process Evaluation (interviews or surveys) to understand the "why" behind the data, such as connectivity constraints or literacy barriers.
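To make the Experiment step more concrete, the sketch below shows one possible way to assign users to variants and compare the two groups. It is an illustration under stated assumptions, not a prescribed method: the function names, the experiment label "onboarding_v2", and the example counts are hypothetical. Hash-based bucketing is used as a stable stand-in for random assignment, and a pooled two-proportion z-test is one common choice for comparing conversion-style metrics such as activation rate.

```python
import hashlib
from math import sqrt
from statistics import NormalDist

def assign_variant(user_id, experiment="onboarding_v2"):
    """Deterministically bucket a user into 'A' or 'B' (hypothetical helper).

    Hashing the experiment name with the user ID gives a stable, roughly
    50/50 split, so a returning user always sees the same version.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided pooled z-test for a difference in rates between A and B.

    Returns (z, p_value). A positive z means B has the higher rate.
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # identical all-or-nothing outcomes: no evidence of a difference
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up example counts: activated users out of users exposed to each version.
z, p = two_proportion_z_test(success_a=120, n_a=1000, success_b=150, n_b=1000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A small p-value suggests the observed difference is unlikely to be chance alone, but it does not show that the difference is large enough to matter. Deciding the sample size and the success metric before the test starts helps avoid reading noise as a win.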