Connection with other levels

Ideally, as your Level 1 metrics improve and the AI system becomes more reliable, your Level 2 engagement metrics should also improve. However, technical performance does not guarantee user adoption, so it is critical to monitor AI system metrics and product analytics in tandem to confirm that engineering “improvements” actually translate into a better user experience.
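One way to check this in practice is to line up the two levels by week (or by release) and flag periods where a Level 1 metric improved while a Level 2 metric did not. The following is a minimal sketch: the metric names (`answer_accuracy`, `weekly_retention`) and all values are hypothetical placeholders, not figures from this playbook.

```python
from dataclasses import dataclass


@dataclass
class WeeklySnapshot:
    week: str
    answer_accuracy: float   # hypothetical Level 1 metric (AI system quality)
    weekly_retention: float  # hypothetical Level 2 metric (product analytics)


def find_divergent_weeks(snapshots):
    """Return weeks where the Level 1 metric improved but the Level 2 metric did not."""
    divergent = []
    # Compare each week with the one immediately before it.
    for previous, current in zip(snapshots, snapshots[1:]):
        level1_improved = current.answer_accuracy > previous.answer_accuracy
        level2_improved = current.weekly_retention > previous.weekly_retention
        if level1_improved and not level2_improved:
            divergent.append(current.week)
    return divergent


# Illustrative data only.
snapshots = [
    WeeklySnapshot("2024-W01", 0.80, 0.40),
    WeeklySnapshot("2024-W02", 0.85, 0.44),
    WeeklySnapshot("2024-W03", 0.90, 0.41),
]
divergent_weeks = find_divergent_weeks(snapshots)
print(divergent_weeks)
```

A flagged week is a prompt for investigation rather than proof of a problem, since engagement can lag behind a technical change or move for unrelated reasons.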

In this playbook, Level 2 evaluation focuses exclusively on the digital traces users leave within the product. It does not include qualitative interviews or surveys that probe user beliefs and moods; these fall under Level 3 evaluation. Level 3 is also where we track many of the metrics used to monitor harm (e.g., anxiety, addiction). Because product analytics alone cannot surface these harms, Levels 2 and 3 must be evaluated in tandem.

Note that Level 2 also ignores the external, “real world” inputs to a social program or service (such as in-person trainings or customer support) that complement a digital product. Metrics for these offline activities are typically captured in process evaluations conducted by Monitoring and Evaluation (M&E) teams. We recommend reviewing process evaluation data alongside Level 2 product analytics to better understand whether user frictions result from failures in the product or in the associated offline services.

As your product evolves, remember to refine and revalidate your Level 2 metrics, to better capture the nuance of the user experience. Metrics that record meaningful interactions are more valuable than raw event counts.
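To illustrate the difference, the sketch below contrasts a raw event count with the share of sessions that include a meaningful interaction. The event names, and the choice of `task_completed` as the meaningful event, are assumptions made for illustration; each product needs its own definition.

```python
from collections import defaultdict

# Hypothetical event log: (session_id, event_name).
events = [
    ("s1", "app_opened"),
    ("s1", "message_sent"),
    ("s1", "task_completed"),
    ("s2", "app_opened"),
    ("s2", "app_opened"),
    ("s2", "app_opened"),
    ("s3", "app_opened"),
    ("s3", "message_sent"),
]

# Assumption: which events count as "meaningful" is product-specific.
MEANINGFUL_EVENTS = {"task_completed"}


def raw_event_count(events):
    """Count every logged event, regardless of what it represents."""
    return len(events)


def meaningful_session_rate(events):
    """Return the fraction of sessions containing at least one meaningful event."""
    sessions = defaultdict(set)
    for session_id, name in events:
        sessions[session_id].add(name)
    if not sessions:
        return 0.0
    meaningful = sum(1 for names in sessions.values() if names & MEANINGFUL_EVENTS)
    return meaningful / len(sessions)


print(raw_event_count(events))
print(round(meaningful_session_rate(events), 2))
```

In this toy log, session `s2` inflates the raw count by reopening the app repeatedly, while the session-level rate credits only `s1`, the one session in which a task was completed.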

