# Why is this level of evaluation important?

Once an AI system is functioning as expected (Level 1), and the product is engaging users as intended (Level 2), we can ask a deeper question: Is the product influencing how users think, feel, or act, and is it doing so in ways that advance a development outcome of interest?

The success of commercial AI products is often measured via user satisfaction ratings or Net Promoter Score (NPS), which essentially ask, *'Do you like this product enough to recommend it?'* In the development sector, however, satisfaction is not a reliable proxy for impact. A student might enjoy a tutoring app (high NPS) without actually mastering the curriculum. A patient might review a health provider favorably even after being harmed by sub-standard care.

In Level 3 evaluation, we identify and measure specific behaviors, beliefs, or feelings that predict long-term improvements in health, education, or livelihoods. We will use a program’s Theory of Change (TOC) to specify the “stepping stones” that users traverse on their path toward impact. Instead of waiting years to see if health or education outcomes improve, we will identify intermediate changes in how users think, feel, or act to serve as early signals of success.

To do this, organizations should address five key issues:

1. **Measures**: Which specific user-level changes actually matter to our Theory of Change? Can we measure these short-term changes relatively cheaply and frequently?
2. **Attribution**: Can we plausibly claim these changes are caused by our AI product?
3. **Trajectory**: Are metrics trending in the right direction? Do sub-groups behave differently?
4. **Malleability**: Can we shift metrics by altering the product experience? Do users show increased drive to act (e.g., asking proactive questions or expressing intent to change) when we intervene with product improvements?
5. **Perception**: Do users feel more empowered to act (e.g., do they have a clearer understanding of their next steps), even if they do not immediately take action? User perceptions can be predictive of outcomes, for example when a student recognizes the learning gains they have achieved while using an app.
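
As a rough illustration of the **Trajectory** question, the sketch below estimates a per-sub-group trend for one intermediate metric. The metric (weekly share of users asking a proactive follow-up question), the sub-group labels, and all numbers are hypothetical, and the function names are invented for this example. It assumes one observation per sub-group per evenly spaced week; a real analysis would also need to account for sample sizes and noise.

```python
from collections import defaultdict


def slope(values):
    """Least-squares slope of values against their position (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def trend_by_subgroup(records):
    """Group (subgroup, week, value) records and return {subgroup: slope}."""
    series = defaultdict(list)
    # Sort by sub-group, then week, so each series is in time order.
    for subgroup, _week, value in sorted(records, key=lambda r: (r[0], r[1])):
        series[subgroup].append(value)
    return {group: slope(values) for group, values in series.items()}


# Hypothetical data: weekly share of users asking a proactive question.
records = [
    ("rural", 1, 0.10), ("rural", 2, 0.12), ("rural", 3, 0.14),
    ("urban", 1, 0.20), ("urban", 2, 0.19), ("urban", 3, 0.18),
]

trends = trend_by_subgroup(records)
for group, value in trends.items():
    direction = "improving" if value > 0 else "flat or declining"
    print(f"{group}: {value:+.3f} per week ({direction})")
```

A positive slope suggests the metric is moving in the intended direction for that sub-group. Diverging slopes across sub-groups are a prompt to investigate further and are not evidence of attribution on their own.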

By defining and tracking intermediate outcomes, Level 3 helps you conduct fast product iterations during pilots and ongoing feature development, setting the stage for a successful Level 4 evaluation down the road.
