# Overview

Once an AI system is reliable (Level 1) and engaging (Level 2), we must ask a deeper question: is it actually working? In the development sector, users "liking" a product is not a proxy for impact. Level 3 evaluates the "stepping stones" of change: the intermediate cognitive, affective, and behavioral shifts that predict long-term improvements in people's health, education, or livelihoods.

***

#### Key Motivation

Unlike commercial products, which often rely on satisfaction or loyalty scores such as the Net Promoter Score (NPS), development programs require objective evidence of change in users' lives. Level 3 is essential because:

* **Predictive Power:** Intermediate changes (e.g., increased confidence or knowledge) serve as early signals of success long before distal outcomes (e.g., higher income) materialize.
* **Beyond "Vanity Metrics":** It distinguishes between a user who is merely "addicted" to an interface and one who is actually gaining agency or mastering a skill.
* **Fast Iteration:** It allows you to run experiments on psychological "states" (like motivation or trust) to refine your product during pilots.

<a href="/pages/ul2a65TvdgqoIC9bEwF4" class="button primary">Read more -></a>

***

#### Core Concept: Intermediate Outcomes

Level 3 measures how an "adequate dosage" of your AI product shifts the user across several dimensions. We look for changes in the following constructs:

| Outcome Category | What we measure                                                           |
| ---------------- | ------------------------------------------------------------------------- |
| **Cognitive**    | Knowledge acquisition, belief updating, and reasoning complexity.         |
| **Affective**    | Emotional valence, sense of safety, trust, and perceived empathy.         |
| **Behavioral**   | Intent to act, application of information, and proactive help-seeking.    |
| **Motivational** | Self-efficacy, intrinsic curiosity, and persistence vs. dependency.       |
| **Relational**   | Quality of interpersonal communication and trust in human vs. AI sources. |
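
The "adequate dosage" idea can be made concrete with a simple pre/post comparison. The Python sketch below assumes each user answered the same survey items before and after using the product, scored per construct (for example self-efficacy on a 1 to 5 scale), and treats "adequate dosage" as a minimum session count. `MIN_SESSIONS`, the field names, and the data are hypothetical. A pre/post difference is descriptive only: without a comparison group it cannot separate the product's effect from other changes over the same period.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical threshold for "adequate dosage"; calibrate it to your product.
MIN_SESSIONS = 5


@dataclass
class UserRecord:
    user_id: str
    sessions: int
    pre: dict[str, float]   # construct -> survey score before use
    post: dict[str, float]  # construct -> survey score after use


def construct_shifts(users: list[UserRecord]) -> dict[str, float]:
    """Mean within-user change (post minus pre) per construct,
    restricted to users who reached the dosage threshold."""
    dosed = [u for u in users if u.sessions >= MIN_SESSIONS]
    if not dosed:
        return {}
    # Only compare constructs measured both before and after for every dosed user.
    constructs = set.intersection(*(set(u.pre) & set(u.post) for u in dosed))
    return {c: mean(u.post[c] - u.pre[c] for u in dosed) for c in sorted(constructs)}


# Hypothetical survey data for three users.
users = [
    UserRecord("a", 8, {"self_efficacy": 2.0, "trust": 3.0}, {"self_efficacy": 3.0, "trust": 3.5}),
    UserRecord("b", 6, {"self_efficacy": 3.0, "trust": 4.0}, {"self_efficacy": 3.5, "trust": 4.0}),
    UserRecord("c", 1, {"self_efficacy": 2.0, "trust": 2.0}, {"self_efficacy": 2.0, "trust": 1.0}),
]
print(construct_shifts(users))  # {'self_efficacy': 0.75, 'trust': 0.25}
```

User "c" is excluded because one session falls below the hypothetical dosage threshold.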

<a href="/pages/z0S08pUtqzummCSqeNjA" class="button primary">Read more -></a>

***

#### How to Evaluate

Level 3 combines the experimental rigor of Level 2 with deeper psychological and linguistic analysis.

1. **Generate hypotheses based on a theory of change:** Use your theory of change to define intermediate cognitive, affective, or behavioral outcomes that are plausibly linked to your target social impact.
2. **Identify outcome metrics (Digital Traces):** For example, analyze conversation logs for "on-platform" behaviors that signal growth, such as increasing query depth, use of technical vocabulary, or proactive follow-up questions.
3. **Define guardrail metrics and measure potential harm:** Track metrics that would reveal harms, such as "AI dependency" (reduced willingness to attempt tasks without help) or "social displacement" (AI interactions crowding out human relationships and sources of support).
4. **Consider constructing proxies for long-term development outcomes:** We propose constructing a "Surrogate Index" that combines Level 2 and Level 3 metrics into a prediction of longer-term Level 4 outcomes.
5. **Consider conducting experiments to improve the selected key metrics and running process evaluations:** Once you have identified intermediate outcomes that serve as early indicators of the development outcome of interest, run experiments to assess how product changes influence these Level 3 outcomes.
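
Digital-trace metrics like those in step 2 can be sketched as a few per-user statistics over conversation logs. This is a minimal illustration under stated assumptions: the log format (chronological `(user_id, text)` pairs for user turns only), the hypothetical `GLOSSARY` of domain terms, and the three operationalizations (words per message as a rough proxy for query depth, share of glossary terms for technical vocabulary, share of later messages containing a question mark for follow-up behavior) are illustrative choices, not definitions from this playbook. Real deployments would need language-aware tokenization and validated measures.

```python
import re
from collections import defaultdict

# Hypothetical glossary of domain terms (here: a farming advisory assistant).
GLOSSARY = {"fertilizer", "irrigation", "yield", "compost", "pest"}

WORD_RE = re.compile(r"[a-z']+")


def trace_metrics(turns: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Compute illustrative per-user digital-trace metrics.

    turns: chronologically ordered (user_id, message_text) pairs, user side only.
    """
    by_user: dict[str, list[str]] = defaultdict(list)
    for user_id, text in turns:
        by_user[user_id].append(text)

    metrics = {}
    for user_id, messages in by_user.items():
        words = [w for m in messages for w in WORD_RE.findall(m.lower())]
        follow_ups = messages[1:]  # every message after the user's first
        metrics[user_id] = {
            # Rough proxy for query depth: average words per message.
            "words_per_message": len(words) / len(messages),
            # Share of all words that appear in the glossary.
            "technical_term_rate": (
                sum(w in GLOSSARY for w in words) / len(words) if words else 0.0
            ),
            # Share of later messages phrased as questions.
            "follow_up_question_rate": (
                sum("?" in m for m in follow_ups) / len(follow_ups) if follow_ups else 0.0
            ),
        }
    return metrics


# Hypothetical log for one user.
log = [
    ("u1", "How do I grow maize?"),
    ("u1", "Which fertilizer improves yield?"),
    ("u1", "Thanks"),
]
print(trace_metrics(log)["u1"])
```

Growth would show up as these values rising over a user's history, so in practice you would compute them per week or per session window rather than once per user.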

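Steps 3 and 5 come together when a product experiment reports its effect on a key Level 3 metric alongside a guardrail metric. The sketch below is one simple way to do this, not a prescribed method: it estimates a difference in means with a normal-approximation 95% confidence interval (unpooled, Welch-style standard error) and flags the guardrail unless the interval rules out a harm larger than a pre-set margin. The metric names, the `harm_margin` value, and the data are hypothetical, and the "dependency" score (higher means more reliant on the AI) stands in for whatever operationalization of AI dependency you choose. With samples this small, a t critical value would be more appropriate than 1.96; the tiny arrays are only for illustration.

```python
from math import sqrt
from statistics import mean, variance

Z_95 = 1.96  # two-sided 95% critical value of the standard normal


def diff_in_means(treatment: list[float], control: list[float]) -> tuple[float, float, float]:
    """Return (estimate, ci_low, ci_high) for mean(treatment) - mean(control)."""
    est = mean(treatment) - mean(control)
    # Unpooled standard error: each arm keeps its own sample variance.
    se = sqrt(variance(treatment) / len(treatment) + variance(control) / len(control))
    return est, est - Z_95 * se, est + Z_95 * se


def guardrail_breached(
    treatment: list[float],
    control: list[float],
    harm_margin: float,
    higher_is_harmful: bool = True,
) -> bool:
    """True unless the CI rules out a harm larger than harm_margin."""
    _, low, high = diff_in_means(treatment, control)
    return high > harm_margin if higher_is_harmful else low < -harm_margin


# Hypothetical pilot data (4 users per arm, for illustration only).
efficacy_t, efficacy_c = [3.0, 4.0, 5.0, 4.0], [3.0, 3.0, 4.0, 2.0]  # primary metric
dependency_t, dependency_c = [0.2, 0.3, 0.25, 0.25], [0.2, 0.2, 0.3, 0.3]  # guardrail

print(diff_in_means(efficacy_t, efficacy_c))
print(guardrail_breached(dependency_t, dependency_c, harm_margin=0.1))  # False
```

Framing the guardrail as "rule out harm beyond a margin" rather than "no significant harm" keeps an underpowered pilot from passing the guardrail by default.
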
<a href="/pages/QJQlH47rYKyaPoYUIIAq" class="button primary">Read more -></a>
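
The surrogate index in step 4 can be sketched with the standard two-stage recipe from the econometrics literature on surrogate indices: in a dataset where the long-term outcome is already observed (for example an earlier cohort), regress that outcome on the short-term surrogates; then, in a new experiment where only the surrogates are available, predict the long-term outcome for each user and compare the mean prediction across arms. The sketch assumes a linear model fit by ordinary least squares, uses hypothetical surrogates (active days as a Level 2 metric and a self-efficacy score as a Level 3 metric) with made-up data, and implements OLS in plain Python only to stay dependency-free. The estimate is only as credible as its assumptions: the treatment should affect the long-term outcome only through the surrogates, and the historical cohort should resemble the experimental population.

```python
from statistics import mean


def fit_ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares with an intercept, solved via the normal equations.

    Returns [intercept, b_1, ..., b_k]. Adequate for a handful of surrogates;
    for real data, prefer a numerical library and check for collinearity.
    """
    rows = [[1.0, *x] for x in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(beta: list[float], x: list[float]) -> float:
    """Predicted long-term outcome (the surrogate index) for one user."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def surrogate_index_effect(beta, treatment_X, control_X) -> float:
    """Estimated long-term effect: difference in mean surrogate index between arms."""
    return (mean(predict(beta, x) for x in treatment_X)
            - mean(predict(beta, x) for x in control_X))


# Hypothetical historical cohort: surrogates [active_days, self_efficacy]
# (a Level 2 and a Level 3 metric) and an observed long-term outcome.
hist_X = [[1, 1], [2, 1], [1, 2], [3, 3], [2, 4]]
hist_y = [6, 8, 9, 16, 17]
beta = fit_ols(hist_X, hist_y)

# Hypothetical new experiment where only the surrogates are observed so far.
treat_X = [[3, 2], [2, 3]]
ctrl_X = [[2, 2], [1, 2]]
print(round(surrogate_index_effect(beta, treat_X, ctrl_X), 6))  # 3.5
```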



