# Overview

Impact evaluation is often called the "gold standard" of evidence. While Level 3 measures shifts in thoughts and feelings, Level 4 measures the ultimate results: improved crop yields, higher test scores, or better health outcomes. By using a counterfactual (comparing people who use your product with a similar group that does not), you can separate the impact of your AI intervention from the "noise" of a messy world.

***

#### Key Motivation

Policymakers, donors, and governments require credible evidence before they invest in scaling a solution. Level 4 evaluation is critical because:

* **Causal Attribution:** It provides credible evidence that improvements were caused by your product, not by coincidence or external trends.
* **Informing Scale:** It provides the cost-effectiveness data needed to justify large-scale budget allocations.
* **Identifying Unintended Effects:** Rigorous trials can surface hidden negative consequences or surprising positive spillovers that simpler metrics miss.

<a href="/pages/nDEp5z31imLnvAXYixVk" class="button primary">Read more -></a>

***

#### Core Concept: The Counterfactual

To know if your AI tool works, you must estimate what would have happened to the same people *without* it. We do this by creating a comparison group.
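
As a minimal sketch of this idea, the simplest impact estimate is the gap in average outcomes between users and the comparison group. The function name and the test scores below are hypothetical, and the standard error assumes independent observations and groups that are comparable apart from the tool (which random assignment is meant to deliver).

```python
from math import sqrt
from statistics import mean, variance


def difference_in_means(treated, comparison):
    """Return (estimate, standard_error) for the treated-minus-comparison gap."""
    estimate = mean(treated) - mean(comparison)
    # Unpooled standard error; assumes observations are independent.
    se = sqrt(variance(treated) / len(treated)
              + variance(comparison) / len(comparison))
    return estimate, se


# Hypothetical test scores, for illustration only.
treated_scores = [72, 75, 78, 80, 70]
comparison_scores = [68, 70, 74, 71, 67]

effect, se = difference_in_means(treated_scores, comparison_scores)
print(f"Estimated effect: {effect:.1f} points (SE {se:.2f})")
```

The estimate is only as credible as the comparison group: if the two groups differ in ways other than access to the tool, the gap mixes impact with those differences.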

| Method                       | How it Works                                                           | Best Used When...                                      |
| ---------------------------- | ---------------------------------------------------------------------- | ------------------------------------------------------ |
| **RCT**                      | Randomly assign users to "Treatment" or "Control."                     | You have a large sample and high control over rollout. |
| **Quasi-Experimental**       | Compare changes over time between users and a comparison group assumed to follow "parallel trends" (difference-in-differences). | Randomization is not feasible or ethical.              |
| **Regression Discontinuity** | Compare people just above/below a specific cutoff (e.g., test scores). | Resources are allocated based on a strict threshold.   |
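
The "parallel trends" comparison in the table can be illustrated with a difference-in-differences calculation. This is a sketch under stated assumptions, not a full analysis: the function name and crop-yield figures are hypothetical, and the estimate is only meaningful if the two groups would have followed the same trend without the product.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after,
                 comparison_before, comparison_after):
    """Change in the treated group minus change in the comparison group."""
    treated_change = mean(treated_after) - mean(treated_before)
    comparison_change = mean(comparison_after) - mean(comparison_before)
    # The comparison group's change stands in for the trend the treated
    # group would have followed without the product (parallel trends).
    return treated_change - comparison_change


# Hypothetical crop yields in tonnes per hectare, for illustration only.
treated_before = [2.0, 2.2, 1.8]
treated_after = [2.6, 2.8, 2.4]
comparison_before = [2.1, 1.9, 2.0]
comparison_after = [2.3, 2.1, 2.2]

estimate = diff_in_diff(treated_before, treated_after,
                        comparison_before, comparison_after)
print(f"Difference-in-differences estimate: {estimate:.2f} t/ha")
```

Subtracting the comparison group's change removes shared shocks, such as a good rainy season, that would otherwise be credited to the product.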

<a href="/pages/tVhuFj0GRCyKNwFJGA1l" class="button primary">Read more -></a>

***

#### How to Evaluate

Level 4 is a high-investment undertaking. It should only be performed when Levels 1–3 are strong and your product is mature.

1. **Select the Right Counterfactual:** Decide what you are comparing against. Is it "Business as Usual" (no tech), a "Non-AI digital tool," or "Human-delivered services"?
2. **Manage Product Dynamism:** AI products change fast. Tag the product version each user sees and, if possible, maintain a holdout group on a frozen baseline version, so that mid-study updates do not bias the comparison.
3. **Measure True Capabilities:** Use objective, industry-standard assessments. Ensure students aren't just "copy-pasting" AI answers; test them when they *don't* have access to the tool.
4. **Account for Spillovers:** GenAI is "leaky"—users share advice with neighbors. Use Cluster Randomization (by school or village) to prevent the control group from accidentally being "treated."
5. **Monitor Attrition:** Digital tools often have high drop-off. Use Level 2 engagement data to track who leaves the study, and check whether drop-off differs between treatment and control, since differential attrition can skew your final results.
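
The cluster randomization in step 4 can be sketched as follows. This is an illustrative example only: the function name, user IDs, and school names are hypothetical, and a real study would typically also stratify clusters and pre-register the assignment procedure.

```python
import random


def assign_clusters(cluster_ids, seed=0):
    """Randomly assign whole clusters (e.g. schools) to treatment or control."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    clusters = sorted(set(cluster_ids))
    rng.shuffle(clusters)
    half = len(clusters) // 2
    treated = set(clusters[:half])
    return {c: ("treatment" if c in treated else "control") for c in clusters}


# Hypothetical users and the school each one belongs to.
users = [("u1", "school_a"), ("u2", "school_a"), ("u3", "school_b"),
         ("u4", "school_c"), ("u5", "school_d"), ("u6", "school_d")]

arm_by_school = assign_clusters(school for _, school in users)
# Every user inherits the arm of their school, so neighbors share an arm.
arm_by_user = {user: arm_by_school[school] for user, school in users}
print(arm_by_user)
```

Because assignment happens at the school level, the analysis should also account for clustering (for example, with cluster-robust standard errors), since outcomes within a school are not independent.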

**When to Start?**

Do not rush into an Impact Evaluation. You are ready for Level 4 when:

* ✅ Level 1–3 evidence is consistent.
* ✅ Scale-up is being considered by major partners.
* ✅ You have the technical bandwidth to coordinate with independent researchers.

<a href="/pages/gnarBxNy7gjgKIjcRcSH" class="button primary">Read more -></a>
