# What is the “intervention” being evaluated?

The central reason to do an impact evaluation is to inform policymakers, donors and implementers on whether and how to incorporate an intervention in their plans.

By isolating the intervention from other influences, impact evaluation enables causal attribution of outcome changes. Once effectiveness is established for a specific setting and population, additional evaluations can test whether it works elsewhere or for other groups. Moreover, since impact evaluations isolate causal effects, they are ideal for measuring unintended (as well as intended) impacts of an intervention[^1].

For many funders and public sector partners, impact evaluations (IEs) are central to decision-making. They seek credible evidence that a product improves lives—beyond engagement metrics or self-reports—before scaling. A well-designed IE signals real-world effectiveness and the likelihood of meaningful social returns (see e.g. [Hauser et al., 2025](https://www.nature.com/articles/d41586-025-02266-7.epdf?sharing_token=jCKO3Tx8dFeQfucqP5VCcNRgN0jAjWel9jnR3ZoTv0PS1htX8Sko7IudKf1MVjrKQ-g3NeuYAsnuJ-Io9wHN3uMBrjSLLnu_wjpJLF2G-unWgOw27UqLqC_yalnt2AFTYmMZAO31agMcWvNwKRpfYsfrMt3fmIKm0iVbftxqAsY%3D); [UK GOV, 2025](https://www.gov.uk/government/publications/the-magenta-book/guidance-on-the-impact-evaluation-of-ai-interventions-html)).

IEs also help funders compare options. Combined with cost data, they enable cost-effectiveness and cost-benefit analysis—critical when governments, donors, and multilaterals allocate scarce resources. In many cases, IE results directly inform decisions to scale, replicate, or exit.
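To make the comparison concrete, a cost-effectiveness analysis divides program cost by total impact to get a cost per unit of outcome. The sketch below uses entirely hypothetical figures and a hypothetical helper name; it illustrates the arithmetic, not any specific program's results.

```python
def cost_per_outcome(total_cost: float, effect_per_user: float, n_users: int) -> float:
    """Cost per unit of outcome gained = total cost / total impact.

    effect_per_user is the IE's estimated causal effect (e.g., learning
    gain in standard deviations) per reached user.
    """
    return total_cost / (effect_per_user * n_users)

# Two hypothetical programs reaching 5,000 users each:
a = cost_per_outcome(total_cost=100_000, effect_per_user=0.10, n_users=5_000)
b = cost_per_outcome(total_cost=150_000, effect_per_user=0.25, n_users=5_000)
# Program B costs more in total but less per unit of impact (120 vs. 200),
# which is the comparison funders typically care about.
```

This is why IE effect estimates, not just cost data, are needed: without a credible causal effect size, the denominator is unknown.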

### When is it appropriate to do an IE?

IEs are high-investment undertakings, both financially and operationally, though strategies exist to mitigate both kinds of constraint. They are most useful when your product is mature enough to test and when the decision stakes are high enough to justify the effort. In general, consider an IE when:

* [x] **Levels 1–3 are strong**: The model performs well, users engage meaningfully, and early evidence suggests improvements in knowledge, attitudes, or behavior.
* [x] **You are preparing to scale**: Funders or policymakers are considering wider adoption, so evidence, such as cost-effectiveness or cost-benefit estimates, would help inform the decision. Conversely, scale-up plans may already be in progress and present an opportunity for evidence gathering.
* [x] **You have bandwidth**: Implementing an IE is a lot of work for both the research team and implementer; doing it well takes time and effort.

You do not need to run an IE if your product is still in early design or usage is too inconsistent to expect impacts. In such cases, Level 3 evaluations—focused on user cognition and behavior—are more appropriate. Once you have confidence that the theory of change is working, you can and should revisit an impact evaluation.

### Plan for Evaluability Early

Although IEs are usually run later, credible and cost-effective evaluation requires early design choices. Building in features like holdout groups, staged rollouts, or embedded randomization from the start (also useful for A/B testing) preserves the ability to estimate causal effects without disruptive redesigns. Even if a full IE is premature, these choices create opportunities for credible inference later and reduce evaluation burden. Funders assessing scale readiness should look for signs of early evaluability.
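As one illustration of what "embedded randomization" can look like in practice, here is a minimal sketch of deterministic, hash-based assignment to a holdout group. All names and parameters are hypothetical; the point is that assignment is stable and reproducible from day one, without storing a lookup table, so causal comparisons remain possible later.

```python
import hashlib

def assign_arm(user_id: str, experiment: str = "pilot-2025",
               holdout_share: float = 0.2) -> str:
    """Deterministically assign a user to 'holdout' or 'treatment'.

    Hashing the user ID with an experiment-specific salt yields a stable
    pseudo-random assignment: the same user always lands in the same arm,
    and different experiments produce independent splits.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform in [0, 1]
    return "holdout" if bucket < holdout_share else "treatment"

# Assignment is stable across calls and sessions:
assert assign_arm("user-42") == assign_arm("user-42")
```

The same mechanism supports routine A/B testing, which is one reason building it in early tends to pay for itself before any IE begins.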

### How to do an IE responsibly

Rigorous IEs require expertise. We recommend working with an **independent evaluator**—such as an academic partner, a research or research-and-policy organization (e.g., J-PAL, IPA), or a third-party M\&E firm (e.g., IDinsight, Laterite)—to strengthen technical quality and perceived independence. Being clear on your evaluation goals (as discussed above) will help you choose among evaluator options.

At a minimum, we suggest:

* [x] **Clarifying roles**: who builds the product, who runs the study, who communicates findings
* [x] **Pre-registering the design**: on platforms such as the AEA RCT Registry, EGAP, or RIDIE
* [x] **Sharing results transparently**: Disclose all findings, including null or negative results, and make methods and materials publicly available where feasible to support reproducibility and sector-wide learning.

***

<details>

<summary>💬 Want to suggest edits or provide feedback?</summary>

{% embed url="https://tally.so/r/A788l0?originPage=level-4-impact-evaluation%2Foverview%2Fwhat-is-the-intervention-being-evaluated" %}

</details>

[^1]: The choice of which unintended consequences to measure can be informed by process evaluations or Level 3 data, minimizing data collection costs.
