# About this playbook

From math tutors to farmer advisory tools, generative AI (GenAI) is rapidly expanding across low- and middle-income countries. This playbook provides a 4-level framework and recommended practices for evaluating these GenAI tools.


<figure><img src="/files/NxwZtqXAhA3XAfAWsqoO" alt=""><figcaption></figcaption></figure>

### Why we need this playbook

Evaluating GenAI products can mean different things depending on who you ask. Technical teams tend to prioritize model performance and often overlook impact, while impact evaluators focus on outcomes but may neglect the underlying technology. Even within a single discipline, evaluations vary widely in sophistication and quality.

This playbook establishes a unified set of expectations and practices for evaluating GenAI products in global development.

<table data-view="cards"><thead><tr><th></th><th></th><th></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td><h4><i class="fa-handshake-angle">:handshake-angle:</i></h4></td><td><strong>Create Shared Practices</strong></td><td>Use consistent, credible, and comparable practices to assess what works and drive learning across the industry.</td><td></td></tr><tr><td><h4><i class="fa-lightbulb-gear">:lightbulb-gear:</i></h4></td><td><strong>Improve Products and Programs</strong></td><td>Identify issues early through continuous evaluation and build better products over time.</td><td></td></tr><tr><td><h4><i class="fa-clipboard-check">:clipboard-check:</i></h4></td><td><strong>Demonstrate Accountability</strong></td><td>Show stakeholders measurable progress from model performance to impact.</td><td></td></tr></tbody></table>

### Who this playbook is for

<table data-card-size="large" data-view="cards"><thead><tr><th></th><th></th><th></th><th data-hidden data-card-cover data-type="image">Cover image</th></tr></thead><tbody><tr><td><h4><i class="fa-user-gear">:user-gear:</i></h4></td><td><strong>Implementers and Program Managers</strong></td><td>Improve your products and programs with credible evaluation practices.</td><td><a href="/files/K3RO2hRVfb7kdPAuvmJ9">/files/K3RO2hRVfb7kdPAuvmJ9</a></td></tr><tr><td><h4><i class="fa-dollar-sign">:dollar-sign:</i></h4></td><td><strong>Funders and Policy Makers</strong></td><td>Make informed investments by assessing an organization’s ability to evaluate and improve its product.</td><td><a href="/files/x4g04elKvnWNAMp5kT9g">/files/x4g04elKvnWNAMp5kT9g</a></td></tr></tbody></table>

### How to use this playbook

The playbook is organized around a 4-level framework that asks the following evaluation questions:

<table data-card-size="large" data-view="cards"><thead><tr><th></th><th></th><th></th><th data-hidden data-type="content-ref"></th><th data-hidden></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td><h4><i class="fa-head-side-circuit">:head-side-circuit:</i></h4></td><td><strong>Model Evaluation</strong></td><td>Does the AI system perform as intended?<br><br><a href="/spaces/VDHDXE8axdWQfu0OFCHP/pages/DeMcUC7YhehF7wXhEazC">Level 1 →</a></td><td><a href="/spaces/VDHDXE8axdWQfu0OFCHP/pages/DeMcUC7YhehF7wXhEazC">/spaces/VDHDXE8axdWQfu0OFCHP/pages/DeMcUC7YhehF7wXhEazC</a></td><td><a href="#how-the-framework-works">Models &#x26; Behaviour</a></td><td></td></tr><tr><td><h4><i class="fa-laptop-code">:laptop-code:</i></h4></td><td><strong>Product Evaluation</strong></td><td>Does the overall product engage and retain users?<br><br><a href="/spaces/zpcawBg21nKa217FyRsG/pages/BRhAcSDI4fzmQttWpxZl">Level 2 →</a></td><td><a href="/spaces/zpcawBg21nKa217FyRsG/pages/BRhAcSDI4fzmQttWpxZl">/spaces/zpcawBg21nKa217FyRsG/pages/BRhAcSDI4fzmQttWpxZl</a></td><td><a href="#how-the-framework-works">Implementors &#x26; Program Managers</a></td><td></td></tr><tr><td><h4><i class="fa-user-gear">:user-gear:</i></h4></td><td><strong>User Evaluation</strong></td><td>Does the product change users' thoughts, feelings, knowledge, and behaviour towards the development outcome?<br><br><a href="/spaces/R1fawv6icuZEAPmz1pnB/pages/wcgHi9eru7seyBhXPjew">Level 3 →</a></td><td><a href="/spaces/R1fawv6icuZEAPmz1pnB/pages/wcgHi9eru7seyBhXPjew">/spaces/R1fawv6icuZEAPmz1pnB/pages/wcgHi9eru7seyBhXPjew</a></td><td><a href="#how-the-framework-works">User Experience</a></td><td></td></tr><tr><td><h4><i class="fa-hand-holding-seedling">:hand-holding-seedling:</i></h4></td><td><strong>Impact Evaluation</strong></td><td>Do users with access to the product improve development outcomes?<br><br><a href="/spaces/DNdX3hzAtddLuS4lBI4e/pages/YnZKseJWPCqdwrTYVLTE">Level 4 →</a></td><td><a href="/spaces/DNdX3hzAtddLuS4lBI4e/pages/YnZKseJWPCqdwrTYVLTE">/spaces/DNdX3hzAtddLuS4lBI4e/pages/YnZKseJWPCqdwrTYVLTE</a></td><td><a href="#how-the-framework-works">Social Impact</a></td><td></td></tr></tbody></table>

Each level sets out detailed evaluation practices for organizations building GenAI products.

The four levels form a logical progression. Users are unlikely to stay engaged (Level 2) if the GenAI system fails to perform (Level 1), and development outcomes are unlikely to improve (Level 4) if users disengage or their feelings, knowledge, and behaviors are harmed (Level 3).

The playbook helps implementers evaluate continuously across all four levels. Results at one level often call for revisiting a preceding level. As the technology evolves, this feedback loop lets teams iterate quickly, keep the system behaving as expected, and improve performance and impact over time.
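One way to read the dependency between levels is as an ordered sequence in which a shortfall at an earlier level is investigated before a later one. The Python sketch below illustrates this under a simplifying assumption that is not part of the playbook: each evaluated level is reduced to a single pass/fail result. `Level` and `first_level_to_revisit` are hypothetical names, and real evaluations at each level produce much richer evidence than a boolean.

```python
from enum import IntEnum
from typing import Dict, Optional


class Level(IntEnum):
    """The playbook's four evaluation levels, numbered in order of dependency."""
    MODEL = 1    # Does the AI system perform as intended?
    PRODUCT = 2  # Does the overall product engage and retain users?
    USER = 3     # Does it change users' thoughts, feelings, knowledge, and behaviour?
    IMPACT = 4   # Do users with access to the product improve development outcomes?


def first_level_to_revisit(results: Dict[Level, bool]) -> Optional[Level]:
    """Return the lowest-numbered level whose evaluation did not pass, or None.

    Hypothetical simplification: each evaluated level is reduced to pass/fail.
    Because later levels depend on earlier ones, the earliest failure is
    treated as the first place to look.
    """
    for level in sorted(results):  # IntEnum members sort by level number
        if not results[level]:
            return level
    return None
```

For example, if Level 1 passes but Levels 2 and 3 both fall short, the function points back to Level 2, on the reasoning that weak engagement may explain why users' knowledge or behaviour did not change.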

### Setting the Foundation

<table data-card-size="large" data-view="cards"><thead><tr><th></th><th></th><th></th><th data-hidden data-type="content-ref"></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td><h4><i class="fa-users">:users:</i></h4></td><td><strong>Build your team</strong></td><td>To build a GenAI product for social impact, you need a team that combines development-sector expertise with skill sets that are newer to the field. This section describes those skill sets.<br><br><a href="/pages/V91mgmS1QmGOVyIVTszQ">Learn more →</a></td><td><a href="/pages/V91mgmS1QmGOVyIVTszQ">/pages/V91mgmS1QmGOVyIVTszQ</a></td><td></td></tr><tr><td><h4><i class="fa-shield-keyhole">:shield-keyhole:</i></h4></td><td><strong>Build the infrastructure</strong></td><td>Before starting on the four levels, teams should put in place several conceptual and technical building blocks that make evaluation easier. This section describes what to build first.<br><br><a href="/pages/tSN6S6uJ2o6t9Y6o4LYF">Learn more →</a></td><td><a href="/pages/tSN6S6uJ2o6t9Y6o4LYF">/pages/tSN6S6uJ2o6t9Y6o4LYF</a></td><td></td></tr></tbody></table>

#### Additional Resources

[FAQs](/additional-resources/frequently-asked-questions.md) | [Glossary](/additional-resources/glossary.md) | [Minimum Viable Evaluations](/additional-resources/minimum-viable-evaluations.md) | [Tools & Templates](/additional-resources/additional-resources.md)

#### Stay involved

* [See the process behind the playbook](/overview/the-process-behind-this-playbook.md)
* [Contribute to this playbook](/overview/how-to-contribute-to-the-playbook.md)

***

This is a living playbook. It will be updated regularly as we work more closely with specialists to co-create shared evaluation tools, refine methodologies, and support their practical use in real-world settings.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available on this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://eval.playbook.org.ai/overview/readme-1.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
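A minimal sketch of that request in Python, using only the standard library. Because a natural-language question contains spaces and punctuation, it has to be percent-encoded before it goes into the query string. The helper names `build_ask_url` and `ask` are hypothetical, and the sketch assumes the endpoint returns a text body; the page does not specify the response format beyond saying it contains an answer with excerpts and sources.

```python
import urllib.parse
import urllib.request

# Page URL taken from the GET example above; the `ask` parameter carries the question.
PAGE_URL = "https://eval.playbook.org.ai/overview/readme-1.md"


def build_ask_url(question: str, page_url: str = PAGE_URL) -> str:
    """Return the GET URL for `question`, percent-encoded for the query string."""
    if not question.strip():
        raise ValueError("question must be a non-empty string")
    # quote_via=quote encodes spaces as %20 rather than '+';
    # urlencode's default safe='' also encodes characters such as '/' and '?'.
    query = urllib.parse.urlencode({"ask": question}, quote_via=urllib.parse.quote)
    return f"{page_url}?{query}"


def ask(question: str, timeout: float = 30.0) -> str:
    """Send the GET request and return the response body as text (format assumed)."""
    url = build_ask_url(question)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

For example, `print(ask("Which level covers user engagement and retention?"))` sends a single self-contained question, in line with the guidance above.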
