# Introduction

{% hint style="info" %}
This is a living playbook. We’ll keep updating it and collaborating more deeply with specialists to co-create shared evaluation tools, refine methodologies, and support their practical use in real-world settings.
{% endhint %}

The use of generative AI (GenAI) tools in low- and middle-income countries is multiplying – from AI-powered math tutors for children to digital advisory tools for farmers. While some studies have shown that AI-powered applications can improve human and economic development outcomes (e.g., [Henkel et al., 2024](https://arxiv.org/abs/2402.09809)), others warn of harmful effects (e.g., [Bastani et al., 2024](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486)). The common thread is that outcomes depend critically on how AI is designed and used.

**Evaluations can help developers build more impactful AI tools, yet there has been little agreement on what evaluation actually means.** Organizations often find themselves caught between two extremes: tech teams prioritize product performance but often overlook impact, while impact evaluators focus on outcomes but may neglect the underlying technology.

**These foci are often pursued in isolation, even though they are inherently complementary.** This playbook offers a unified framework to bridge the gap, laying out how to evaluate GenAI services in the development sector: what evaluation should include, and the standard practices implementers should follow. It is organized around a four-level framework:

<table data-card-size="large" data-view="cards"><thead><tr><th></th><th></th><th></th><th data-hidden data-card-target data-type="content-ref"></th><th data-hidden data-card-cover data-type="image">Cover image</th></tr></thead><tbody><tr><td><i class="fa-gear-code">:gear-code:</i> </td><td><h4><strong>Level 1 - Model evaluation</strong></h4></td><td>Does the AI system perform as intended?</td><td><a href="../level-1-model-evaluation">level-1-model-evaluation</a></td><td></td></tr><tr><td><i class="fa-box-isometric">:box-isometric:</i></td><td><h4><strong>Level 2 - Product evaluation</strong></h4></td><td>Does the overall product engage and retain users?</td><td><a href="../level-2-product-evaluation/overview">overview</a></td><td></td></tr><tr><td><i class="fa-user">:user:</i></td><td><h4><strong>Level 3 - User evaluation</strong></h4></td><td>Does the product change users’ thoughts, feelings, knowledge, and behavior towards the development outcome?</td><td><a href="../level-3-user-evaluation/overview">overview</a></td><td></td></tr><tr><td><i class="fa-chart-column">:chart-column:</i></td><td><h4><strong>Level 4 - Impact evaluation</strong></h4></td><td>Do users with access to the product improve development outcomes?</td><td><a href="../level-4-impact-evaluation/overview">overview</a></td><td></td></tr></tbody></table>

Even though the boundaries can blur in practice, the four levels form a logical progression. Users are unlikely to stay engaged (Level 2) if the GenAI system fails to perform (Level 1), and development outcomes are unlikely to improve (Level 4) if users disengage or their feelings, knowledge, and behaviors are harmed (Level 3).

The central element of this framework is continuous evaluation. Unlike that of earlier rule-based tools, GenAI performance is highly sensitive to the underlying models, training data, prompts, context, configuration, and other parameters. This complexity demands new evaluation methods. Moreover, AI’s core inputs evolve far faster than those of earlier technologies. Amidst this continuously evolving technology, continuous evaluation enables rapid iteration, maintains expected behavior, and improves performance and impact over time.
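To make the idea of continuous evaluation concrete, here is a minimal, purely illustrative sketch in Python: a fixed evaluation set is re-run after every change to the system (new model version, prompt tweak, configuration update), and the new score is compared against the previous baseline. The `generate` function, the exact-match metric, and the tolerance threshold are all hypothetical stand-ins, not part of this playbook's framework.

```python
# Illustrative continuous-evaluation loop (hypothetical names throughout).
# A fixed evaluation set is scored after every change and compared to a baseline.

EVAL_SET = [
    {"prompt": "What is 2 + 2?", "expected": "4"},
    {"prompt": "What is the capital of Kenya?", "expected": "Nairobi"},
]

def generate(prompt: str) -> str:
    """Stand-in for the GenAI system under test (stubbed for illustration)."""
    canned = {
        "What is 2 + 2?": "4",
        "What is the capital of Kenya?": "Nairobi",
    }
    return canned.get(prompt, "")

def score(response: str, expected: str) -> float:
    """Toy exact-match metric; real evaluations use richer, task-specific metrics."""
    return 1.0 if expected.lower() in response.lower() else 0.0

def run_eval() -> float:
    """Mean score over the fixed evaluation set."""
    scores = [score(generate(case["prompt"]), case["expected"]) for case in EVAL_SET]
    return sum(scores) / len(scores)

def no_regression(current: float, baseline: float, tolerance: float = 0.05) -> bool:
    """True if the current score has not dropped more than `tolerance` below baseline."""
    return current >= baseline - tolerance

baseline = run_eval()   # recorded before a change to model, prompt, or config
# ... make a change, then re-run the same evaluation set ...
current = run_eval()
if not no_regression(current, baseline):
    print("Performance regressed; investigate before shipping")
```

The key design choice is holding the evaluation set fixed across runs, so that score changes reflect changes to the system rather than changes to the test.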

<figure><img src="https://364364967-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F7QyW332zLXP50hE2nhPE%2Fuploads%2FX81qFLfIUIvvXaEZW0ZZ%2Fimage.png?alt=media&#x26;token=37a0fd09-9644-4432-9775-54538e8d2bc2" alt=""><figcaption></figcaption></figure>

***

<details>

<summary>💬 Want to suggest edits or provide feedback?</summary>

{% embed url="https://tally.so/r/A788l0?originPage=introduction" %}

</details>
