
About this playbook
From math tutors to farmer advisory tools, generative AI (GenAI) is rapidly expanding across low- and middle-income countries. This playbook provides a 4-level framework and recommends practices for evaluating these GenAI tools.
Why we need this playbook
Evaluating GenAI products can mean different things depending on who you ask. Tech teams prioritize performance, often overlooking impact, while impact evaluators focus on outcomes but may neglect the underlying technology. Even within disciplines, the sophistication and quality of evaluations can differ.
This playbook establishes a unified set of expectations and practices for evaluating GenAI products in global development.
Who is this playbook for
How to use this playbook
The playbook is organized around a 4-level framework that asks the following evaluation questions:
User Evaluation
Does the product change users' thoughts, feelings, knowledge and behaviour towards the development outcome? Level 3 →
Each level outlines detailed evaluation practices for organizations building AI products to pursue.
The four levels form a logical progression. Users are unlikely to stay engaged (Level 2) if the GenAI system fails to perform (Level 1), and development outcomes are unlikely to improve (Level 4) if users disengage or their feelings, knowledge, and behaviors are harmed (Level 3).
The playbook helps implementers conduct continuous evaluation across levels. Often, the results of one level of evaluation may require revisiting the performance of a preceding level. Amidst evolving technology, this enables rapid iteration, maintains expected behavior, and improves performance and impact over time.
Setting the Foundation
Build your team
To build a GenAI product for social impact, you need the right team that brings together development sector expertise with skillsets that are newer to the field. This section describes the relevant skillsets. Learn more →
Build the infrastructure
Before diving into the four levels, teams should establish several key conceptual and technical building blocks that ease evaluations. This section describes what should be developed before diving in. Learn more →
Additional Resources
FAQs | Glossary | Minimal Viable Evaluations | Tools & Templates
Stay involved
This is a living playbook. It will be updated regularly, with deeper collaboration with specialists to co-create shared evaluation tools, refine methodologies, and support their practical use in real-world settings.
Last updated
Was this helpful?

