Risk assessment and mitigation
Discovering risks or potential failure modes, and developing and testing control measures for them, requires integrated work across evaluation levels. Use outcomes at one level to guide control levers or solution updates that influence the others. Risk mitigation should support comprehensive, iterative detection and response, with the associated cost and intensity varying by level.
Example scenario: WhatsApp tutor chatbot
As an example, suppose we identify edtech failure modes after observing aberrant behavior at one level. The solution is a WhatsApp tutoring bot for secondary students that provides math and logic problems, linked to the school curriculum, for students to solve independently at home. How might risks show up at each level, what mitigations would we use, and which control metrics would measure mitigation effectiveness?
Cross-level risk mitigation
Level 1
Risk: The problem complexity does not increase with each turn of the WhatsApp dialogue
Mitigation: Link weekly assessed learning level to problem difficulty; increase the model context window; use multi-shot prompting
Control metric: Question complexity (LLM-as-a-judge using a rubric aligned to curriculum standards)
Level 2
Risk: High engagement, but concentrated on easy problems or off-topic conversations
Mitigation: Default to progressive difficulty; add rewards for completing challenging problems
Control metric: “Time spent learning” = session length ÷ # unique problem types solved
Level 3
Risk: Users become overly dependent on the AI, reducing self-directed problem solving and help-seeking agency
Mitigation: Introduce delayed hints and scaffolded responses; require users to attempt a solution before seeing AI guidance; add prompts that encourage reflection (“What would you try next?”)
Control metric: % of problems attempted before requesting help; average number of user-initiated solution steps per problem; self-efficacy score from survey
Level 4
Risk: Learning plateaus or declines
Mitigation: (none specified)
Control metric: # correct on standardized test; % of students exceeding threshold score
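Some of the control metrics above are simple ratios, and a minimal sketch of how they might be computed follows. The Session record, its field names, and the sample values are all hypothetical and used only for illustration; the survey-based and LLM-as-a-judge metrics are not covered here.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One tutoring session. All field names are hypothetical."""
    minutes: float                 # session length in minutes
    problem_types_solved: set      # unique problem types solved
    problems_total: int            # problems presented in the session
    problems_attempted_first: int  # problems attempted before asking for help


def time_spent_learning(session):
    """Level 2: session length divided by # unique problem types solved."""
    n_types = len(session.problem_types_solved)
    if n_types == 0:
        return None  # undefined when no problem type was solved
    return session.minutes / n_types


def pct_attempted_before_help(sessions):
    """Level 3: % of problems attempted before requesting help."""
    total = sum(s.problems_total for s in sessions)
    if total == 0:
        return None
    attempted = sum(s.problems_attempted_first for s in sessions)
    return 100.0 * attempted / total


def pct_exceeding_threshold(scores, threshold):
    """Level 4: % of students whose test score exceeds the threshold."""
    if not scores:
        return None
    return 100.0 * sum(score > threshold for score in scores) / len(scores)


# Made-up sample data, for illustration only.
sessions = [
    Session(30.0, {"fractions", "ratios", "logic grids"}, 10, 7),
    Session(20.0, {"fractions"}, 5, 2),
]
print(time_spent_learning(sessions[0]))               # 10.0
print(pct_attempted_before_help(sessions))            # 60.0
print(pct_exceeding_threshold([55, 72, 80, 64], 60))  # 75.0
```

Returning None for empty inputs keeps an undefined ratio from being read as zero when the metric is tracked over time.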
As in red teaming, you can define different risk classes to investigate (e.g., safety, privacy, security). User safety and mental health are critical concerns, and risks to them can be mitigated through activities at each level:
Level 1
Activity: Red-team GenAI models
Mitigation: Detect/classify harmful outputs; align models via pre-/post-processing
Level 1
Activity: Inspect model logs
Mitigation: Update knowledge base; apply pre-/post-processing (e.g., content filters)
Level 2
Activity: Observe product use
Mitigation: Adjust UI/UX to reduce friction or harm
Level 2
Activity: Analyze trace data
Mitigation: Add nudges/notifications; build affordances for different user segments
Level 3
Activity: Collect qualitative data (interviews, focus groups)
Mitigation: Surface risks, cultural fit, and harms; invite community input on mitigations
Level 3
Activity: Identify and analyze metrics embedded in conversation text
Mitigation: Trigger risk-reduction interventions and referrals
Level 4
Activity: Run impact evaluations
Mitigation: Qualitative research to explore unintended consequences
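As a rough illustration of flagging risk signals in conversation text and triggering a follow-up, the sketch below screens a user message against a few patterns and maps the result to an action. The pattern lists, risk class names, and action names are all hypothetical; a real deployment would likely need a vetted classifier and escalation rules reviewed by qualified professionals, not keyword matching.

```python
import re

# Hypothetical patterns, for illustration only.
RISK_PATTERNS = {
    "distress": re.compile(
        r"\b(hopeless|can't cope|give up on everything)\b", re.IGNORECASE
    ),
    "privacy": re.compile(
        r"\b(home address|phone number|password)\b", re.IGNORECASE
    ),
}


def flag_message(text):
    """Return the sorted risk classes whose pattern matches the text."""
    return sorted(
        name for name, pattern in RISK_PATTERNS.items() if pattern.search(text)
    )


def route(text):
    """Map a user message to a hypothetical follow-up action."""
    flags = flag_message(text)
    if "distress" in flags:
        return "refer_to_human"        # risk-reduction intervention or referral
    if flags:
        return "apply_content_filter"  # pre-/post-processing step
    return "continue_tutoring"


print(route("I feel hopeless about this"))    # refer_to_human
print(route("What is your phone number?"))    # apply_content_filter
print(route("Can you explain question 3?"))   # continue_tutoring
```

Counting how often each flag fires per week would give one candidate routine metric for the safety risk class.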
As you mitigate risk, weigh the financial and moral costs of failures across evaluation levels. A Level 1 error may be minor (extra developer time), while a Level 3 failure (e.g., loss of user trust) may require intensive in-person outreach at far higher cost. Use a routine workflow: start with risk discovery (aberrant metrics, one-off surveys, user interviews), then translate findings into new routine metrics. Three questions guide the investigation:
Why is the behavior occurring?
How could it have been discovered earlier?
What can be changed to align with the theory of change?
If product development reveals an insight that is incompatible with the theory of change, you may need to revise the theory of change so that it continues to guide the work.
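One way to sketch the risk-discovery step of the workflow is a simple check that flags an aberrant value of a routine metric against its recent history. The z-score rule, the threshold of 2.0, and the sample values are assumptions chosen for illustration, not a method prescribed by this guide.

```python
from statistics import mean, stdev


def is_aberrant(history, latest, z_threshold=2.0):
    """Flag `latest` when it lies more than `z_threshold` sample standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]  # any change from a flat baseline
    return abs(latest - mean(history)) / spread > z_threshold


# Made-up weekly values of "% of problems attempted before requesting help".
weekly_pct_attempted = [62.0, 60.0, 61.0, 63.0, 59.0]
print(is_aberrant(weekly_pct_attempted, 45.0))  # True
print(is_aberrant(weekly_pct_attempted, 60.5))  # False
```

A flagged week is a prompt to work through the three questions above, not a diagnosis in itself.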