Risk assessment and mitigation

Discovering risks or potential failure modes, and developing and testing control measures for them, requires integrated work across evaluation levels. Use findings at one level to guide the control levers or solution updates that affect other levels. Risk mitigation should support comprehensive, iterative detection and response, with cost and intensity varying by level.

Example scenario: WhatsApp tutor chatbot

As an example, suppose we identify edtech failure modes after observing aberrant behavior at one level. The solution is a WhatsApp tutoring bot for secondary students that sends math and logic problems, linked to their school curriculum, for them to solve independently at home. How might risks show up at each level, what mitigations would we use, and which control metrics would measure mitigation effectiveness?

Cross-level risk mitigation

| Level | Risk discovered | Control strategies | Control metric |
|---|---|---|---|
| Level 1 | Problem complexity does not increase with each turn of the WhatsApp dialogue | Link weekly assessed learning level to problem difficulty; increase the model context window; use multi-shot prompting | Question complexity (LLM-as-a-judge using a rubric aligned to curriculum standards) |
| Level 2 | High engagement, but concentrated on easy problems or off-topic conversations | Default to progressive difficulty; add rewards for completing challenging problems | “Time spent learning” = session length ÷ # unique problem types solved |
| Level 3 | Users become overly dependent on the AI, reducing self-directed problem solving and help-seeking agency | Introduce delayed hints and scaffolded responses; require users to attempt a solution before seeing AI guidance; prompts that encourage reflection (“What would you try next?”) | % of problems attempted before requesting help; average number of user-initiated solution steps per problem; self-efficacy score from survey |
| Level 4 | Learning plateaus or declines | | # correct on standardized test; % of students exceeding threshold score |
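The control metrics in the table above can be computed directly from session logs. The following is a minimal sketch that assumes a hypothetical log schema (`ProblemRecord`, `Session`), in which each served problem records its type, whether it was solved, whether the student attempted it before asking for help, how many solution steps the student wrote, and a numeric complexity score from an LLM judge. None of these names come from a real system, and the Level 1 check uses a simple "never drops, ends higher" rule as a stand-in for a fuller rubric-based analysis.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical log schema: one record per problem the bot served in a session.
@dataclass
class ProblemRecord:
    problem_type: str
    solved: bool
    attempted_before_help: bool  # student submitted an attempt before asking for a hint
    user_solution_steps: int     # solution steps the student wrote unprompted
    judge_complexity: float      # LLM-as-a-judge rubric score (assumed numeric scale)


@dataclass
class Session:
    minutes: float
    problems: list[ProblemRecord] = field(default_factory=list)


def complexity_progresses(session: Session) -> bool:
    """Level 1: judged complexity never drops between problems and ends higher than it started."""
    scores = [p.judge_complexity for p in session.problems]
    if len(scores) < 2:
        return False
    no_drops = all(later >= earlier for earlier, later in zip(scores, scores[1:]))
    return no_drops and scores[-1] > scores[0]


def time_spent_learning(session: Session) -> float | None:
    """Level 2: session length divided by the number of unique problem types solved."""
    solved_types = {p.problem_type for p in session.problems if p.solved}
    if not solved_types:
        return None  # undefined when nothing was solved
    return session.minutes / len(solved_types)


def pct_attempted_before_help(problems: list[ProblemRecord]) -> float:
    """Level 3: % of problems attempted before requesting help (assumes a non-empty list)."""
    return 100 * sum(p.attempted_before_help for p in problems) / len(problems)


def avg_user_steps(problems: list[ProblemRecord]) -> float:
    """Level 3: average user-initiated solution steps per problem (assumes a non-empty list)."""
    return sum(p.user_solution_steps for p in problems) / len(problems)


def pct_exceeding_threshold(test_scores: list[int], threshold: int) -> float:
    """Level 4: % of students scoring strictly above the threshold (assumes a non-empty list)."""
    return 100 * sum(score > threshold for score in test_scores) / len(test_scores)


# Illustrative session: 30 minutes, four problems with made-up values.
session = Session(minutes=30, problems=[
    ProblemRecord("fractions", True, True, 3, 2.0),
    ProblemRecord("fractions", True, False, 1, 2.5),
    ProblemRecord("ratios", True, True, 4, 3.0),
    ProblemRecord("logic", False, False, 0, 3.5),
])
print(complexity_progresses(session))                  # True
print(time_spent_learning(session))                    # 15.0
print(pct_attempted_before_help(session.problems))     # 50.0
print(avg_user_steps(session.problems))                # 2.0
print(pct_exceeding_threshold([45, 62, 71, 50], 60))   # 50.0
```

Keeping each metric as a small, separately testable function makes it easier to promote a one-off analysis into a routine dashboard metric later.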

As in red teaming, you can define different risk classes to investigate (e.g., safety, privacy, security). Risks to user safety and mental health are critical concerns, and they can be mitigated through activities at each level:

| Level | Approach | Mitigation |
|---|---|---|
| Level 1 | Red-team GenAI models | Detect/classify harmful outputs; align models via pre-/post-processing |
| Level 1 | Inspect model logs | Update knowledge base; apply pre-/post-processing (e.g., content filters) |
| Level 2 | Observe product use | Adjust UI/UX to reduce friction or harm |
| Level 2 | Analyze trace data | Add nudges/notifications; build affordances for different user segments |
| Level 3 | Collect qualitative data (interviews, focus groups) | Surface risks, cultural fit, and harms; invite community input on mitigations |
| Level 3 | Identify and analyze metrics embedded in conversation text | Trigger risk-reduction interventions and referrals |
| Level 4 | Run impact evaluations | Conduct qualitative research to explore unintended consequences |
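The Level 3 practice of analyzing metrics embedded in conversation text can be illustrated with a deliberately crude sketch: scan each student message for risk signals, then map any flags to an intervention. The categories, patterns, and intervention names (`RISK_PATTERNS`, `refer_to_counselor`, `notify_teacher`) are hypothetical placeholders. A keyword list will miss most real expressions of distress, so read this only as a stand-in for a validated classifier combined with human review.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical placeholder patterns, not a validated lexicon. The character class
# [’'] accepts both curly and straight apostrophes, which phone keyboards mix.
RISK_PATTERNS = {
    "distress": re.compile(r"\b(hopeless|can[’']t cope|give up on everything)\b", re.IGNORECASE),
    "bullying": re.compile(r"\b(bullied|everyone hates me)\b", re.IGNORECASE),
}


@dataclass
class Flag:
    message_index: int
    category: str


def screen_conversation(messages: list[str]) -> list[Flag]:
    """Return one flag per (message, category) pair whose pattern matches."""
    flags = []
    for i, text in enumerate(messages):
        for category, pattern in RISK_PATTERNS.items():
            if pattern.search(text):
                flags.append(Flag(i, category))
    return flags


def choose_intervention(flags: list[Flag]) -> str:
    """Hypothetical routing: distress goes to a human referral, other flags to a teacher."""
    if any(f.category == "distress" for f in flags):
        return "refer_to_counselor"
    if flags:
        return "notify_teacher"
    return "none"


messages = [
    "What is 3/4 + 1/8?",
    "I feel hopeless, I want to give up on everything",
    "I got bullied at school today",
]
flags = screen_conversation(messages)
print([(f.message_index, f.category) for f in flags])  # [(1, 'distress'), (2, 'bullying')]
print(choose_intervention(flags))                       # refer_to_counselor
```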

As you mitigate risk, weigh the financial and moral costs of failures across evaluation levels. A Level 1 error may be minor (extra developer time), while a Level 3 failure (e.g., loss of user trust) may require intensive in-person outreach and far higher cost. Use a routine workflow: start with risk discovery (aberrant metrics, one-off surveys, user interviews), then translate findings into new routine metrics. Three questions guide the investigation:

Why is the behavior occurring?

How could it have been discovered earlier?

What can be changed to align with the theory of change?
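Translating a one-off finding into a routine metric can be as simple as registering the metric with acceptable bounds and checking it on a schedule. The sketch below assumes a hypothetical weekly data snapshot (a plain `dict`) and illustrative bounds. It also orders alerts from the highest evaluation level down, on the assumption, drawn from the cost discussion above, that higher-level failures tend to be costlier to address.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RoutineMetric:
    name: str
    level: int                        # evaluation level (1-4) the metric belongs to
    compute: Callable[[dict], float]  # derives the value from a weekly data snapshot
    lower_bound: Optional[float] = None
    upper_bound: Optional[float] = None


def weekly_alerts(metrics: list[RoutineMetric], snapshot: dict) -> list[tuple[int, str, float]]:
    """Return (level, name, value) for each out-of-bounds metric, highest level first."""
    alerts = []
    for m in metrics:
        value = m.compute(snapshot)
        too_low = m.lower_bound is not None and value < m.lower_bound
        too_high = m.upper_bound is not None and value > m.upper_bound
        if too_low or too_high:
            alerts.append((m.level, m.name, value))
    # Assumption: higher-level failures cost more, so surface them first for review.
    return sorted(alerts, key=lambda alert: alert[0], reverse=True)


# Hypothetical metrics and bounds; real thresholds would come from your own baselines.
metrics = [
    RoutineMetric("complexity_slope", 1, lambda s: s["complexity_slope"], lower_bound=0.0),
    RoutineMetric("pct_attempted_before_help", 3, lambda s: s["pct_attempted_before_help"], lower_bound=50.0),
    RoutineMetric("pct_exceeding_threshold", 4, lambda s: s["pct_exceeding_threshold"], lower_bound=60.0),
]
snapshot = {"complexity_slope": 0.2, "pct_attempted_before_help": 35.0, "pct_exceeding_threshold": 55.0}
print(weekly_alerts(metrics, snapshot))
# [(4, 'pct_exceeding_threshold', 55.0), (3, 'pct_attempted_before_help', 35.0)]
```

Each alert is then a starting point for the three questions above rather than a conclusion in itself.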

If product development surfaces an insight that is incompatible with the theory of change, you may need to revise the theory of change so that it keeps its guiding function.