# Define guardrail metrics and measure potential harm

At Level 3, you measure not only whether your product is working but also whether it is causing harm. While Level 2 metrics track usage, Level 3 is your opportunity to use direct interviews and surveys to track unintended consequences.

A central concern is whether AI models and agents empower or disempower users, which makes user agency a critical guardrail when evaluating the social impact of AI products. A "helpful" AI agent can undermine development by creating dependency among users or communities rather than building their capabilities. You will therefore want to track whether your tool is improving or reducing users’ agency.

We recommend measuring agency in two ways:

<table data-card-size="large" data-view="cards"><thead><tr><th></th><th></th><th data-hidden data-card-cover data-type="image">Cover image</th></tr></thead><tbody><tr><td><strong>Subjective Agency (Internal-Facing)</strong></td><td>This captures users’ beliefs and perceptions of their own capabilities (e.g. <a href="https://albertbandura.com/albert-bandura-agency.html">Albert Bandura’s Social Cognitive Theory</a>), measured through qualitative or survey methods. Ask users about their sense of self-efficacy: do they believe they can solve the problem on their own now?</td><td><a href="https://images.unsplash.com/photo-1515463626042-123ab67dcaa7?crop=entropy&#x26;cs=srgb&#x26;fm=jpg&#x26;ixid=M3wxOTcwMjR8MHwxfHNlYXJjaHw0fHxSZWZsZWN0aW9ufGVufDB8fHx8MTc3MzIzNzE5OXww&#x26;ixlib=rb-4.1.0&#x26;q=85">https://images.unsplash.com/photo-1515463626042-123ab67dcaa7?crop=entropy&#x26;cs=srgb&#x26;fm=jpg&#x26;ixid=M3wxOTcwMjR8MHwxfHNlYXJjaHw0fHxSZWZsZWN0aW9ufGVufDB8fHx8MTc3MzIzNzE5OXww&#x26;ixlib=rb-4.1.0&#x26;q=85</a></td></tr><tr><td><strong>Objective Agency (External-Facing)</strong></td><td>These are the capabilities required to plan, navigate, execute, and reflect on personal goals (e.g. <a href="https://www.cambridge.org/core/books/abs/amartya-sen/capability-and-agency/65BD3415B565147A740E03F42E41D047">Amartya Sen’s Capability Approach</a>). Does the user have the skills to act? Test their ability to plan and execute goals without the AI's help. Are they learning the underlying logic, or just copy-pasting answers?</td><td><a href="https://images.unsplash.com/photo-1603804449564-2ad32f24d17e?crop=entropy&#x26;cs=srgb&#x26;fm=jpg&#x26;ixid=M3wxOTcwMjR8MHwxfHNlYXJjaHwyfHxwbGFufGVufDB8fHx8MTc3MzE1MTc1NXww&#x26;ixlib=rb-4.1.0&#x26;q=85">https://images.unsplash.com/photo-1603804449564-2ad32f24d17e?crop=entropy&#x26;cs=srgb&#x26;fm=jpg&#x26;ixid=M3wxOTcwMjR8MHwxfHNlYXJjaHwyfHxwbGFufGVufDB8fHx8MTc3MzE1MTc1NXww&#x26;ixlib=rb-4.1.0&#x26;q=85</a></td></tr></tbody></table>
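
As a rough illustration, the two measures can be tracked together as a single guardrail. The sketch below assumes a hypothetical setup that this guide does not prescribe: each user answers a few self-efficacy survey items on a 1 to 5 Likert scale (subjective agency) and attempts a few tasks without the AI's help (objective agency), once at baseline and once at follow-up. The names `AgencySnapshot` and `agency_guardrail`, and both thresholds, are illustrative assumptions that you would replace with values suited to your own context.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AgencySnapshot:
    """One user's measurements at a single point in time (hypothetical schema)."""
    self_efficacy: list[int]   # Likert answers: 1 (strongly disagree) to 5 (strongly agree)
    unaided_tasks: list[bool]  # True if a task was completed without the AI's help


def subjective_score(snapshot: AgencySnapshot) -> float:
    """Average self-efficacy rating across survey items."""
    return mean(snapshot.self_efficacy)


def objective_score(snapshot: AgencySnapshot) -> float:
    """Share of tasks the user completed without AI assistance."""
    return sum(snapshot.unaided_tasks) / len(snapshot.unaided_tasks)


def agency_guardrail(
    baseline: AgencySnapshot,
    follow_up: AgencySnapshot,
    max_subjective_drop: float = 0.5,  # assumed threshold, in Likert points
    max_objective_drop: float = 0.1,   # assumed threshold, in success-rate points
) -> dict:
    """Compare follow-up against baseline and flag a drop in either measure."""
    subjective_change = subjective_score(follow_up) - subjective_score(baseline)
    objective_change = objective_score(follow_up) - objective_score(baseline)
    breached = (
        subjective_change < -max_subjective_drop
        or objective_change < -max_objective_drop
    )
    return {
        "subjective_change": subjective_change,
        "objective_change": objective_change,
        "breached": breached,
    }


# Illustrative data only, not real survey results.
baseline = AgencySnapshot(
    self_efficacy=[4, 4, 3, 5],
    unaided_tasks=[True, True, False, True],
)
follow_up = AgencySnapshot(
    self_efficacy=[3, 3, 3, 4],
    unaided_tasks=[True, False, False, True],
)
result = agency_guardrail(baseline, follow_up)
print(result)
```

A breach here is a prompt to investigate through interviews rather than a verdict: a drop in either score suggests that users may be growing dependent on the tool instead of building capability.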

