# Who is the “User” being evaluated?

At Level 3, we will examine whether the engaged user (i.e., a user receiving an “adequate” dosage) is thinking, feeling, or acting differently as a result of the product – hopefully in ways that predict improved life outcomes. This level of evaluation typically occurs in advance of an impact assessment. Before committing to a rigorous and time-intensive impact study, we want to observe users changing along some of the following dimensions, based on the theory of change of the product:

* **Cognitive outcomes:** Are users learning? Are they gaining new knowledge or updating beliefs? Do they demonstrate improved skills or decision-making ability as a result of engaging with the product?
  * *Constructs to measure:* comprehension, knowledge acquisition and retention, belief updating, critical evaluation of information, metacognitive awareness (e.g., accurate calibration of what one does and does not know), perceived clarity, complexity of reasoning during/following interaction.
* **Affective outcomes:** How does the product make users feel? Do users report feeling supported, motivated, and capable after interactions, or are there indications of confusion, anger, or emotional distress?
  * *Constructs to measure:* mood, emotional valence and arousal, frustration, confusion, emotional granularity, felt support, sense of safety, sense of belonging, perceived empathy, trust, comfort interacting with AI.
* **Behavioral outcomes:** Is the user doing something different? Are users taking small but meaningful actions that predict longer-term development?
  * *Constructs to measure:* application of new information, intent to try recommended behaviors, observable shifts in interaction patterns (e.g., asking more complex questions, prompt sophistication) that proxy for longer-term development outcomes, help-seeking behavior.
* **Motivational outcomes:** Does the product energize or deplete a user’s drive to pursue goals, learn, or act independently?
  * *Constructs to measure:* intrinsic motivation, curiosity, self-efficacy, perceived autonomy, goal commitment, persistence (not the same as perseveration), dependency (i.e., reduced willingness to attempt tasks without AI assistance).
* **Social and relational outcomes:** Does use of the product affect the user’s human-to-human relationships and broader social functioning?
  * *Constructs to measure:* social displacement (substitution of AI for human interaction), loneliness, perceived social support, quality of interpersonal communication, willingness to engage with others, trust in human versus AI sources of social support and information.
* **Well-being outcomes:** Does using the product have broader, more distal effects on users’ overall quality of life and psychological health, beyond momentary feelings or mood?
  * *Constructs to measure:* life satisfaction, meaning and purpose in life, flourishing, burnout (e.g., in professional contexts), perceived agency/control over one’s environment.

#### Level 3 vs. Traditional User Research

User research plays a critical role across the product lifecycle (e.g., [Discover, Explore, Test, and Listen](https://www.nngroup.com/articles/ux-research-cheat-sheet/)). However, in Level 3 we will focus on quantitative user research that captures intermediate outcomes at scale. These outcomes are sometimes observed in the product logs described in Level 2, but are more often captured through surveys or automated analysis of user text, voice recordings, and other digital traces. At Level 3, our goal is to cheaply track psychological “states” and “traits” (e.g., cognitive, affective, and behavioral) across a large sample of users, for use in product monitoring and rapid-cycle experiments. We want to understand shifts in user psychology or behavior without conducting bespoke qualitative research every time a new feature is tested. That said, at least one such qualitative evaluation should be conducted for the overall GenAI solution.
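
As a minimal sketch of what a rapid-cycle comparison on a survey-based outcome might look like, the snippet below compares a single self-efficacy item between two experiment arms. The responses, the 1 to 5 scale, and the function name `cohens_d` are illustrative assumptions, not part of the framework; a real analysis would also need a larger sample and an uncertainty estimate.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample variance."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical 1-5 Likert responses to one self-efficacy survey item.
control_scores = [3, 3, 4, 2, 3, 4, 3, 2]      # users without the new feature
treatment_scores = [4, 3, 5, 4, 3, 4, 5, 4]    # users with the new feature

raw_difference = mean(treatment_scores) - mean(control_scores)
effect_size = cohens_d(treatment_scores, control_scores)

print(f"Raw difference in means: {raw_difference:.2f}")
print(f"Cohen's d: {effect_size:.2f}")
```

Reporting a standardized effect size alongside the raw difference makes results easier to compare across features that use different survey items.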

Of course, quantitative methods (e.g., logs, surveys, sentiment analysis) should be complemented with qualitative methods (e.g., interviews, ethnography) to explain why users behave the way they do. Ideally, qualitative user research contributes to the theory of change and informs every stage of the evaluation framework:

* Level 1 (AI system): Interviews help define the "Golden Dataset" by gathering realistic user questions, identifying edge cases, and defining "ideal" answers based on needs expressed by real users.
* Level 2 (Product): Interviews and direct observation can contextualize engagement data. If an A/B test reveals a drop in retention, qualitative research helps diagnose the underlying friction or confusion.
* Level 3 (User): Interviews or focus groups can validate that intermediate outcome metrics (e.g., user-reported confidence) actually correlate with real-world behavior changes.
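
Once follow-up behaviors have been coded from interviews, a simple correlation can serve as a first check on whether an intermediate metric tracks real-world behavior. The sketch below assumes each interviewed user has a self-reported confidence score and a count of behaviors they reported trying; both data sets and the helper `pearson_r` are hypothetical, and a handful of interviews cannot establish the relationship on its own.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one entry per interviewed user.
reported_confidence = [1, 2, 3, 4, 5]   # in-app survey score (1-5)
behaviors_tried = [0, 1, 1, 2, 3]       # behaviors coded from interview notes

r = pearson_r(reported_confidence, behaviors_tried)
print(f"Correlation between confidence and behavior: {r:.2f}")
```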

#### Individual vs. System Outcomes

You may have noticed that in Level 3, we focus on individual outcomes, rather than the broader community or system. This is a practical choice, not a philosophical one. While changing social norms (e.g., how a village views vaccination) is often the ultimate goal of development, measuring those shifts requires slower, more extensive fieldwork (as discussed in Level 4).

In contrast, individual changes can be immediate. You can observe if a user is learning or motivated right now, even if the full impact of an AI product requires changes in social dynamics. This makes individual metrics the fastest, most sensitive signals for rapid product iteration.

Our recommendation: Do not ignore social dynamics, but do not let them slow down your experimentation. Measure what you can see today (i.e., the user’s behavior). Until you are ready for a full impact evaluation, use lightweight proxies to track social effects (e.g., asking "Did you share this advice with a neighbor?"). In-app questions about off-app behavior can complement what product logs reveal about in-app behavior.
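
A lightweight proxy like the sharing question above can be summarized as a proportion with an uncertainty range. The sketch below uses a Wilson score interval, which is one common choice for yes/no survey data; the response counts and the function name `wilson_interval` are illustrative assumptions.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical answers to "Did you share this advice with a neighbor?"
yes_count = 58
total_responses = 200

share_rate = yes_count / total_responses
low, high = wilson_interval(yes_count, total_responses)
print(f"Share rate: {share_rate:.1%} (approx. 95% interval: {low:.1%} to {high:.1%})")
```

Tracking the interval rather than the point estimate alone helps avoid over-reading week-to-week movement when few users answer the question.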
