# What is the “Product” being evaluated?

To understand whether users are actually finding value in your product, you must "instrument" your application: set it up to automatically log specific user actions. The resulting log data let you track users as they move through the User Funnel: from their first interaction (Activation), to regular usage (Engagement), to long-term commitment or habit formation (Retention).
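Concretely, instrumentation means emitting a structured record for each meaningful action. Here is a minimal sketch in Python, assuming a hypothetical append-only JSON-lines log; the `log_event` helper, event names, and file path are illustrative, not taken from any particular analytics SDK:

```python
import json
import time
from pathlib import Path

LOG_FILE = Path("events.jsonl")  # hypothetical append-only event log

def log_event(user_id: str, event: str, **properties) -> None:
    """Append one user action as a JSON line: who, what, when, plus context."""
    record = {
        "user_id": user_id,
        "event": event,          # e.g. "photo_uploaded", "quiz_completed"
        "ts": time.time(),       # Unix timestamp; real tools use ISO 8601
        "properties": properties,
    }
    with LOG_FILE.open("a") as f:
        f.write(json.dumps(record) + "\n")

# Example: actions a farmer-facing app might instrument
log_event("farmer_042", "photo_uploaded", crop="maize")
log_event("farmer_042", "advice_audio_completed", duration_s=95)
```

Off-the-shelf analytics tools wrap exactly this pattern behind an SDK call, but the underlying data is the same: one timestamped row per user action.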

In the tech sector, companies might track "clicks" and "purchases" as users move through a website. By analyzing logs, you can then identify which content or products are likely to bring users back to the website over time, or how different web experiences affect browsing time. In the development sector, we need to track actions that signal user intent, and estimate the value returned to users in response. For example:

* For an AI Agronomist, instead of tracking page views you might track whether a farmer uploads a photo of a diseased crop, listens to audio advice to completion, or shares content with another person.
* For an AI Tutor, you might track if a student completes a quiz, how many follow-up questions they ask in a single session, or if they return to the app the night before an exam.

By analyzing these "digital traces," we can identify exactly where users lose interest. Does the farmer drop off because the photo upload takes too long? Does the student quit because the AI’s first response was too complex?
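Finding that drop-off point amounts to a funnel query over the event log: at each step, count only the users who completed all previous steps. A minimal sketch, using an in-memory event list with illustrative event and user names:

```python
# Sketch: compute step-by-step funnel conversion from raw events.
# Event and user names are illustrative, not from any specific tool.
events = [
    {"user": "f1", "event": "signed_up"},
    {"user": "f1", "event": "photo_uploaded"},
    {"user": "f1", "event": "advice_completed"},
    {"user": "f2", "event": "signed_up"},
    {"user": "f2", "event": "photo_uploaded"},
    {"user": "f3", "event": "signed_up"},
]

funnel = ["signed_up", "photo_uploaded", "advice_completed"]

def funnel_counts(events, steps):
    """A user is counted at a step only if they completed all prior steps."""
    remaining = {e["user"] for e in events}
    counts = []
    for step in steps:
        did_step = {e["user"] for e in events if e["event"] == step}
        remaining &= did_step
        counts.append(len(remaining))
    return counts

for step, n in zip(funnel, funnel_counts(events, funnel)):
    print(f"{step}: {n}")
# → signed_up: 3, photo_uploaded: 2, advice_completed: 1
```

The step with the sharpest count drop is where to focus qualitative inquiry: in this toy data, one of three farmers never uploads a photo, and only one completes the advice.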

**The good news:** Level 2 evaluation methods are well-scoped; you don't need to reinvent the wheel. The technology sector has spent decades standardizing digital product metrics, and most off-the-shelf analytics tools for web and mobile applications (such as Amplitude or Google Analytics) come ready-made to measure the standard metrics you need, such as Daily Active Users (DAU), Time to Activate, and Retention Rate.

The following table defines common Level 2 product metrics for each stage:

<table data-full-width="true"><thead><tr><th width="140.33984375">Stage</th><th>Metric</th><th>Examples</th><th>Notes</th></tr></thead><tbody><tr><td><strong>Acquisition</strong></td><td><strong>New Users (#):</strong><br>Total count of users entering the top of the funnel</td><td># farmers consenting to receive WhatsApp messages; # students downloading an app; # health workers attending a training session.</td><td>There are costs associated with recruitment, so you may wish to target your ideal users effectively and efficiently. Track the source of each new user (e.g., "Referral" vs. "Field Visit") to identify which channels yield the most relevant users, then scale the channel with the highest yield.</td></tr><tr><td><strong>Acquisition</strong></td><td><strong>Cost Per User (CAC)</strong>:<br>Cost of running a recruitment activity, divided by new users acquired. Also called User Recruitment Cost.</td><td>Cost of printing flyers / # of QR code scans.<br>Cost of field agent stipend / # of farmer sign-ups.</td><td>High CAC may be unsustainable for low-margin services (e.g., general advice) but acceptable for high-impact interventions like urgent telemedicine consults.</td></tr><tr><td><strong>Activation</strong></td><td><strong>Activation rate:</strong><br>% of users who complete the "First Value" action after signing up or being recruited.</td><td>% of mothers who complete their health profile<br>% of teachers who create their first lesson plan</td><td>Activation measures whether users actually start using the tool or service, not just whether they installed it. It shapes a user's early experience with the product, so it is important to get right to prevent drop-off.<br>Onboarding can also be a critical point for collecting user demographic data used for personalization and subgroup analysis.</td></tr><tr><td><strong>Activation</strong></td><td><strong>Time to activate</strong>:<br>Average time elapsed between sign-up and the first core action.</td><td>Minutes from first WhatsApp message to asking the first medical question<br>Days from initial training to logging the first case data</td><td>New user interest tends to taper off exponentially after sign-up, so it’s important to encourage users to complete onboarding within the first few hours/days.<br>Long activation times usually signal a confusing interface or a lack of trust or value.</td></tr><tr><td><strong>Engagement</strong></td><td><strong>Monthly, weekly, and/or daily active users</strong> (MAU, WAU, DAU):<br>Number of users using the app/feature in a time window.</td><td># Community Health Workers using a reporting tool daily<br># Students using a study bot weekly before exams</td><td>Do not optimize for addiction. In welfare-focused apps, "Time to Success" (i.e., getting to an urgent answer quickly) is better than "Time on Device."</td></tr><tr><td><strong>Engagement</strong></td><td><strong>Interaction depth</strong>:<br>Volume of interaction per session (e.g., turns per conversation).</td><td>Average # of follow-up questions a student asks per session (signals curiosity)<br>Rate at which a user accepts a suggestion (signals trust)</td><td>For a chatbot, the # of conversation turns can be good (deep inquiry) or bad (confusion). Pair engagement metrics with qualitative inquiry (Level 3).<br>Remember that frequent interaction (e.g., page loads, button clicks, form submits, session length) may not translate to meaningful interaction.</td></tr><tr><td><strong>Retention</strong></td><td><strong>Stickiness (DAU/MAU):</strong><br>Ratio of daily users to monthly users.</td><td>A ratio of 0.25 means the average user uses the tool 7-8 days per month.<br>A ratio close to 1 indicates sustained value or habit formation.</td><td>High stickiness indicates the tool is part of a daily workflow. A low ratio suggests the tool is only useful for sporadic problems (e.g., seasonal crop disease) rather than daily habits. Is habit formation critical to your welfare goals?</td></tr></tbody></table>
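Several of these metrics reduce to simple aggregations over a timestamped event log. A minimal sketch, computing activation rate, time to activate, and DAU/MAU stickiness; all users, event names, and timestamps are illustrative:

```python
from datetime import datetime

# Illustrative timestamped events: (user, event, time)
events = [
    ("u1", "sign_up",    datetime(2024, 5, 1, 9, 0)),
    ("u1", "first_core", datetime(2024, 5, 1, 9, 30)),
    ("u2", "sign_up",    datetime(2024, 5, 1, 10, 0)),
    ("u3", "sign_up",    datetime(2024, 5, 2, 8, 0)),
    ("u3", "first_core", datetime(2024, 5, 4, 8, 0)),
]

sign_ups = {u: t for u, e, t in events if e == "sign_up"}
activations = {u: t for u, e, t in events if e == "first_core"}

# Activation rate: share of signed-up users who reached "First Value"
activation_rate = len(activations) / len(sign_ups)

# Time to activate: mean delay between sign-up and the first core action
delays = [(activations[u] - sign_ups[u]).total_seconds() / 3600
          for u in activations]
mean_hours = sum(delays) / len(delays)  # mean of 0.5 h and 48.0 h

# Stickiness (DAU/MAU): average daily actives divided by monthly actives
daily_active = {  # users active each day, derived from the event log
    "2024-05-01": {"u1", "u2"},
    "2024-05-02": {"u1", "u3"},
}
mau = len(set().union(*daily_active.values()))
avg_dau = sum(len(s) for s in daily_active.values()) / len(daily_active)
stickiness = avg_dau / mau

print(f"Activation rate: {activation_rate:.0%}")  # → 67%
```

Analytics platforms compute these for you, but it is worth understanding the arithmetic so you can sanity-check dashboards against your own raw logs.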

In addition to these, you may want to specify negative metrics (or safety guardrails) that track problematic user behavior. For example, if a mother uses a health chatbot during an emergency, a long session may indicate confusion and inefficiency rather than value. We must rigorously distinguish between intense or extensive engagement and effective engagement.

***

<details>

<summary>💬 Want to suggest edits or provide feedback?</summary>

{% embed url="https://tally.so/r/A788l0?originPage=level-2-product-evaluation%2Foverview%2Fwhat-is-the-product-being-evaluated" %}

</details>
