
Human-in-the-Loop: Building Trust Between AI Agents and Teams

The goal of an autonomous agent system is not to remove humans from the loop — it's to put human attention where it matters most. Getting that right requires a model of when to act and when to ask.

The automation paradox

Teams that adopt agent systems often expect them to work best when given maximum autonomy. In practice, that expectation frequently fails, at least initially: an agent with no human feedback mechanism accumulates errors silently. The opposite extreme fails too, because an agent that asks about everything creates more work than it saves.

The right design isn't maximum autonomy or maximum oversight — it's calibrated autonomy: high independence for routine decisions where the agent has a strong track record, and automatic escalation for decisions that are novel, high-stakes, or irreversible.
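As a rough illustration of calibrated autonomy, the routing rule can be sketched in a few lines of Python. Everything here is an assumption made for the example: the `Decision` fields, the `route` function, and the 0.95 threshold are hypothetical, not a description of any particular system.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    ACT = "act"            # the agent proceeds on its own
    ESCALATE = "escalate"  # the agent asks a human first


@dataclass(frozen=True)
class Decision:
    task_type: str
    is_novel: bool         # no comparable decision has been seen before
    is_high_stakes: bool
    is_irreversible: bool


def route(decision: Decision, track_record: float,
          min_track_record: float = 0.95) -> Route:
    """Escalate novel, high-stakes, or irreversible decisions.

    Everything else is handled autonomously only when the agent's
    track record for this kind of decision meets the (assumed) threshold.
    """
    if decision.is_novel or decision.is_high_stakes or decision.is_irreversible:
        return Route.ESCALATE
    if track_record >= min_track_record:
        return Route.ACT
    return Route.ESCALATE


routine = Decision("label_ticket", False, False, False)
risky = Decision("delete_records", False, True, True)
print(route(routine, track_record=0.99))  # Route.ACT
print(route(risky, track_record=0.99))    # Route.ESCALATE
```

Note that in this sketch a strong track record never overrides the escalation conditions: an irreversible action is escalated even when the agent has been reliable on similar work.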

Trust as a dynamic quantity

We model trust not as a static property of an agent, but as something that is earned through demonstrated accuracy over time. An agent that has correctly handled a hundred similar decisions in a row has earned more latitude than an agent that is attempting a class of problem for the first time.

This has implications for system design. Trust must be tracked granularly — at the level of task type and decision class, not just the agent as a whole. An agent that is very reliable at one kind of task may have no track record on a different kind, and the system should treat these as different trust contexts.
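One way to picture granular tracking is a ledger keyed by agent, task type, and decision class rather than by agent alone. The sketch below is a minimal illustration under that assumption; `TrustLedger`, its method names, and the example task names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # (agent, task_type, decision_class)


@dataclass
class Record:
    correct: int = 0
    total: int = 0

    @property
    def accuracy(self) -> float:
        # No history means no earned trust, not an assumed-perfect score.
        return self.correct / self.total if self.total else 0.0


class TrustLedger:
    """Tracks outcomes per trust context instead of per agent."""

    def __init__(self) -> None:
        self._records: Dict[Key, Record] = {}

    def record(self, agent: str, task_type: str, decision_class: str,
               was_correct: bool) -> None:
        rec = self._records.setdefault((agent, task_type, decision_class), Record())
        rec.total += 1
        rec.correct += int(was_correct)

    def accuracy(self, agent: str, task_type: str, decision_class: str) -> float:
        rec = self._records.get((agent, task_type, decision_class))
        return rec.accuracy if rec else 0.0

    def samples(self, agent: str, task_type: str, decision_class: str) -> int:
        rec = self._records.get((agent, task_type, decision_class))
        return rec.total if rec else 0


ledger = TrustLedger()
for _ in range(100):
    ledger.record("agent-1", "refund", "approve", was_correct=True)

print(ledger.accuracy("agent-1", "refund", "approve"))   # 1.0
print(ledger.accuracy("agent-1", "deploy", "rollback"))  # 0.0
```

The same agent has a perfect record in one context and no record in the other, so the two lookups return different answers even though the agent is identical.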

Momentum as a feedback signal

We also found value in tracking not just raw accuracy but streaks of consecutive correct decisions. An agent on a long streak of correct outcomes in a given area should be treated differently from an agent with the same average accuracy that oscillates between correct and incorrect decisions.

A streak suggests that the agent has found a reliable pattern for this class of problem. Oscillation suggests that the agent may be right by chance, or that the task sits at the boundary of what it can reliably handle.
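The difference between a streak and oscillation is easy to see on two outcome histories with the same average accuracy. The helpers below are a minimal sketch: `current_streak` and `flip_rate` are hypothetical names, and the flip rate (how often consecutive outcomes differ) is just one possible way to quantify oscillation.

```python
from typing import Sequence


def accuracy(outcomes: Sequence[bool]) -> float:
    """Fraction of correct outcomes (0.0 for an empty history)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def current_streak(outcomes: Sequence[bool]) -> int:
    """Number of consecutive correct outcomes at the end of the history."""
    streak = 0
    for ok in reversed(outcomes):
        if not ok:
            break
        streak += 1
    return streak


def flip_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of adjacent pairs whose outcomes differ."""
    if len(outcomes) < 2:
        return 0.0
    flips = sum(a != b for a, b in zip(outcomes, outcomes[1:]))
    return flips / (len(outcomes) - 1)


# Oldest outcome first; both histories are 8 correct out of 10.
steady = [False, False] + [True] * 8
oscillating = [True, True, False, True, True, True, False, True, True, True]

print(accuracy(steady), accuracy(oscillating))              # 0.8 0.8
print(current_streak(steady), current_streak(oscillating))  # 8 3
print(flip_rate(steady) < flip_rate(oscillating))           # True
```

Raw accuracy cannot tell these two histories apart, while the streak length and the flip rate both can.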

Approval UX matters

Human oversight is only valuable if the human actually engages with it. Approval workflows that are cumbersome, poorly contextualized, or badly timed get ignored, and an ignored approval step is worse than having none at all, because it creates a false sense of oversight.

We invest heavily in making approval interactions fast and well-contextualized: showing the human exactly what the agent wants to do, why it believes it's the right action, and what alternatives exist. A human who can approve or reject a decision in thirty seconds, with full context, is far more useful than one who needs to investigate the situation before they can engage.
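To make the three pieces of context concrete, here is a hedged sketch of what an approval request might carry and how it could be rendered as a short message. The `ApprovalRequest` fields, the `render` function, and the sample refund scenario are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ApprovalRequest:
    action: str                         # exactly what the agent wants to do
    rationale: str                      # why it believes this is right
    alternatives: Tuple[str, ...] = ()  # other options it considered


def render(request: ApprovalRequest) -> str:
    """Format the request so it can be judged without further investigation."""
    lines = [
        f"Proposed action: {request.action}",
        f"Why: {request.rationale}",
    ]
    if request.alternatives:
        lines.append("Alternatives:")
        lines.extend(f"  - {alt}" for alt in request.alternatives)
    lines.append("Reply APPROVE or REJECT.")
    return "\n".join(lines)


request = ApprovalRequest(
    action="Refund the duplicate charge on the customer's latest invoice",
    rationale="The same charge appears twice within one minute",
    alternatives=("Ask the customer to confirm first", "Escalate to billing"),
)
print(render(request))
```

Keeping the action, rationale, and alternatives as separate fields means each delivery surface can lay them out in its own way without losing any of the context.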

Channel flexibility

People work across different communication channels at different times. An approval that can only be handled through a web UI will be missed by someone who is currently on their phone. We built our approval system to surface decisions across the channels where the human is actually present — and to make the approval action available in each of those contexts without requiring a context switch.
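A simplified sketch of this idea: send each pending approval to every channel where the person is currently present, and accept the first response from any of them. The `ApprovalRouter` class, the presence map, and the channel names (`chat`, `mobile_push`) are hypothetical, and real delivery is replaced by an in-memory outbox.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class PendingApproval:
    request_id: str
    summary: str
    decision: Optional[str] = None      # "approve" or "reject" once resolved
    resolved_via: Optional[str] = None  # channel the answer came from


class ApprovalRouter:
    def __init__(self, presence: Dict[str, Set[str]]) -> None:
        self._presence = presence       # user -> channels they are active on
        self._pending: Dict[str, PendingApproval] = {}
        self.outbox: List[Tuple[str, str, str]] = []  # (channel, user, request_id)

    def request(self, user: str, approval: PendingApproval) -> List[str]:
        """Queue the approval on every channel where the user is present."""
        channels = sorted(self._presence.get(user, set()))
        self._pending[approval.request_id] = approval
        for channel in channels:
            self.outbox.append((channel, user, approval.request_id))
        return channels

    def respond(self, request_id: str, channel: str, decision: str) -> bool:
        """Record the first answer from any channel; ignore later ones."""
        if decision not in ("approve", "reject"):
            raise ValueError(f"unknown decision: {decision!r}")
        approval = self._pending.get(request_id)
        if approval is None or approval.decision is not None:
            return False
        approval.decision = decision
        approval.resolved_via = channel
        return True


router = ApprovalRouter({"dana": {"mobile_push", "chat"}})
approval = PendingApproval("r1", "Refund duplicate charge")
print(router.request("dana", approval))              # ['chat', 'mobile_push']
print(router.respond("r1", "mobile_push", "approve"))  # True
print(router.respond("r1", "chat", "reject"))          # False, already resolved
```

Resolving on the first response keeps the decision unambiguous when the same request is visible in several places at once.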