Agentic UX · May 6, 2026 · 6 min read

Designing Decision Boundaries for AI Agents: Clarence as an Agentic UX Case Study

Clarence is useful because it can act. Clarence is trustworthy only because its authority has edges. The design problem is not how much autonomy an AI agent can have. The design problem is jurisdiction: which decisions it is allowed to make, which ones it must surface, and which ones stay with me.

Agentic UX starts at the boundary

Most AI product language still treats autonomy like a volume knob. More autonomy sounds more advanced. Less autonomy sounds safer. I do not think that frame is strong enough for real agent systems.

The better question is jurisdiction. What is this system allowed to decide? What evidence does it need before acting? What has to be shown to the human first? What happens when the system is confident and wrong?

That is where Clarence became a UX project instead of just an AI project. The hard work was not the conversation layer. The hard work was deciding where its authority stops.

A boundary is a design object

In a normal interface, a boundary might be an inactive control, a confirmation screen, a permissions prompt, or an undo stack. In an agentic system, the same design problem appears as policy: when to ask, when to act, when to escalate, when to stay silent, and when to refuse the premise.
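
Here is a rough sketch of that policy as data. Every name in it is illustrative; this is the shape I think in, not Clarence's actual internals.

```ts
// A boundary as a design object: decisions and their preconditions,
// written down instead of implied. All names here are illustrative.
type BoundaryDecision = "act" | "ask" | "escalate" | "stay_silent" | "refuse";

interface BoundaryRule {
  description: string;
  requiredEvidence: string[]; // what must be true before the rule fires
  decision: BoundaryDecision;
}

const exampleRules: BoundaryRule[] = [
  {
    description: "run a local build to check a claim",
    requiredEvidence: ["task is scoped", "action is internal"],
    decision: "act",
  },
  {
    description: "publish a new public page",
    requiredEvidence: [], // no amount of evidence upgrades this to "act"
    decision: "ask",
  },
];
```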

Clarence has to operate across different surfaces: Telegram for live conversation, Discord for reports, the local filesystem for builds and verification, a SQLite knowledge base for durable memory, and Claude Code as a bounded specialist lane for repo-level review. If every surface had the same permission level, the system would be fast and reckless.
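
As a sketch, using a permission vocabulary I made up for this post, the surface map might look like this:

```ts
// Permission levels per surface. The surfaces are real; the levels and
// the shape of this map are my sketch, not the production config.
type Permission = "read" | "draft" | "execute" | "publish";

const surfacePermissions: Record<string, Permission[]> = {
  telegram:   ["read", "draft"],             // live conversation
  discord:    ["read", "draft"],             // reports
  filesystem: ["read", "draft", "execute"],  // builds and verification
  sqlite:     ["read"],                      // durable memory: writes go elsewhere
  claudeCode: ["read", "draft", "execute"],  // bounded specialist lane
};
// Note what is missing: no surface carries "publish". That right stays human.
```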

The product is the boundary map. The interface is only one way that map shows up.

What Clarence can decide

Clarence has authority over evidence-gathering. When a task is clear enough to scope, it can decide how to gather context: read files, search memory, inspect git state, run builds, check TypeScript, compare live system state against public claims, and escalate to a specialist model when the work is narrow enough to justify it.
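
A minimal sketch of what that authority covers. The commands are my stand-ins for "inspect git state, run builds, check TypeScript", not Clarence's real tool calls.

```ts
import { execSync } from "node:child_process";

// Run a check and capture the output without throwing: these are
// low-risk, recoverable actions, so a failure is just more evidence.
const run = (cmd: string) => execSync(`${cmd} || true`, { encoding: "utf8" });

const evidence = {
  gitState: run("git status --porcelain"),
  typecheck: run("npx tsc --noEmit"),
  build: run("npm run build 2>&1"),
};
```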

It can also decide when a task is draftable. That sounds minor, but it matters. A good agent should not make the human watch it perform every intermediate step. It should split the work, delegate the bounded pieces, keep the main judgment loop intact, and come back with evidence.
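
Sketched as a loop, with stubs in place of the real specialist calls:

```ts
// Bounded pieces fan out; judgment stays in one loop. Shape is illustrative.
async function delegate(piece: string): Promise<string> {
  return `evidence for ${piece}`; // stub for a specialist-lane call
}

async function handleTask(task: string, boundedPieces: string[]) {
  // Delegate the narrow, well-scoped work in parallel.
  const evidence = await Promise.all(boundedPieces.map(delegate));
  // The main loop keeps judgment: weigh the evidence, decide what to surface.
  return { task, evidence, decision: "surface draft with receipts" };
}
```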

Those decisions are operational. They make the work faster without changing who owns the work.

What Clarence cannot decide

Clarence has no authority over representation. It cannot decide what belongs under my name. It cannot publish a new public page just because the material seems strong. It cannot turn uncertainty into a cleaner claim. It cannot treat a confident answer as verified just because the prose sounds stable.

It also cannot treat memory as an open write surface. Claude can read through bounded MCP paths and submit handoffs. Hermes evaluates what should become durable memory. That separation is not bureaucracy. It is the difference between a useful memory system and a rumor mill with embeddings.
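
The write path, sketched. Handoff and evaluateHandoff are names I am using here, not the system's API; the point is that submission and persistence are separate steps with a judge between them.

```ts
// Anyone can submit a handoff; only the evaluator writes durable memory.
interface Handoff {
  source: "claude" | "clarence";
  claim: string;
  evidence: string[];
}

function evaluateHandoff(h: Handoff): { durable: boolean; reason: string } {
  if (h.evidence.length === 0) {
    return { durable: false, reason: "unverified claim, not memory" };
  }
  // Hermes applies judgment here in the real system; this check stands in.
  return { durable: true, reason: "evidence-backed, safe to persist" };
}
```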

The system can draft, audit, critique, and execute scoped internal work. Judgment, public claims, and durable identity stay with me.

Autonomy depends on recoverability

The safest actions are not always the smallest ones. The safest actions are the ones with a clear recovery path. Reading files is low risk. Running a build is low risk. Drafting a local page is low risk. Publishing a new page under my name is different, because the audience changes.

This is why Clarence treats internal and external actions differently. Inside the machine, it should be bold. It should inspect, test, patch, and verify when the scope is clear. At the public edge, it slows down. A portfolio page, a social post, an email, or a durable memory about who I am is not just output. It is representation.
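
The rule reduces to two questions, neither of which is about size. A sketch:

```ts
// Risk keyed to recoverability and audience, not to action size.
type Audience = "internal" | "public";
type Recovery = "trivial" | "revertible" | "none";

function tier(audience: Audience, recovery: Recovery): "act" | "surface" {
  // Representation is at stake: the human sees it first.
  if (audience === "public" || recovery === "none") return "surface";
  // Contained mistakes are cheap: be bold inside the machine.
  return "act";
}
```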

Autonomy earns trust when mistakes are contained. It loses trust when errors cross into public identity before the human can see them.

Verification is part of the interface

The useful agentic UX pattern in Clarence is not a chat bubble. It is the verification loop around the chat bubble. Ground first. Draft only what should be drafted. Attack the first answer. Check claims against live files, source documents, build output, TypeScript, git diffs, and live routes when those things matter.
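
As a pipeline, with stubs standing in for real tool calls, the loop looks something like this:

```ts
// Ground, draft, attack, verify. Step names come from the loop above;
// the implementations are stubs, not Clarence's real tools.
type Receipt = { claim: string; source: string; holds: boolean };

async function ground(task: string): Promise<string[]> {
  return []; // read files, search memory, inspect git state
}
async function attack(draft: string, evidence: string[]): Promise<string[]> {
  return []; // objections to the first answer
}
async function checkClaims(draft: string): Promise<Receipt[]> {
  return []; // compare claims to live files, build output, git diffs
}

async function verifiedDraft(task: string, draft: string) {
  const evidence = await ground(task);
  const objections = await attack(draft, evidence);
  const receipts = await checkClaims(draft);
  const verified = objections.length === 0 && receipts.every(r => r.holds);
  return { draft, verified, objections, receipts };
}
```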

This portfolio site made that visible because polished drift is easy to ship. A number can be true in March and false in April. A public endpoint can work locally and fail after static export. A case study can sound better than the evidence behind it. Clarence helps by making those failures harder to miss.

Fluency is not proof. Verification is the receipt.

The strongest boundary is voice

Writing is where the boundary gets personal. Clarence can produce prose quickly, but fast prose can flatten a person. It can turn a real argument into a symmetrical paragraph machine. It can make a portfolio sound like every other AI-assisted portfolio.

My rule is simple: Clarence can help with drafting and critique, but it has to be grounded against the work and against my voice. No em dashes. No soft claims. No filler. No fake balance. No broad AI generalizations standing in for an actual argument.

That is not style policing for its own sake. Voice is an ownership boundary. If the page sounds like the machine instead of me, the system failed even if every sentence is grammatically clean.

The unsolved part is status

The most persistent weakness in Clarence is still visibility of system status. Silence is ambiguous. A quiet system might be working, finished, blocked, rate-limited, or broken. For a tool that can act across sessions, that ambiguity becomes expensive fast.

Acknowledge First exists because of that failure. Every substantive task should begin by saying what is happening and why. That rule is useful, but it is not a full status interface. It is a behavioral patch over a deeper design problem.

The next version of agentic UX needs better ways to show state: working, waiting, delegated, verified, blocked, published, or unsafe to continue. Logs are not enough. The human needs a model of what the agent believes it is allowed to do right now.
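
The states are already nameable; what is missing is a surface that reports them. A sketch of the shape I want:

```ts
// Not logs: a model of what the agent believes it may do right now.
type AgentState =
  | "working" | "waiting" | "delegated" | "verified"
  | "blocked" | "published" | "unsafe_to_continue";

interface StatusReport {
  state: AgentState;
  why: string;               // Acknowledge First: what is happening and why
  allowedRightNow: string[]; // the agent's own model of its jurisdiction
}
```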

Why this is UX, not prompt engineering

Prompting can shape behavior inside one response. Decision boundaries shape the relationship between responses, tools, memory, public action, and human authority. That is an interaction design problem.

Clarence matters to me because it turns that abstract problem into something observable. I can see where the system helps, where it overreaches, where it waits too long, where it moves too fast, and where the boundary itself needs redesign.

Agentic UX is not about making an AI seem more human. It is about making the system's decision rights legible enough that a human can trust it without surrendering judgment.

The broader Clarence case study is here: Clarence.

The companion piece on everyday collaboration is here: Working With Clarence.

Clarence · Agentic UX · AI Systems · Decision Boundaries · Human-AI Collaboration · Verification · System Design