Systems Design · 2026 · Independent Research · Ongoing

Clarence: Designing an AI Collaboration System

Trust Calibration · Human-AI Collaboration · Hermes Orchestration · Claude Code Specialist Lane · SQLite Knowledge Base · MCP Bridge Architecture · Semantic Vector Search · Public-Safe Telemetry · Platform Migration

Live system: interactive public-safe surfaces from the current Clarence stack

  • Active memories: 3,291
  • Total memories: 4,355
  • Active facts: 10,268
  • Total facts: 14,704

Verified as of June 10, 2026. Active memories and active facts have full vector coverage. Totals include archived historical records.

Live Knowledge Graph: interactive public slice of the Clarence entity graph

The Design Question

Clarence asks a design question I kept running into while using AI systems every day: how much autonomy should a collaborator have before it becomes harder to trust?

I did not build Clarence as a chatbot. I built it because ordinary AI tools forget too much, hide too much, and make the user reconstruct context every time the window resets. The goal was a working collaboration system with memory, task routing, scheduled maintenance, and enough judgment to act without turning into a black box.

The hard UX problem is not whether the system can do useful work. It can. The hard problem is trust calibration. When should it act without asking? When should it stop and ask for approval? When does a missing signal mean everything is fine, and when does it mean the reporting layer failed?

The System Shape

The current system runs through Hermes as the primary orchestrator, with Claude Code used as a bounded specialist path for repo investigation and implementation support. Long-term memory lives in a SQLite knowledge database. Agents read through capability-limited MCP tools, write through stricter internal paths, and hand work back through explicit summaries instead of pretending continuity is magic.
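The split between capability-limited reads and stricter writes can be illustrated with plain SQLite. This is a minimal sketch, not Clarence's actual code: the `facts` table, the function names, and the validation rule are all assumptions made for illustration. It relies only on SQLite's standard `mode=ro` URI parameter, which makes the read connection reject mutation at the database level.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open the database so that any write fails inside SQLite itself."""
    # mode=ro is a standard SQLite URI parameter; uri=True enables URI parsing.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def write_fact(db_path: str, subject: str, body: str) -> None:
    """Hypothetical internal write path: validate, then commit atomically."""
    if not subject.strip() or not body.strip():
        raise ValueError("facts need a non-empty subject and body")
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS facts "
                "(subject TEXT NOT NULL, body TEXT NOT NULL)"
            )
            conn.execute(
                "INSERT INTO facts (subject, body) VALUES (?, ?)",
                (subject, body),
            )
    finally:
        conn.close()


def demo() -> tuple[int, bool]:
    """Write one fact internally, then show the read path blocks mutation."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "knowledge.db")
        write_fact(db_path, "runtime", "Hermes is the primary orchestrator")
        reader = open_read_only(db_path)
        try:
            count = reader.execute("SELECT COUNT(*) FROM facts").fetchone()[0]
            try:
                reader.execute("DELETE FROM facts")
                write_blocked = False
            except sqlite3.OperationalError:
                write_blocked = True
        finally:
            reader.close()
        return count, write_blocked
```

Enforcing the boundary in the connection itself, rather than in tool descriptions, means a misbehaving reader cannot mutate memory even if it tries.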

The system also runs scheduled jobs for maintenance, research, reporting, and drift checks. Some work happens while I sleep. Some work happens during live sessions. The design problem is making both modes observable enough that silence does not become a second interface.

The public surfaces above are part of that same design decision. The Pulse page exposes coarse public-safe telemetry. The knowledge graph shows structure without exposing private entity names. The stats block uses one verified snapshot instead of repeated counts scattered through the page.

Trust Calibration

Clarence works best when autonomy is bounded, visible, and recoverable. Every meaningful task now starts with acknowledgment, scope, and a time estimate. Claims that reach the public site get checked against source files, generated artifacts, workflows, and live pages. Memory claims are checked against the database before they become public copy.

The Acknowledge First rule came from real friction. Work would disappear into subagent execution with no signal back about whether the task was understood correctly. The fix was not more autonomy or less autonomy. It was better communication at the boundary.
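One way to picture the Acknowledge First rule is as a gate that refuses to start work until an acknowledgment has been sent. The sketch below is illustrative only; the `Acknowledgment` fields and the `run_task` helper are hypothetical names, not part of Hermes or Claude Code.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Acknowledgment:
    """What the user sees before any work begins (fields are assumptions)."""
    restated_task: str
    in_scope: tuple[str, ...]
    estimate_minutes: int

    def render(self) -> str:
        return "\n".join([
            f"Understood: {self.restated_task}",
            "Scope: " + ", ".join(self.in_scope),
            f"Estimate: about {self.estimate_minutes} min",
        ])


def run_task(ack: Acknowledgment,
             send: Callable[[str], None],
             work: Callable[[], T]) -> T:
    """Send the acknowledgment first; only then run the work."""
    if not ack.restated_task or not ack.in_scope or ack.estimate_minutes <= 0:
        raise ValueError("acknowledgment needs a task, a scope, and an estimate")
    send(ack.render())  # the signal at the boundary comes before execution
    return work()
```

Because `send` is called before `work`, a misunderstanding surfaces in the restated task while it is still cheap to correct.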

The larger trust mechanism is explicit continuity rather than magical continuity. Hermes keeps hot memory, session handoff notes, and a durable SQLite knowledge layer live today. The older nightly conversation-distillation idea remains part of the project history, but it is not treated as a current guarantee.

What Changed

Clarence has already gone through platform failure. The original system ran on OpenClaw. When that foundation became unavailable, I migrated the runtime to Hermes while preserving the parts that mattered: memory, scheduling, handoffs, and the collaboration patterns. The migration clarified what was architecture and what was scaffolding.

The move forced a cleanup of assumptions. I stopped treating Clarence as a clever prompt chain and started treating it like infrastructure. That meant defining what belongs in hot injected memory, what belongs in structured database memory, and what should be recovered from transcript history only when needed.
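The three-way split can be expressed as a small placement rule. The sketch below assumes two hypothetical signals per item (whether it is needed in every session and whether it is durable) plus a character budget for hot memory; the real criteria are not documented here and may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    HOT = "hot injected memory"
    STRUCTURED = "structured database memory"
    TRANSCRIPT = "transcript history, recovered on demand"


@dataclass(frozen=True)
class MemoryItem:
    text: str
    needed_every_session: bool  # assumption: set by a curator or a rule
    durable: bool               # assumption: still true weeks from now


def place(item: MemoryItem, hot_budget_chars: int, hot_used_chars: int) -> Tier:
    """Pick a tier; hot memory is scarce, so it must fit the remaining budget."""
    fits = hot_used_chars + len(item.text) <= hot_budget_chars
    if item.needed_every_session and fits:
        return Tier.HOT
    if item.durable or item.needed_every_session:
        return Tier.STRUCTURED
    return Tier.TRANSCRIPT
```

Making the budget an explicit argument keeps hot memory from growing silently: anything that does not fit falls through to the database instead.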

The portable pieces survived because they had real shape: a SQLite database, documented persona and operating rules, explicit handoffs, static site deploy checks, and public-safe telemetry. The pieces that were implicit broke first. That is the useful lesson from the migration.

The Main Failure: Visibility of System Status

Clarence still breaks Nielsen's first usability heuristic: visibility of system status. A system should keep users informed about what is going on through timely feedback. Clarence does this better than it used to, but not well enough.

A cron job can run overnight and deliver a report. A subagent can investigate a repository. A watchdog can check a bridge. But during live work, silence is still too ambiguous. The user has to infer whether the system is thinking, waiting, stuck, or done.

That is not a logging problem. It is a feedback-design problem. Solving it means treating system status as a first-class UX surface, not as a side effect of Telegram messages, Discord posts, or tool output.
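A status surface of this kind could start as a small state object that treats silence as its own reportable state instead of leaving the user to guess. This is a sketch under assumptions: the state names, the 30-second threshold, and the `StatusSurface` class are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from enum import Enum


class Status(Enum):
    THINKING = "thinking"
    WAITING_ON_USER = "waiting"
    DONE = "done"
    STALE = "stale"  # no recent update: possibly stuck, or reporting failed


class StatusSurface:
    """Holds the latest status and flags silence instead of hiding it."""

    def __init__(self, stale_after_s: float = 30.0) -> None:
        self.stale_after_s = stale_after_s
        self._status = Status.DONE
        self._detail = "idle"
        self._last_beat = 0.0

    def beat(self, status: Status, detail: str, now: float) -> None:
        """Record a status update with the time it was produced."""
        self._status, self._detail, self._last_beat = status, detail, now

    def read(self, now: float) -> tuple[Status, str]:
        """Return what the user should see at time `now`."""
        # Finished or user-blocked states do not expire; active work does.
        if self._status in (Status.DONE, Status.WAITING_ON_USER):
            return self._status, self._detail
        age = now - self._last_beat
        if age > self.stale_after_s:
            return Status.STALE, f"no update for {age:.0f}s (last: {self._detail})"
        return self._status, self._detail
```

The point of the `STALE` state is that the surface never reports "thinking" on the strength of an old heartbeat, so a stalled task and a broken reporting layer both become visible.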

Honest Challenges

A portfolio case study that only shows what worked is a sales document. Clarence is useful today, but these are still open problems.

The Plan-Execution Gap

Hermes is strong at framing work, memory stewardship, and deciding when Claude Code is worth invoking. The current handoff stack is strong for bounded parallel investigation. Implementation support is less consistent. Part of the original tandem plan remains a roadmap item rather than a finished capability.

Budget Constraint as Design Constraint

Compute cost is a real design material. GPT-5.5 is the daily driver because it is the main conversational and planning layer. Claude Code is protected for the top slice of tasks where repo-level execution or code reasoning matters. Escalating too early wastes scarce capacity. Escalating too late wastes time and trust.

Memory Growth Without Garbage Collection

The knowledge base grew from a small hand-curated memory file into thousands of structured records after vault fact extraction processed notes and documents. More data does not automatically mean better recall. Memory needs pruning and consolidation rather than accumulation alone.
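Pruning does not have to mean deletion. The sketch below archives records that have not been used since a cutoff, which keeps totals intact while shrinking the active set. The `memories` table, its columns, and the sample dates are assumptions for illustration, not the real schema.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with a hypothetical memories table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memories ("
        "id INTEGER PRIMARY KEY, body TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'active', last_used TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO memories (body, last_used) VALUES (?, ?)",
        [
            ("old platform note", "2025-01-01"),
            ("current handoff rule", "2026-05-01"),
            ("deploy check procedure", "2026-06-01"),
        ],
    )
    conn.commit()
    return conn


def archive_stale(conn: sqlite3.Connection, cutoff_iso: str) -> int:
    """Mark active records unused since the cutoff as archived; return count."""
    with conn:  # one transaction: all rows change or none do
        cur = conn.execute(
            "UPDATE memories SET status = 'archived' "
            "WHERE status = 'active' AND last_used < ?",
            (cutoff_iso,),
        )
    return cur.rowcount


def counts(conn: sqlite3.Connection) -> tuple[int, int]:
    """Return (active, total) record counts."""
    active = conn.execute(
        "SELECT COUNT(*) FROM memories WHERE status = 'active'"
    ).fetchone()[0]
    total = conn.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
    return active, total
```

ISO-formatted dates compare correctly as strings, which is why the cutoff can be passed as plain text. Consolidation (merging near-duplicate records) would need a separate step that this sketch does not attempt.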

Platform Fragility

Twitter/X posting was built, tested, and blocked by anti-automation fingerprinting within a single session. OpenClaw became unavailable as a viable runtime. These are not edge cases. They are reminders that systems built on platforms you do not control must be designed for migration.

Visibility of System Status (Nielsen #1)

The most persistent unsolved problem is still status visibility. Reports can be delivered, jobs can be logged, and workflows can complete, but silence remains ambiguous during live work. The system needs a clearer progress surface before autonomy can feel fully trustworthy.

What the System Does Today

  • 28 enabled scheduled jobs across overnight and daytime windows, plus a bridge health watchdog running every five minutes
  • A single authoritative knowledge database with full active-record embedding coverage across two vector sets
  • Explicit handoffs, memory tools, and a SQLite knowledge layer that preserve continuity across sessions
  • Read-only MCP retrieval plus stricter internal write paths, so external access does not become uncontrolled mutation
  • Public-safe Pulse and graph surfaces that prove the system exists without exposing private memory content
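As a rough illustration of retrieval over active records only, the following sketch ranks toy vectors by cosine similarity and skips anything archived. The record layout and the two-dimensional vectors are hypothetical; a real embedding set would be much larger and would likely use a dedicated index.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec: Sequence[float], records: list[dict], k: int = 3) -> list[str]:
    """Return ids of the k most similar active records, best match first."""
    active = [r for r in records if r["status"] == "active"]
    ranked = sorted(
        active,
        key=lambda r: cosine(query_vec, r["vector"]),
        reverse=True,
    )
    return [r["id"] for r in ranked[:k]]


# Toy data: record "c" is a close match but archived, so it is never returned.
RECORDS = [
    {"id": "a", "status": "active", "vector": [1.0, 0.0]},
    {"id": "b", "status": "active", "vector": [0.0, 1.0]},
    {"id": "c", "status": "archived", "vector": [1.0, 0.1]},
    {"id": "d", "status": "active", "vector": [0.9, 0.1]},
]
```

Filtering on status before ranking is what makes "full coverage of active records" sufficient: archived rows never need embeddings to stay out of results.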

Documentation and Public Boundary

Clarence has a dedicated system wiki because too much durable system knowledge was trapped in chat history, repo docs, and fragile notes. The raw internal wiki stays private. Public documentation has to be curated, because implementation notes can expose paths, operational assumptions, and private context.

I wrote more about that decision in "Why Clarence Needed a System Wiki." The public-safe architecture repository is separate from the private operating record: github.com/nomadjames/clarence-architecture.

Concepts and Skills

Agent orchestration design · Trust calibration · Human-in-the-loop boundary design · Memory persistence across AI sessions · MCP bridge architecture · SQLite knowledge database design · Semantic vector search · Public telemetry design · Platform migration under constraint · Status visibility critique · Systems thinking