Comparison · May 2026

The Two-Layer Stack.

Where AUX sits above Microsoft's Agent Governance Toolkit — and why the lower shelf getting cheaper is good news for the upper one.

In May 2026, Microsoft shipped the Agent Governance Toolkit: a Microsoft-signed, MIT-licensed, five-language SDK that intercepts every tool call an autonomous agent attempts, evaluates it against a YAML policy, and emits a tamper-evident audit record. It ships with 992 conformance tests across 10 formal specs and with certified mappings to the OWASP Agentic AI Top 10, NIST AI RMF, the EU AI Act, and SOC 2. Version 3.7 is live, and the repository reached 2.3k stars within weeks. This is not a side project; it is Microsoft naming a category and laying the substrate.

Several people asked me the same question that week: what does this mean for auxfirst?

The honest answer is: it makes our work more buyable, not less. AGT is the runtime we have been waiting for. AUX is the discipline that decides what to instrument it with — and what the user actually feels when it fires.

This essay is about how the two stack.


What AGT actually does

Pages of marketing collapse into one sentence: AGT is the deterministic enforcement layer for AI agents. Every action — a tool call, a database write, a delegation to another agent — passes through application middleware that says allow, deny, or require human approval before the action reaches the wire.

It does this because the model layer cannot. Microsoft is unusually direct about this. Their own AI Red Teaming Agent work makes the point: prompt-level safety is probabilistic by construction, and adaptive attacks reach near-100% success rates against frontier safety-aligned models. Their conclusion in Lessons from Red Teaming 100 Generative AI Products is that "AI red teaming is never complete" because model-layer defenses cannot, by definition, be exhaustive.

AGT does not try to win that fight inside the prompt. It moves the control surface down the stack. By the time a model's intent reaches a tool, deterministic application code has already decided whether the call happens. Denied actions are not "unlikely." They are structurally impossible.
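
To make "deterministic application code has already decided" concrete, here is a minimal Python sketch of a gate that sits between a proposed tool call and its execution. It is not AGT's API. The names `Decision`, `ToolCall`, `POLICY`, and `guarded_execute` are hypothetical, and a plain dictionary stands in for the YAML policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


@dataclass(frozen=True)
class ToolCall:
    agent_id: str
    tool: str
    args: dict


# Hypothetical policy table standing in for a YAML policy file.
POLICY: Dict[str, Decision] = {
    "search_docs": Decision.ALLOW,
    "send_email": Decision.REQUIRE_APPROVAL,
    "drop_table": Decision.DENY,
}


def evaluate(call: ToolCall) -> Decision:
    """Plain lookup: the same call always gets the same decision.
    Tools the policy does not name are denied by default."""
    return POLICY.get(call.tool, Decision.DENY)


def guarded_execute(
    call: ToolCall,
    tools: Dict[str, Callable],
    approver: Optional[Callable[[ToolCall], bool]] = None,
):
    """Run the tool only if policy allows it; otherwise the tool is never invoked."""
    decision = evaluate(call)
    if decision is Decision.DENY:
        raise PermissionError(f"{call.tool} denied for {call.agent_id}")
    if decision is Decision.REQUIRE_APPROVAL:
        if approver is None or not approver(call):
            raise PermissionError(f"{call.tool} needs human approval")
    return tools[call.tool](**call.args)
```

The point of the sketch is where the check lives. However the model phrases its request, a denied tool function is never reached, because the refusal happens in ordinary code before the call.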

That is a security architecture. It answers three questions: Is this action allowed? Which agent did this? Can you prove what happened? If you are a CTO at a regulated firm shipping agents into production, those three questions are existential. AGT solves them at engineering grade.

It does not — and does not attempt to — answer a fourth question: will the user trust what comes back?

That is the question AUX exists to answer.


Two layers, one agent

Imagine a customer-facing agent that drafts a contract amendment, sends it for legal review, and asks the user to confirm. Six things have to be true for that interaction to be sound.

Three of them are AGT's territory. The agent must be allowed to draft contracts. The agent's identity must be cryptographically verifiable so legal can trust the chain of custody. The decision to send for review must be logged in a tamper-evident way so an auditor can reconstruct it later.
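
For readers who want a feel for what "tamper-evident" means in practice, the sketch below chains each audit record to the hash of the one before it. This is a simple hash chain, a deliberately smaller technique than the Merkle-based evidence AGT is described as using, and it is not AGT's record format. The function names and record fields are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous record's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> dict:
    """Append a record whose hash commits to everything before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"payload": payload, "prev": prev_hash, "hash": _digest(prev_hash, payload)}
    log.append(record)
    return record


def verify(log: list) -> bool:
    """Recompute every hash in order; any edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for record in log:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["payload"]):
            return False
        prev_hash = record["hash"]
    return True
```

A chain like this only proves integrity if the latest hash is also stored somewhere the agent cannot rewrite. That is the part a production toolkit has to get right.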

The other three are AUX's territory. The user must understand why the agent suggested that particular amendment — confidence cues, visible reasoning, surfaced sources. The user must have an obvious way to revise or undo the agent's draft without losing the work — Escape Hatch, Generative Momentum. And the user must walk away feeling that the agent acted with them, not at them — the difference between functional trust and advocacy trust.

Same agent. Same interaction. Six concerns. Two layers.

Figure: auxfirst + trustkit vs Microsoft Agent Governance Toolkit, layer comparison. Same agent · six concerns · two layers.

auxfirst + trustkit (design discipline · trust as the product) | Microsoft AGT (infrastructure · enforcement as code)
User relationship: persona, history, advocacy | Agent identity: SPIFFE, DID, mTLS, trust mesh
Four trust stages: functional, contextual, judgment, advocacy | Compliance evidence: OWASP, NIST AI RMF, EU AI Act, SOC 2
Confidence cues: visible reasoning, sources, uncertainty | Policy decisions: allow, deny, require approval
Memory in motion: contextual recall, editable, transparent | Memory governance: access policy, scope, audit
Escape hatch, trust gaps: undo, revise, named failure taxonomy | Kill switch, audit log: circuit breakers, Merkle tamper-evidence
Schemas, services: CC BY 4.0 YAML, agency engagements | SDK, specs, CLI: Python, TS, .NET, Rust, Go · 992 tests

Read down the left for the discipline · read down the right for the runtime · the columns pair, they do not compete

Where AGT is overwhelmingly stronger

I want to be candid about this, because honesty here is what earns the right to make the rest of the argument.

AGT is a v3.7 product with 2.3k GitHub stars, 1,731 commits, five language SDKs (Python, TypeScript, .NET, Rust, Go), Microsoft signing on every release, 13 Dependabot ecosystems, CodeQL and Gitleaks in CI, seven fuzz targets, an OpenSSF Scorecard, and 992 conformance tests behind 10 formal RFC-2119 specifications. It integrates natively with Microsoft Agent Framework, Semantic Kernel, AutoGen, LangGraph, CrewAI, the OpenAI Agents SDK, Claude Code, Google ADK, LlamaIndex, Haystack, and a dozen others.

This is what a category substrate looks like. It is the work of a team that intends to be the default deterministic enforcement layer for autonomous agents, full stop. If you are designing an agent today and you cannot say which policy evaluates every tool call before it executes, you should be looking at AGT, OPA, or Cedar. Building your own is engineering work you do not need to do.

trustkit, in comparison, is v0.2. Seven commits. Six first-wave repo READMEs. A planned aux-audit CLI that has not yet shipped its v0.1 binary. The schemas — aux-heuristics.yaml, trust-architecture.yaml, trust-gap-taxonomy.yaml — are CC BY 4.0 and citable, but they are definitions, not executable enforcement. trustkit is not trying to be AGT. It is trying to be a named standard for a different layer. Comparing the two as engineering artifacts is the wrong comparison.

That is the first thing to settle: they are not the same kind of thing.


Where AUX is structurally different

There are three things AGT does not have, cannot have without renaming what it is, and structurally will not absorb — because they belong to a different discipline.

A vocabulary for user-perceived trust. Our four-stage Trust Architecture — functional, contextual, judgment, advocacy — is a progression model of how the relationship between a human and an agent matures over time. AGT has a trust score, but it scores agent-to-agent identity for routing decisions inside the mesh. Different object, different verb. A user asking "do I trust this thing to draft my contract" is not asking the same question as a policy engine asking "is this agent's certificate valid." Both are real. Only one of them is what determines whether the user comes back.

A failure taxonomy organized around relationship breakdown. AGT's compliance mapping is anchored to OWASP's adversarial threat categories — memory poisoning, tool misuse, prompt injection. Those matter. They are not, however, how most production agents fail their users. Most agents fail by being technically correct and humanly wrong. They surprise the user with unrequested initiative. They lose context between sessions. They over-confidently assert. They place friction at the wrong moment. They cannot explain why they did what they did. Our Trust Gap Taxonomy names these failure modes precisely because no policy rule was violated when they happened — and a policy engine, by definition, cannot detect them.

Six interaction patterns framed as design discipline. Intent Handshake, Confidence Cues, Adaptive Canvas, Escape Hatch, Memory in Motion, Generative Momentum are not features. They are repeatable interaction structures that product, design, and engineering teams use to build agents people stay with. AGT has analogues to some of these primitives at the operator layer — the kill switch is a backend escape hatch — but the user never sees AGT. AUX is the layer the user experiences.

These three differences are not gaps in AGT. They are the seam where one discipline ends and another begins.


The overlap zone — and what to do about it

There are two places where AGT's vocabulary and AUX's vocabulary touch, and a careless reader could conflate them. Better to name them out loud than let them confuse a buyer.

Memory. AGT ships a memory governance primitive — access policies, scoping rules, audit on read and write. AUX has Memory in Motion as a foundational pattern — contextual recall surfaced transparently to the user, with editable affordances, used to compound personal value across sessions. These are not competing. AGT controls what the agent's memory can do. AUX designs what the user sees, edits, and trusts about it. A production system needs both: a memory policy AGT can enforce, and a memory experience the user can navigate.
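
One way to picture the split is a single memory store with two faces: one governed by policy, one shown to the user. The sketch below is illustrative only. `GovernedMemory`, its methods, and the scope names are hypothetical, and neither AGT's memory primitive nor any auxfirst schema is being reproduced here.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    key: str
    value: str
    scope: str   # e.g. "session" or "profile" (hypothetical scope names)
    source: str  # where the agent learned this, shown to the user


class GovernedMemory:
    def __init__(self, readable_scopes):
        self._items = {}
        self._readable_scopes = readable_scopes  # agent_id -> set of scopes
        self.audit = []                          # (actor, operation, key, allowed)

    # Enforcement face: what an agent may do with memory.
    def write(self, agent_id, item):
        self._items[item.key] = item
        self.audit.append((agent_id, "write", item.key, True))

    def read(self, agent_id, key):
        item = self._items.get(key)
        scopes = self._readable_scopes.get(agent_id, set())
        allowed = item is not None and item.scope in scopes
        self.audit.append((agent_id, "read", key, allowed))
        return item.value if allowed else None

    # Experience face: what the user can see, correct, and remove.
    def user_view(self):
        return [
            {"key": i.key, "value": i.value, "source": i.source}
            for i in self._items.values()
        ]

    def user_edit(self, key, new_value):
        self._items[key].value = new_value
        self.audit.append(("user", "edit", key, True))

    def user_forget(self, key):
        self._items.pop(key, None)
        self.audit.append(("user", "forget", key, True))
```

The first two methods belong to the enforcement layer and the last three to the experience layer. A store that offers only one set is half a memory system.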

The word "trust." AGT uses trust to mean machine-verifiable identity scoring inside an agent mesh. AUX uses trust to mean human-perceived reliability and advocacy. Same word, two referents. When a buyer searches for "agent trust framework," they will find both. The honest framing — and the one we are doubling down on — is that AGT trust is agent-to-agent and AUX trust is human-to-agent. They are complementary, and a serious agentic stack instruments for both.


How they compose

Here is the practical version. A team shipping a production agent today should be making three decisions, not one.

01 · The deterministic enforcement decision

Which middleware sits between the model's intent and the wire? AGT, OPA, Cedar, or a homegrown policy engine. For most teams, AGT is now the default answer. It ships with adapters for the major agent frameworks, it covers all ten OWASP Agentic Top 10 categories, and it gives you SOC 2 and EU AI Act evidence on day one.

02 · The compliance evidence decision

Which regulatory frameworks does the agent need to map against? OWASP Agentic Top 10, NIST AI RMF, EU AI Act, SOC 2, ISO 42001, sector-specific requirements. AGT's compliance documentation does most of this for you — and crucially, it does it automatically, by capturing the decisions the policy engine made. Compliance is not a deliverable. It is a byproduct of enforced policy.
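
The "byproduct" claim can be sketched in a few lines: if every policy decision is already recorded, an evidence report is a grouping of those records. The rule names and the rule-to-framework table below are invented placeholders. They are not real control mappings and not AGT's compliance output.

```python
from collections import defaultdict

# Hypothetical table: which frameworks each policy rule is offered as evidence for.
# Real mappings come from the toolkit's compliance documentation, not from this sketch.
RULE_FRAMEWORKS = {
    "require-approval-outbound-email": ["SOC 2", "EU AI Act"],
    "deny-destructive-sql": ["OWASP Agentic Top 10", "SOC 2"],
}


def evidence_report(decisions):
    """Group already-recorded policy decisions under each framework they support."""
    report = defaultdict(list)
    for decision in decisions:
        for framework in RULE_FRAMEWORKS.get(decision["rule"], []):
            report[framework].append(decision)
    return dict(report)
```

Nothing in the function gathers new data. It only re-indexes decisions the enforcement layer already made, which is the sense in which compliance falls out of enforced policy.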

03 · The trust experience decision

What does the user actually see, feel, and trust when the agent acts? This is where the AUX patterns live. Intent Handshake before commitment. Confidence Cues on every output the user might act on. Escape Hatch on every action that is not trivially reversible. Memory in Motion surfaced and editable. This decision shapes whether the agent becomes a tool the user merely tolerates or a collaborator the user actively prefers.
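
As a rough illustration, three of these patterns can be written as a pre-flight check on any action the agent is about to show the user. This is a hypothetical checklist, not the planned aux-audit CLI and not a trustkit schema. `AgentAction`, its fields, and `trust_gaps` are names made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AgentAction:
    summary: str
    confidence: Optional[float] = None        # shown to the user, if present
    sources: List[str] = field(default_factory=list)
    intent_confirmed: bool = False            # did the user confirm before commitment?
    trivially_reversible: bool = True
    undo: Optional[Callable[[], None]] = None  # handle the UI can offer as "undo"


def trust_gaps(action: AgentAction) -> List[str]:
    """Name the AUX patterns this action fails to surface, in a fixed order."""
    gaps = []
    if not action.intent_confirmed:
        gaps.append("Intent Handshake")
    if action.confidence is None or not action.sources:
        gaps.append("Confidence Cues")
    if not action.trivially_reversible and action.undo is None:
        gaps.append("Escape Hatch")
    return gaps
```

A check like this can fail on an action that every policy rule allows. That is the gap between the first two decisions and the third.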

A team that makes the first two decisions and skips the third ships an agent that is enforceable, auditable, and ignored. A team that makes the third and skips the first two ships an agent that users love until it gets them sued.

You need all three.


What this means for buyers

If you are a CTO, a Chief AI Officer, or a COO sitting above an agent program, the practical implication of AGT's release is that one part of your problem just got cheaper. The deterministic enforcement layer is no longer something you are buying from a startup or building yourself. Microsoft has commoditized it, and they intend to keep doing so. Treat it as infrastructure. Adopt the substrate.

That leaves the harder, less-commoditized work: what does the user trust, and how do you design for it across the agent lifecycle? This is the layer where holding companies, FMCG marketing leaders, and platform product teams are still operating without a shared vocabulary. It is the layer where trust gap audits show that the agent's policy compliance is perfect and the user adoption curve is flat — because the trust experience was never designed.

This is the layer auxfirst exists to design. Our Blueprint Sprint produces an agent interaction blueprint that names the AUX patterns the agent must implement, the trust stages it must move the user through, and — increasingly — the AGT-compatible policy specifications engineering will instrument. Our Agent Experience Audit maps an existing agent against the four-stage Trust Architecture and the Trust Gap Taxonomy. Our Advisory Retainer supports product, design, and engineering as the agent's trust experience evolves with the user.

We work with AGT, not against it. We will be increasingly explicit about that in our deliverables.


The one sentence

If you remember nothing else from this essay, remember this:

AGT makes the agent incapable of breaking policy. AUX makes the user willing to keep using it.

Both are necessary. Neither is sufficient. They sit on different shelves, are bought by different buyers, and answer different questions — but a serious agentic stack needs both.

Microsoft just made the lower shelf significantly cheaper. That is good news for everyone designing the upper one.


If you are evaluating an AI vendor, comparing agent governance options, or designing the trust experience for an agent program, start a conversation. We help teams compose AGT-grade enforcement with AUX-grade design.

Published by auxfirst agency. © 2026.