agentproto

AIP-45: AGENT-CLI.md — agentcli-interactive/v1 (interactive agent CLI manifest)

A markdown + frontmatter format for declaring an interactive agent CLI — a long-running, bidirectional, agent-as-process binary like Hermes Agent, Claude Code, OpenCode, Goose, or Gemini CLI — and how to install, spawn, and converse with it. Layers over AIP-29 (CLI.md) for install/version/auth blocks and AIP-44 (ACP.md) for the session/wire model when protocol=acp; falls back to MCP or proprietary protocols via a discriminator.

FieldValue
AIP45
TitleAGENT-CLI.md — agentcli-interactive/v1 (interactive agent CLI manifest)
StatusDraft
TypeSchema
Domaincli.sh
RequiresAIP-17 (RUNNER), AIP-19 (SECRETS), AIP-29 (CLI), AIP-36 (SANDBOX), AIP-44 (ACP)
Reference Impl@agentproto/driver-agent-cli

Abstract

AGENT-CLI.md packages an interactive agent CLI — a binary that hosts an agent loop (Hermes, Claude Code, OpenCode, Goose, Gemini CLI) and converses bidirectionally with a client over stdio or a similar duplex channel. Each manifest declares how to install, version-check, sandbox, authenticate, and talk to the binary, with a protocol discriminator (acp | mcp | proprietary) selecting the wire shape.

AIP-45 is a sibling, not a rival, of AIP-29. AIP-29 remains the home for tool CLIsgh, stripe, kubectl, ffmpeg — whose semantics are one-shot cmd → output. AIP-45 covers agent CLIs whose semantics are persistent sessions with streamed turn events. The two share the install / version / auth / sandbox substrate via JSON Schema $ref, so a future "tool CLI that has both modes" can carry both manifests without duplication.

Motivation

Agent runtimes shipped as CLIs are a fast-growing category:

  • Hermes Agent (Nous Research, ACP server)
  • Claude Code (Anthropic, ACP via @agentclientprotocol/claude-agent-acp)
  • OpenCode (sst, native ACP support)
  • Goose (Block, MCP-over-stdio)
  • Gemini CLI (Google, proprietary REPL)
  • Cursor background agents (ACP)

Today every product wires each one bespoke: invoke it as a subprocess, parse its prompt / output / streaming events, manage its lifecycle, hand it secrets, route tool calls. The cost compounds across products — Guilde, Katchy, Simone, dev tooling — and across platforms (macOS, Linux, headless servers, Modal/Daytona sandboxes).

AIP-45 factors the integration shape into a single declarative file plus a single TypeScript runner (@agentproto/driver-agent-cli), the same way AIP-29 factors tool-CLI integration. The discriminator means new agent CLIs join the catalog by writing a manifest, not by writing runner code.

The companion observation: agent CLIs converge on ACP. Hermes, Claude Code, OpenCode, Cursor all speak it. Treating ACP as the default while leaving room for MCP (Goose) and proprietary REPLs (Gemini CLI today) yields one runner with one wire-stable arm and two fallback adapters.

Specification

This section uses RFC 2119 keywords (MUST / SHOULD / MAY).

Conformance

A manifest is AIP-45-conformant iff:

  1. Its frontmatter validates against ./resources/aip-45/draft/AGENT-CLI.schema.json.
  2. Its install, version_check, auth blocks satisfy AIP-29's schemas (referenced via $ref).
  3. When protocol: "acp", the binary's ACP server satisfies the upstream ACP specification at the AIP-44 acp_rev declared by the bound ACP.md.
  4. When protocol: "mcp", the binary speaks MCP-over-stdio per modelcontextprotocol.io.
  5. When protocol: "proprietary", the manifest's adapter package (adapter field) implements the AgentCliClient interface from @agentproto/driver-agent-cli.
  6. When modes is present, every entry has a unique id matching ^[a-z0-9][a-z0-9\-]*$; mode patches reference no symbols beyond bin_args_append and env.
  7. When options is present, every entry has a unique id matching ^[a-z0-9][a-z0-9_]*$; type: enum entries declare an enum array; min/max only appear on type: integer.
  8. When continuation is present, default is in supported; if default == "native-resume" then capabilities.resumable: true MUST also be declared.

Frontmatter

The full schema lives in ./resources/aip-45/draft/AGENT-CLI.schema.json. Top-level fields (informative summary):

FieldRequiredDescription
nameYesKebab id matching parent dir
idYesStable runtime id
descriptionYesOne-paragraph purpose
versionYesSemver of this manifest
binYesPath to the binary
bin_argsNoDefault argv (e.g. ["acp"] for hermes acp)
installYesArray of install methods (AIP-29 $ref)
version_checkYesVersion detection (AIP-29 $ref)
setupNoPost-install configuration steps (AIP-29 $ref) — daemons, bind-time prompts, OAuth, external pickers
authNoAuth surface (AIP-29 $ref)
sandboxYesSandbox profile (AIP-36 $ref)
protocolYesacp | mcp | proprietary
acpWhen protocol=acpRef to AIP-44 ACP.md describing the wire profile
mcpWhen protocol=mcpInline MCP server config
adapterWhen protocol=proprietaryNPM package implementing AgentCliClient
sessionNoSession policy: mode, idle_timeout_ms, max_turns
modelsNoModel routing: default, allowed, env-mapped
capabilitiesNoDeclarative capability flags (mirror of ACP capability map for non-ACP runtimes)
modesNoMutually-exclusive operation modes (e.g. claude-code's plan / accept-edits / bypass-permissions). One active per turn.
optionsNoIndependent typed knobs (model, max-turns, auto, ...). Multiple may be active per turn.
continuationNoHow prior turns reach the CLI. Declares default + supported strategies.
runnerNoAIP-17 RUNNER.md ref or inline runner block

protocol discriminator

Three values, with cross-field requirements:

# protocol: "acp" → MUST declare `acp` referencing an AIP-44 ACP.md
protocol: acp
acp: ./hermes-acp.ACP.md

# protocol: "mcp" → MUST declare `mcp` with server config
protocol: mcp
mcp:
  command: goose
  args: [serve]
  transport: stdio

# protocol: "proprietary" → MUST declare `adapter` (NPM package)
protocol: proprietary
adapter: "@agentproto/adapter-gemini-cli"

Runtimes that don't recognise the declared protocol MUST refuse to load the manifest; this is a hard error, not a warning.

Session model

session:
  mode: ephemeral | persistent | resumable    # default: ephemeral
  idle_timeout_ms: integer                    # default: 600000 (10 min)
  max_turns: integer                          # default: unlimited
  context_carryover: boolean                  # default: true within a session

ephemeral sessions are torn down on disconnect. persistent sessions survive client reconnect within idle_timeout_ms. resumable sessions are durable across runs (requires the binary to support session/load or equivalent — clients MUST refuse to declare resumable against a binary lacking the capability).

Stream event taxonomy

When the runtime relays the binary's output to a host, events conform to a closed taxonomy regardless of underlying protocol:

EventDescription
text-deltaStreaming agent message text chunk
tool-callAgent invoking a tool — args + tool id
tool-resultTool's response delivered back to the agent
thoughtAgent's internal reasoning (when capability declared)
agent-promptAgent asking the user / client for input mid-turn
turn-endAgent has finished the current turn
errorOut-of-band error during the turn

The runner translates protocol-specific events to this taxonomy exactly once at the protocol arm boundary; downstream consumers never see protocol-specific shapes.

Lifecycle

Discover (manifest in registry)

Install (run install method matching host OS/arch)

Version-check (regex against `bin_args + version_check.cmd`)

Spawn subprocess (apply sandbox + auth env)

Handshake (per protocol arm)

Open session (session/new or equivalent)

LOOP: send turn → consume stream events → emit AIP-7 audit

Close session (session/close or equivalent)

Tear down subprocess (SIGTERM 5s → SIGKILL escalation)

Capabilities

capabilities is a manifest-declared mirror of behaviours (separate from the runtime-negotiated ACP capabilities). It tells the host what to expect before boot:

capabilities:
  streaming: boolean       # streams text-delta events
  tool_calls: boolean      # emits tool-call events
  sub_agents: boolean      # may delegate to other agents
  file_io: boolean         # may read/write workspace files
  multimodal: boolean      # accepts non-text input
  resumable: boolean       # supports session/load semantics
  bidirectional: boolean   # may emit agent-prompt

When protocol: "acp", capabilities MUST be a subset of the ACP capabilities the bound AIP-44 manifest declares; the runner enforces this at load time.

Modes

modes is OPTIONAL. When present, it declares the set of mutually- exclusive operation profiles the CLI exposes. The host picks at most one mode per turn via OPERATOR.runtime.config.mode (AIP-9). Each mode may patch the spawn bin_args and env; the runner applies mode patches AFTER the manifest's default bin_args and BEFORE option patches.

modes:
  - id: default
    description: Standard interactive session
  - id: plan
    description: Plan-only mode — no edits, no commands
    bin_args_append: ["--permission-mode", "plan"]
  - id: accept-edits
    bin_args_append: ["--permission-mode", "acceptEdits"]
  - id: bypass-permissions
    bin_args_append: ["--permission-mode", "bypassPermissions"]

Mode ids MUST be unique. The default mode (when the operator omits config.mode) is "no patch applied" — the manifest's bin_args run verbatim. Hosts MUST reject an OPERATOR.runtime.config.mode value that is not in the manifest's modes[].id list.

Options

options is OPTIONAL. When present, it declares typed knobs the CLI accepts. Unlike modes, options are independent — multiple may be active in the same turn. Each option declares a type (boolean | integer | string | enum) and how its value patches bin_args / env.

options:
  - id: model
    type: enum
    enum: [claude-sonnet-4-6, claude-opus-4-7, claude-haiku-4-5]
    bin_args_template: ["--model", "{value}"]
  - id: max_turns
    type: integer
    min: 1
    max: 200
    bin_args_template: ["--max-turns", "{value}"]
  - id: auto
    type: boolean
    bin_args_append_when_true: ["--auto"]

Patch semantics:

  • bin_args_template — appended when the value is non-default. The literal token {value} is replaced with the option's value (stringified). Use for value-bearing flags.
  • bin_args_append_when_true — applies only to type: boolean. Value must be true. Use for bare flags.
  • env — merged into the spawn env. Values may contain {value}.

Hosts MUST validate OPERATOR.runtime.config.options.<id> against the matching option's type / enum / bounds before spawn. Unknown ids are a hard load-time error.

Continuation

continuation is OPTIONAL but RECOMMENDED. It declares how prior conversation turns reach the CLI on subsequent invocations. The driver hosts a strategy registry; this block names which strategies fit this CLI and which is the default.

continuation:
  default: pinned-session
  supported: [pinned-session, transcript, none]
  pinned_session:
    idle_timeout_ms: 1800000
    key_scope: [conversation, operator]

Built-in strategy ids (continuationStrategyId enum):

IdBehaviourBest for
noneFresh session per turn, no prior context. Current behaviour.One-shot tools; debugging
pinned-sessionDriver keeps the spawned child process alive across turns; subsequent session.send calls hit the SAME child, so the CLI's in-memory model context carries over. Idle TTL eviction.CLIs declaring session.mode: persistent AND context_carryover: true (Claude Code, OpenCode, ...)
transcriptCaller (host) supplies the last N turns as text; the strategy prepends them as a preamble before the current message. Token-costly but stateless and survives process restarts.CLIs with resumable: false AND ephemeral session model; debugging when pinning misbehaves
native-resumeCaller passes a prior session id; the strategy passes it to the CLI's own --resume/--continue flag at start. Requires capabilities.resumable: true.CLIs with first-class durable sessions

Hosts MUST refuse to dispatch with a strategy outside supported. The operator MAY override default via OPERATOR.runtime.config.continuation provided the chosen id is in supported.

pinned_session.key_scope is the composite key the driver uses to decide whether two turns share the same pinned child process. Default [conversation, operator] means: different conversations get different children; one operator-per-conversation reuses one child across turns.

Sandbox integration

sandbox is REQUIRED. AIP-45 inherits AIP-36 verbatim — the same provider (local, mastra-modal, mastra-daytona, mastra-e2b, node-permission, etc.), network.egress, mounts, limits, env fields apply. The runner passes the resolved sandbox to the protocol arm; the protocol arm decides how to expose it (e.g. ACP's cwd parameter on session/new).

Auth surface

auth is OPTIONAL but RECOMMENDED. When present, follows AIP-29's schema verbatim: ref to an AIP-19 SECRETS.md, state.{paths,env}, login.cmd, refresh.cmd, expiry. The runner resolves secrets via @agentproto/secrets and passes them to the subprocess via sandbox.env.set (never argv, never logs).

Setup surface

setup is OPTIONAL. When present, mirrors AIP-29 § Setup verbatim — an ordered, idempotent pipeline of post-install configuration steps the host runs once per (bundle.id, workspace.id, user.id) after install and before the agent is considered ready. Step kinds: cmd (e.g. openclaw onboard --install-daemon), prompt (e.g. gateway URL, default channel), oauth (AIP-19 driver), external (browser-based pickers).

Use setup for one-time concerns the auth state machine isn't designed to model: sidecar daemons, bind-time configuration choices, secret pastes from web consoles, Discord-channel / Slack-workspace selection. Recurring credential rotation stays in auth.refresh.

Lifecycle integration:

install → version_check → setup → auth.login → spawn(bin) → handshake → session

When the manifest declares setup, the runner MUST refuse to spawn the binary until every step has either succeeded or matched its skip_if. Headless callers receive error.code = "setup_required" with the pending step list.

Rationale

Why a sibling of AIP-29, not an extension. AIP-29's central abstraction is commands — a tree mapping subcommand paths to TOOL.md refs. An agent CLI has no commands tree (or a degenerate one with one entry, "talk"). Stretching commands to model session-based interaction would either make AIP-29 schema bloat or make commands ambiguous. A sibling spec keeps both clean.

Why a protocol discriminator, not separate AIPs per protocol. The install / version / auth / sandbox / capabilities / session fields are 90% of an agent-CLI manifest. Splitting per protocol would force three near-duplicate schemas and three runners. One runner with three protocol arms keeps the integration surface one package.

Why ACP is the default. Network effects: 14k weekly TS-SDK downloads, multi-vendor adoption, a growing IDE ecosystem. New agent CLIs increasingly ship ACP support out of the box (Claude Code, OpenCode, Cursor announced). Treating ACP as the default and MCP / proprietary as fallbacks tracks where the ecosystem is going.

Why declarative capabilities mirror runtime capabilities. Hosts need to make routing decisions before boot — "send this user to Hermes (multimodal) or Goose (text-only)?" — without spawning every candidate. Manifest-declared capabilities give pre-flight visibility; runtime negotiation is the verification step.

Why stream events are normalised. A consumer (Guilde Shell view, Katchy workflow runner) doesn't care that Hermes uses ACP and Goose uses MCP — it cares that both produce text-delta and tool-call events. Normalising at the protocol-arm boundary lets every consumer ship one renderer.

Reference Implementation

@agentproto/driver-agent-cli — TypeScript runner that loads an AGENT-CLI.md manifest, applies the sandbox + auth, spawns the binary, dispatches to the protocol arm, and emits normalised stream events.

The package exposes:

  • defineAgentCli({...}) — validates manifest, returns an AgentCliHandle with a start(opts) → Session method.
  • Session.send(message) → AsyncIterable<StreamEvent> — write a turn, consume normalised events.
  • Session.cancel() — abort the in-flight turn (forwards session/cancel for ACP, equivalent for other protocols).
  • Session.close() — tear down the session and (if last) the subprocess.
  • Protocol arms under src/protocol/{acp,mcp,proprietary}.ts. The ACP arm delegates to @agentproto/acp (AIP-44).

The reference impl is the source of truth for the schema during the Draft phase; spec changes propagate through scaffold-aip regen.

Backward compatibility

Not applicable — first agentproto AIP for interactive agent CLIs. AIP-29 stays correct for tool CLIs; AIP-45 covers the agent-CLI case that AIP-29 wasn't designed for.

Security considerations

Inherits AIP-29's threat surface (install/version/auth/sandbox) and adds the bidirectional-protocol surface.

  • Untrusted agent output as prompt. The agent's text-delta and agent-prompt events end up in the host's UI. They are attacker-controlled if the agent's model was compromised by upstream prompt injection. Hosts MUST apply the same prompt- injection mitigations as for any user input.
  • Tool calls from the subprocess. Events of type tool-call ask the host to do something. Hosts MUST gate every tool call through AIP-7 governance — the manifest's capabilities.tool_calls flag is permission to emit, not permission to auto-execute.
  • Sandbox bypass via auth env. Secrets passed via sandbox.env.set may leak into the agent's tool calls (e.g. bash invocations) if the sandbox provider doesn't isolate child env. Sandbox providers that don't isolate child env MUST NOT be used with auth.
  • Resumable sessions and replay. A session: { mode: resumable } manifest persists conversation state. If the persistence store is shared, a malicious actor with read access can inject prior turns into the agent's context. Hosts MUST scope persistent session storage to the authenticated identity.
  • adapter packages from npm. When protocol: proprietary, the manifest names an NPM package the runner imports. Hosts MUST treat these the same as any other dependency — pin versions, audit publishers, prefer first-party packages.

See also

Resources

Supporting artifacts for AIP-45. Links open the file on GitHub — markdown and JSON render natively in GitHub's viewer. Browse the full resource tree →