agentproto

AIP-30: DRIVER.md — agentdriver/v1 (abstract driver supertype)

A markdown + frontmatter format for declaring a concrete implementation of one or more agent tools — its identity, kind (cli / http / mcp / sdk / builtin), install lifecycle, auth surface, sandbox profile, and per-tool dispatch bindings. The supertype every concrete driver AIP (CLI, HTTP, MCP, SDK) specialises, with the standard `defineDriver` entry-point signature.

| Field | Value |
|---|---|
| AIP | 30 |
| Title | DRIVER.md — agentdriver/v1 (abstract driver supertype) |
| Status | Draft |
| Type | Schema |
| Domain | drivers.sh |
| Requires | AIP-14 (TOOL), AIP-16 (IO), AIP-17 (RUNNER), AIP-19 (SECRETS) |
| Specialised by | AIP-29 (CLI), AIP-31 (HTTP), AIP-32 (MCP), AIP-33 (SDK) |
| Resources | ./resources/aip-30 (DRIVER.schema.json, ADAPTER.md, EXAMPLES.md, SKILL.md) |

Abstract

DRIVER.md packages a concrete implementation of one or more agent tools: the binding that connects abstract TOOL.md (AIP-14) contracts to a runnable backend (a CLI binary, an HTTP API, an MCP server, an SDK function, or a host builtin). It is the abstract supertype of every concrete driver AIP: CLI.md (AIP-29) is kind: cli, and the forthcoming HTTP.md, MCP.md, and SDK.md are kind: http, kind: mcp, and kind: sdk respectively.

A DRIVER carries everything not in the contract: identity, kind, install lifecycle, version detection, auth surface, sandbox profile, network policy, region, cost overrides, and the per-tool dispatch bindings (implements[]). Subtype AIPs add kind-specific fields on top — argv templates for CLI, endpoint shapes for HTTP, server refs for MCP, package paths for SDK.

The format is paired with a standard entry-point function, defineDriver(...), whose signature any implementation in any language exposes so callers, runtimes, and adapters share one contract.

The file is human-authored, version-controlled, machine-parseable, and grep-able — same posture as SKILL.md, TOOL.md, INTENT.md, CLI.md.

Motivation

The registry today has TOOL.md (AIP-14) for the abstract agent contract and CLI.md (AIP-29) for one specific concrete implementation kind (command-line binaries). Adding HTTP.md as AIP-31 today would copy 60% of CLI.md's structure: install lifecycle, version detection, auth surface, sandbox profile, region, policy tags, cost overrides. Same for MCP, same for SDK.

The supertype is missing.

Five problems compound when there's no abstract driver layer:

  1. One tool, one driver only. AIP-14 v1 conflated the contract (schemas, mutates, approval) with the implementation (entry, runner, code, secrets, network). One TOOL.md = one implementation. When the same logical operation needs multiple backends (image.create via OpenAI HTTP vs Replicate HTTP vs gh CLI vs self-hosted SDK), authors must duplicate the TOOL.md per backend, forking the contract and guaranteeing drift.

  2. No routing layer. Without a driver abstraction, choosing "OpenAI vs Replicate vs local SDK" lives in agent prompts, in if/else chains, in catalog UI configs. There's no place where the runtime can apply policy ("workspace forbids third-party LLMs"), capability gates ("CLI not installed"), or cost ranking ("free plan uses the cheap variant").

  3. Per-kind boilerplate explosion. Each "kind" of integration (CLI, HTTP, MCP, SDK, gRPC, browser-extension, …) needs install lifecycle, auth state machine, sandbox enforcement. Without a supertype, every new kind reinvents these fields with subtle incompatibilities.

  4. Catalog rendering is incoherent. "What can this agent do?" today requires walking the tool registry and the wrapper code per integration. With DRIVER as a first-class registry citizen, the answer is "tools × matching drivers, filtered by capability".

  5. Driver swap is a refactor, not a config change. Swapping Replicate for OpenAI on image.create should be editing one file. Today it's grep across the codebase.

DRIVER.md gives this layer a name and a file format.

Design principles

  1. Abstract over concrete kinds, but no further. DRIVER is the supertype only of implementation kinds. It is NOT a supertype of TOOL (which is a contract, a different layer); not a supertype of INTENT (UX); not a supertype of SKILL (expertise). The AIP series' layering remains intact: SKILL → INTENT → TOOL → DRIVER → {CLI, HTTP, MCP, SDK}.

  2. Subtype declares its own fields, supertype declares the rest. CLI subtype adds bin, bin_args, sandbox.fs/exec/tty, output.exit_codes/json_flag. HTTP subtype adds endpoint, method, body_template, response_extract. MCP adds server_ref, mcp_tool_name. SDK adds package, function_ref. Everything else — install, auth, version_check, sandbox.network, region, policy_tags, cost_override, implements[] — lives in the supertype.

  3. Tools narrow contracts; drivers narrow tools. A TOOL declares its full input schema. A DRIVER MAY declare schema_narrowing on its implements[] entries to drop optional inputs the particular backend doesn't support (e.g. OpenAI DALL-E doesn't accept seed; declare schema_narrowing.drop_inputs: [seed]). Widening (extra inputs the contract doesn't know) is forbidden; that's a different TOOL.

  4. Multi-tool by design. A single DRIVER MAY implement multiple TOOLs — one HTTP API providing image.create, image.edit, image.variation is one DRIVER, not three. Auth, sandbox, region, policy_tags are declared once and reused. The execute block is keyed by tool id.

  5. One-tool-many-drivers is the routing surface. The inverse is what makes the abstraction valuable: many DRIVERs can declare the same TOOL in their implements[]. The runtime resolver picks one per call based on capability, policy, cost, region, and pin (see Multi-driver routing).

  6. Auth state is per-driver, not per-tool. A single auth surface (env vars, login flow, refresh cadence, expiry signal) covers every tool a driver implements. Tools don't re-declare secrets bindings; they inherit from the driver's auth.ref.

  7. Sandbox is the declared policy contract. Hosts MUST enforce the declared sandbox block (network egress, fs, exec, env, tty). A driver that declares network.egress: ["api.foo.com"] and tries to dial api.bar.com MUST fail closed, not silently succeed. The supertype's sandbox is the universal subset (network); subtypes extend it (CLI adds fs, exec, tty).

  8. Frontmatter is the source of truth. When the entry exports a field also declared in frontmatter and the values differ, the host warns and prefers the frontmatter. Entries are for behavioural adapters (custom login flows, output parsers), never for redefining identity.

  9. Cold-start order: drivers register, tools bind. The runtime loading order is: TOOLs → DRIVERs → tool-to-driver binding. DRIVER load failures (missing binary, version mismatch, unauthed) don't fail the registry; they mark the candidate unavailable, and the resolver drops it in Phase 2 (capability gate).
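Principle 8 (frontmatter wins over the entry file) can be sketched as a small reconcile step. This is a minimal illustration, not the reference adapter: the `reconcileIdentity` helper, the `Identity` shape, and the warning wording are all hypothetical names introduced here.

```typescript
// Hypothetical helper for illustration; not defined by this AIP.
interface Identity {
  id: string;
  name: string;
  version: string;
  kind: string;
}

interface Reconciled {
  identity: Identity;
  warnings: string[];
}

// Frontmatter wins for every identity field. Each conflicting value
// exported by the entry file produces a warning that names the field.
function reconcileIdentity(frontmatter: Identity, entry: Partial<Identity>): Reconciled {
  const warnings: string[] = [];
  const fields: (keyof Identity)[] = ["id", "name", "version", "kind"];
  for (const field of fields) {
    const exported = entry[field];
    if (exported !== undefined && exported !== frontmatter[field]) {
      warnings.push(
        `${field}: entry exports "${exported}", frontmatter declares "${frontmatter[field]}"; using frontmatter`,
      );
    }
  }
  return { identity: { ...frontmatter }, warnings };
}

const reconciled = reconcileIdentity(
  { id: "openai-images-http", name: "OpenAI Images (HTTP)", version: "1.0.0", kind: "http" },
  { id: "openai-images-http", version: "1.1.0" },
);
console.log(reconciled.identity.version); // the frontmatter value
console.log(reconciled.warnings);
```

The entry's conflicting `version` is reported but never applied, which keeps identity changes visible in the version-controlled manifest.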

Specification

File location

Each driver lives in its own folder:

.drivers/
  openai-images-http/
    DRIVER.md            ← this AIP (kind: http)
    driver.ts            ← optional entry (custom transport / response parser)
    SECRETS.md             ← AIP-19 inventory
    README.md              ← optional long-form
  replicate-flux-http/
    DRIVER.md
    driver.ts
.cli/
  gh/
    CLI.md                 ← AIP-29 specialisation; ALSO a DRIVER (kind: cli)
    cli.ts
    tools/
      pr-create/TOOL.md

A DRIVER MAY live colocated with its concrete bundle (e.g. .cli/gh/CLI.md IS a kind: cli DRIVER) or alone under .drivers/. The folder name SHOULD match the manifest's id.

Frontmatter

YAML frontmatter, delimited by --- lines. All fields are case-sensitive.

Required fields

| Field | Type | Description |
|---|---|---|
| name | string | Human-readable display name (1–80 chars). |
| id | string | Machine identifier. Lowercase, digits, dashes, dots. 2–80 chars. Unique within the registry. |
| description | string | One-paragraph purpose for an LLM caller. ≤2000 chars. |
| version | semver string | Spec version of THIS driver. Bump on implements[] shape change, sandbox change, auth change. |
| kind | enum | "cli" \| "http" \| "mcp" \| "sdk" \| "builtin". Drives the subtype-specific frontmatter validation. |
| implements | object[] | ≥1 entry. Per-tool dispatch bindings. See The implements block. |
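The required-field constraints above could be checked along these lines. This is a rough sketch under stated assumptions: the normative check is DRIVER.schema.json, the `validateRequiredFields` name is hypothetical, the id rule is read as a single character class, and the semver pattern is a simplification.

```typescript
// Sketch only: the normative validation is DRIVER.schema.json.
const DRIVER_KINDS: readonly string[] = ["cli", "http", "mcp", "sdk", "builtin"];
// Assumption: any mix of lowercase letters, digits, dashes and dots, 2-80 chars.
const ID_PATTERN = /^[a-z0-9.-]{2,80}$/;
// Simplified semver: MAJOR.MINOR.PATCH with optional pre-release and build parts.
const SEMVER_PATTERN = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;

// Returns a list of human-readable problems; an empty list means the
// required fields look well-formed.
function validateRequiredFields(fm: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const name = fm["name"];
  if (typeof name !== "string" || name.length < 1 || name.length > 80) {
    errors.push("name must be a string of 1-80 chars");
  }
  const id = fm["id"];
  if (typeof id !== "string" || !ID_PATTERN.test(id)) {
    errors.push("id must match " + ID_PATTERN.source);
  }
  const description = fm["description"];
  if (typeof description !== "string" || description.length > 2000) {
    errors.push("description must be a string of at most 2000 chars");
  }
  const version = fm["version"];
  if (typeof version !== "string" || !SEMVER_PATTERN.test(version)) {
    errors.push("version must be a semver string");
  }
  const kind = fm["kind"];
  if (typeof kind !== "string" || DRIVER_KINDS.indexOf(kind) === -1) {
    errors.push("kind must be one of " + DRIVER_KINDS.join(", "));
  }
  const impls = fm["implements"];
  if (!Array.isArray(impls) || impls.length < 1) {
    errors.push("implements must contain at least one entry");
  }
  return errors;
}
```

Registry-level uniqueness of `id` is not covered here, since it needs the full registry rather than a single manifest.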

Optional universal fields

| Field | Type | Default | Description |
|---|---|---|---|
| install | object[] | [] | Install paths, in order of preference. CLI/SDK use; HTTP/MCP usually omit (no install needed). Methods registry inherited from AIP-29 § Install methods. |
| version_check | object | none | How to detect & validate the installed version. CLI/SDK only. Same shape as AIP-29. |
| auth | object | none | Auth surface — env vars, state location, login flow, refresh policy, expiry signal. See The auth block. |
| network | object | none | Universal sandbox primitive: { egress: string[] }. Outbound allowlist. Empty / missing = no network. Subtypes MAY extend with kind-specific blocks. |
| runner | object \| string | none | AIP-17 runner block, inline or as a workspace-relative ref to a RUNNER.md. When omitted, hosts apply a subprocess default. |
| region | string[] | ["global"] | BCP-47 region tags or cloud regions ("us-east-1", "EU", "global"). Drives data-residency routing. |
| policy_tags | string[] | [] | Free-form policy markers ("pii-safe", "self-hosted", "third-party-llm", "hipaa", "gdpr") the resolver's policy filter reads. |
| cost_override | object | none | { cost_class?, cost_units_per_call?, currency? } overriding the contract baseline. The resolver ranks candidates by cost_units_per_call when contract doesn't pin a default. |
| timeout_override_ms | int | none | Narrow the contract ceiling (never widen). |
| retry_override | object | none | { max_attempts, backoff, initial_ms } overriding contract baseline. |
| health_check | object | none | Cheap probe the resolver runs to confirm reachability. { method: "ping" \| "exec" \| "noop", cmd?, http?, expect_exit?, every?: ISO-8601 }. |
| requires | object | {} | Capability requirements (AIP-7). Subfields: os: string[], arch: string[], min_disk_mb: int, min_memory_mb: int. |
| examples | object[] | [] | Driver-specific examples augmenting the routed tools' contract examples. |
| tags | string[] | [] | Free-form discovery tags. |
| metadata | object | {} | Free-form, namespaced. metadata.<host>.… keys tolerated by other hosts. |
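Two of the override fields above have simple resolution rules that a short sketch can make concrete. The helper names are hypothetical. Clamping a widening `timeout_override_ms` to the contract ceiling is an assumption made here; a host could equally reject such a manifest at registration.

```typescript
// Narrow-only: an override may lower the contract ceiling but never raise it.
// Assumption: a widening override is clamped rather than rejected.
function effectiveTimeoutMs(contractCeilingMs: number, overrideMs?: number): number {
  return overrideMs === undefined
    ? contractCeilingMs
    : Math.min(contractCeilingMs, overrideMs);
}

// Per-tool override first, then the driver-level override, then the
// contract baseline.
function effectiveCostUnits(
  contractBaseline: number,
  driverLevel?: number,
  perTool?: number,
): number {
  return perTool ?? driverLevel ?? contractBaseline;
}

console.log(effectiveTimeoutMs(30000, 10000)); // 10000
console.log(effectiveTimeoutMs(30000, 60000)); // 30000
console.log(effectiveCostUnits(10, 6, 4)); // 4
```

Using `??` rather than `||` means an explicit override of `0` is respected instead of falling through.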

Subtype-specific fields

These live in the subtype AIP, not in this supertype. Listed here for orientation; see the linked subtype spec for the canonical shape.

| kind | Subtype-specific fields | Defined in |
|---|---|---|
| cli | bin, bin_args, sandbox.fs/exec/tty, output.exit_codes/json_flag/json_flag_args/stream/error_stream | AIP-29 CLI.md |
| http | endpoint, method, headers, body_template, response_extract, streaming | AIP-31 (forthcoming) |
| mcp | server_ref, transport (stdio / sse / http), mcp_tool_name, prompts_ref | AIP-32 (forthcoming) |
| sdk | package, package_manager (npm / pip / cargo / …), function_ref, args_template | AIP-33 (forthcoming) |
| builtin | host_id (the host runtime that provides this tool natively) | This AIP, § Builtin drivers |

Discouraged

driver, concrete, transport at the universal level — these are subtype concerns. Authors who feel the urge to add one of these to the universal block are signalling a missing subtype field that should land via AIP revision.

Body

Markdown body following the frontmatter. Recommended sections:

  • ## When to reach for this driver — what problems it solves, what it doesn't, vs siblings implementing the same TOOL.
  • ## Trade-offs — cost, region, latency, reliability vs other candidates for the same TOOL.
  • ## Gotchas — auth quirks, version skew, environment pitfalls.
  • ## Reference — links to upstream docs, status pages, support channels.

The body is informational. Drivers MUST function with adapters that read only the frontmatter.

The implements block

The most important block. Declares which TOOLs this DRIVER implements and how the call binds:

implements:
  - tool: ./tools/image-create/TOOL.md   # ref to TOOL contract (workspace-relative or registry id)
    version: "^1.0.0"                     # contract semver range this binding is valid for
    schema_narrowing:                     # optional: drop optional contract inputs
      drop_inputs: [seed, negative_prompt]
    mapping:                              # optional: per-tool input rename / transform
      prompt: prompt                      # explicit identity
      style:  artistic_style              # rename
      aspect:                             # transform (named transformer in the entry file)
        from: aspect_ratio
        transform: aspect_to_size
    cost_override:                        # per-tool override; falls through to driver-level
      cost_units_per_call: 4              # millicents
    metadata:                             # per-tool, kind-specific hints
      http:
        idempotency_key_header: "Idempotency-Key"
  - tool: ./tools/image-edit/TOOL.md
    version: "^1.0.0"

Each entry binds the DRIVER to one TOOL contract. Multiple entries in the same DRIVER bind multiple TOOLs (e.g. one HTTP API serving three different operations). Every entry in implements[] MUST have a corresponding execute body if the driver has a defineDriver entry (see The defineDriver standard signature).

schema_narrowing.drop_inputs is the contract-narrowing safety valve. The runtime resolver MUST refuse to route a call that uses a dropped input — caller error, not silent ignore. Drift between contract version (tool.version) and what the driver actually supports lives here, in the open.
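The refusal rule can be sketched as a check that runs before a candidate is accepted. The `droppedInputsUsed` name and the trimmed `ImplementsEntrySketch` type are hypothetical, and treating "uses an input" as "the key is present with a value other than `undefined`" is an assumption.

```typescript
// Trimmed, hypothetical view of an implements[] entry.
interface ImplementsEntrySketch {
  tool: string;
  schema_narrowing?: { drop_inputs?: string[] };
}

// Returns the dropped inputs that this call actually supplies.
// A non-empty result means the resolver must refuse this driver for the call.
function droppedInputsUsed(
  entry: ImplementsEntrySketch,
  input: Record<string, unknown>,
): string[] {
  const dropped = entry.schema_narrowing?.drop_inputs ?? [];
  return dropped.filter((key) => input[key] !== undefined);
}

const imageCreate: ImplementsEntrySketch = {
  tool: "./tools/image-create/TOOL.md",
  schema_narrowing: { drop_inputs: ["seed", "negative_prompt"] },
};

console.log(droppedInputsUsed(imageCreate, { prompt: "a red fox", seed: 42 })); // ["seed"]
console.log(droppedInputsUsed(imageCreate, { prompt: "a red fox" })); // []
```

Returning the offending keys, rather than a boolean, lets the host name them in the caller-facing error.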

The auth block

Auth surface declaration, generalised from AIP-29's CLI auth into a kind-agnostic shape:

auth:
  ref: ./SECRETS.md                  # AIP-19 inventory of env-var bindings
  state:
    paths: ["~/.config/<dir>"]       # CLI / SDK persistent state
    env:   ["FOO_TOKEN"]
  login:                             # interactive flow (when present)
    cmd: "..."                       # CLI drivers
    url: "https://driver.com/oauth"  # HTTP / SDK drivers (browser flow)
    interactive: true
    requires_callback_url: false
    completes_when:
      cmd: "..."                     # CLI variant
      exit_code: 0
      # OR
      http: { method: GET, url: "https://api.../whoami", expect_status: 200 }
  refresh:
    cmd: "..."                       # OR url:
    every: "PT24H"                   # ISO-8601 duration
  expiry:
    detect: "exit_code:4"            # CLI
    # OR
    detect: "http_status:401"        # HTTP
    # OR
    detect: "exception:AuthExpired"  # SDK

Detection vocabulary (exit_code:N, http_status:N, exception:Name, header:X-Auth-Status:expired) is open-ended; hosts MAY add new prefixes per kind. Bundles SHOULD use the canonical detection most natural to their kind.
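One way a host might interpret the detection strings is to split on the first colon and dispatch on the prefix. This is only a sketch: `parseDetection`, `isExpired`, and the `CallOutcome` shape are hypothetical, and unknown prefixes are treated as non-matching here because their semantics are host-defined.

```typescript
interface Detection {
  prefix: string;
  value: string;
}

// Hypothetical summary of what a finished call looked like.
interface CallOutcome {
  exitCode?: number;
  httpStatus?: number;
  exceptionName?: string;
  headers?: Record<string, string>;
}

// Split "prefix:value" on the FIRST colon so that values may contain colons
// (as in "header:X-Auth-Status:expired").
function parseDetection(spec: string): Detection {
  const idx = spec.indexOf(":");
  if (idx <= 0 || idx === spec.length - 1) {
    throw new Error("malformed detect string: " + spec);
  }
  return { prefix: spec.slice(0, idx), value: spec.slice(idx + 1) };
}

function isExpired(spec: string, outcome: CallOutcome): boolean {
  const { prefix, value } = parseDetection(spec);
  switch (prefix) {
    case "exit_code":
      return outcome.exitCode === Number(value);
    case "http_status":
      return outcome.httpStatus === Number(value);
    case "exception":
      return outcome.exceptionName === value;
    case "header": {
      const sep = value.indexOf(":");
      if (sep <= 0) return false;
      const wantedName = value.slice(0, sep).toLowerCase();
      const wantedValue = value.slice(sep + 1);
      const headers = outcome.headers ?? {};
      // Header names are compared case-insensitively.
      return Object.keys(headers).some(
        (name) => name.toLowerCase() === wantedName && headers[name] === wantedValue,
      );
    }
    default:
      return false; // host-specific prefix: not handled in this sketch
  }
}

console.log(isExpired("http_status:401", { httpStatus: 401 })); // true
console.log(isExpired("exit_code:4", { exitCode: 0 })); // false
```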

Login state machine

Same three-state machine as AIP-29 § Login state machine:

unknown ──(version_check / health_check ok)──▶ unauthed ──(login completes)──▶ authed
                                                  ▲                              │
                                                  └──────(expiry detected)───────┘

State persistence is per (driver.id, workspace.id, user.id) tuple, not per-call. Hosts MUST persist state across runs.
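The three transitions and the per-tuple keying can be written down as a small transition function. This is an illustrative sketch: the event names, `nextAuthState`, `stateKey`, and `applyEvent` are hypothetical, and the in-memory record stands in for whatever persistent store a host actually uses.

```typescript
type AuthState = "unknown" | "unauthed" | "authed";
// Hypothetical event names for the three arrows in the diagram.
type AuthEvent = "probe_ok" | "login_completed" | "expiry_detected";

// Any (state, event) pair not drawn in the diagram leaves the state unchanged.
function nextAuthState(state: AuthState, event: AuthEvent): AuthState {
  if (state === "unknown" && event === "probe_ok") return "unauthed";
  if (state === "unauthed" && event === "login_completed") return "authed";
  if (state === "authed" && event === "expiry_detected") return "unauthed";
  return state;
}

// State is keyed by the (driver.id, workspace.id, user.id) tuple.
function stateKey(driverId: string, workspaceId: string, userId: string): string {
  return JSON.stringify([driverId, workspaceId, userId]);
}

// Stand-in for a persistent store; a real host must survive restarts.
const authStore: Record<string, AuthState> = {};

function applyEvent(key: string, event: AuthEvent): AuthState {
  const current: AuthState = authStore[key] ?? "unknown";
  const next = nextAuthState(current, event);
  authStore[key] = next;
  return next;
}

const key = stateKey("openai-images-http", "ws-1", "user-1");
console.log(applyEvent(key, "probe_ok")); // unauthed
console.log(applyEvent(key, "login_completed")); // authed
console.log(applyEvent(key, "expiry_detected")); // unauthed
```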

Builtin drivers

A kind: builtin driver expresses "this tool is implemented natively by the host runtime, no external integration needed":

spec: agentdriver/v1
name: Host fs.read
id: host-builtin-fs-read
description: Workspace file read, host-native (no external binary or service).
version: 1.0.0
kind: builtin
implements:
  - tool: ./tools/fs-read/TOOL.md
    version: "^1.0.0"
metadata:
  builtin:
    host_id: agentik-runtime    # which host runtime provides this

Builtin drivers are how the runtime exposes its first-party capabilities without forcing them through CLI/HTTP/MCP wrappers. They have no install (already present), no auth (host-native), no network (no egress). They MAY declare policy_tags and region. Their execute body is the host's native function.

Stable identity

id + version together form the driver's stable identity. Two drivers with the same id but different major version values MUST be treated as distinct. Caches, audit logs, and tool registrations key on id@major.

version here is the manifest version (driver config). For CLI/SDK drivers, the binary version is covered separately by version_check.range. Changing version_check.range to support a new major version of the underlying binary (for example, a new gh major) SHOULD bump the driver major, since tools written for the old binary range may break.

When a driver's implements[].schema_narrowing changes (a tool gains or loses a supported optional input), the driver MUST bump its major version. The schema validator enforces this via diff against the previously-registered version.
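The `id@major` key and the narrowing-change rule can be illustrated with two small helpers. Both names are hypothetical, the version parsing is a simplified semver match, and comparing `drop_inputs` as unordered sets is an assumption about what counts as a change.

```typescript
// Build the cache / audit-log key from id and the major component of version.
function identityKey(id: string, version: string): string {
  const match = /^(\d+)\.\d+\.\d+/.exec(version);
  if (match === null) {
    throw new Error("not a semver version: " + version);
  }
  return id + "@" + match[1];
}

// True when the set of dropped inputs differs between two manifest versions,
// ignoring order and duplicates. Such a change requires a major bump.
function narrowingChanged(previous: string[], next: string[]): boolean {
  const missingFromNext = previous.some((key) => next.indexOf(key) === -1);
  const addedInNext = next.some((key) => previous.indexOf(key) === -1);
  return missingFromNext || addedInNext;
}

console.log(identityKey("openai-images-http", "1.4.2")); // openai-images-http@1
console.log(narrowingChanged(["seed"], ["seed", "negative_prompt"])); // true
console.log(narrowingChanged(["seed", "negative_prompt"], ["negative_prompt", "seed"])); // false
```

A validator could combine the two: if `narrowingChanged` is true and `identityKey` is the same for both manifests, the required major bump was skipped.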

The defineDriver standard signature

Every implementation that consumes DRIVER.md and ships behavioural adapters MUST expose a function named defineDriver whose signature matches the contract below.

Most DRIVER.md files are frontmatter-only and don't need an entry — the host's reference adapter handles standard CLI/HTTP/MCP/SDK flows generically. An entry is needed when:

  • The login flow needs custom callback-URL handling (browser-based OAuth with refresh logic).
  • The output format isn't text/JSON/YAML (custom delimiter, pseudo-CSV, binary streaming).
  • A tool's execute needs context-sensitive logic the standard dispatch can't cover.

Signature (TypeScript notation, normative)

defineDriver(definition: DriverDefinition): DriverHandle

interface DriverDefinition {
  // Identity — mirrors the manifest fields with the same names.
  id:           string
  name:         string
  description:  string
  version?:     string
  kind:         "cli" | "http" | "mcp" | "sdk" | "builtin"

  // What contracts this driver satisfies (≥1 entry).
  implements:   ImplementsEntry[]

  // Universal blocks (subset of frontmatter).
  install?:        InstallMethod[]
  versionCheck?:   VersionCheck
  auth?:           AuthConfig
  runner?:         RunnerConfig | string  // ref or inline
  network?:        NetworkConfig          // { egress: string[] }
  region?:         string[]
  policyTags?:     string[]
  costOverride?:   CostOverride
  timeoutOverrideMs?: number
  retryOverride?:  RetryPolicy
  healthCheck?:    HealthCheckConfig

  // The dispatch bodies — one per implemented TOOL id.
  execute: Record<string /* tool id */, ExecuteFn>

  // Optional behavioural adapters.
  login?:        (args: LoginArgs) => Promise<LoginResult>
  refresh?:      (args: RefreshArgs) => Promise<RefreshResult>
  parseOutput?:  (args: ParseOutputArgs) => ParseOutputResult     // CLI / SDK
  detectExpiry?: (args: DetectExpiryArgs) => boolean

  // Bookkeeping.
  metadata?:     Record<string, unknown>
  tags?:         string[]
}

type ExecuteFn = (args: ExecuteArgs) => Promise<unknown>

interface ExecuteArgs {
  /** Tool-shape input, validated against the TOOL's inputSchema by the host before dispatch. */
  input:       unknown
  /** Per-call context — when the TOOL declares contextSchema, host validates against it before dispatch. */
  context:     Record<string, unknown>
  /** Resolved driver state — auth, secrets, sandbox handle, region, signal. */
  driverCtx:   DriverContext
  /** Caller-set abort signal. MUST be honoured. */
  signal:      AbortSignal
}

interface LoginArgs {
  context: DriverContext
  signal:  AbortSignal
}

type LoginResult =
  | { ok: true }
  | { ok: false; reason: "user_cancelled" | "callback_failed" | "upstream_error"; message?: string }

interface RefreshArgs {
  context: DriverContext
  signal:  AbortSignal
}

type RefreshResult =
  | { ok: true; nextRefreshAt?: string /* ISO-8601 */ }
  | { ok: false; reason: "auth_expired" | "upstream_error"; message?: string }

interface ParseOutputArgs {
  exitCode?: number     // CLI only
  stdout:    string | Uint8Array
  stderr:    string
  expected:  { format: "text" | "json" | "yaml" | "binary" }
}

interface ParseOutputResult {
  ok:     boolean
  value?: unknown
  error?: { code: string; message: string; retryable?: boolean }
}
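To make the `signal` field of `ExecuteArgs` concrete, here is a minimal sketch of an execute body that stops promptly on cancellation. Everything in it is hypothetical (`ExecuteArgsSketch`, `sleep`, `slowExecute`, and the step loop standing in for a slow backend); it assumes a runtime with the standard `AbortController` / `AbortSignal` globals.

```typescript
// Trimmed, hypothetical stand-in for ExecuteArgs.
interface ExecuteArgsSketch {
  input: { steps: number };
  signal: AbortSignal;
}

// A cancellable delay: rejects as soon as the signal aborts.
function sleep(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("aborted"));
      return;
    }
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    function onAbort(): void {
      clearTimeout(timer);
      reject(new Error("aborted"));
    }
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

// Simulates a long-running call as a series of short waits. Each wait
// observes the signal, so cancellation takes effect within one step.
async function slowExecute(args: ExecuteArgsSketch): Promise<{ completedSteps: number }> {
  let completedSteps = 0;
  for (let i = 0; i < args.input.steps; i++) {
    await sleep(10, args.signal);
    completedSteps++;
  }
  return { completedSteps };
}
```

A caller would pass `controller.signal` from its own `AbortController` and call `controller.abort()` to cancel; the pending `slowExecute` promise then rejects instead of running to completion.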

Conformance rules

  1. One canonical name. The exported name MUST be defineDriver. Implementations MAY also re-export under host-idiomatic aliases (createProvider, driver, defineCli, defineHttp) — the canonical name is what DRIVER.md adapters reference.

  2. Frontmatter is the source of truth. When the entry exports conflicting values for a field declared in frontmatter, the adapter MUST surface a warning naming the field and prefer the frontmatter value. Entries are for behaviour, not identity.

  3. execute is keyed by tool id. Every entry in implements[] MUST have a corresponding key in the execute record. Hosts MUST refuse to register drivers whose execute keys don't match the implements[] set.

  4. Input validation happens at the contract layer. The host validates args.input against the TOOL's inputSchema (and contextSchema, when declared) BEFORE calling execute[<toolId>]. Driver bodies MUST NOT re-validate; they MUST trust that the host has already validated and narrowed the inputs.

  5. execute honours signal. Long-running calls (LLM streaming, slow CLI invocations) MUST observe the abort signal and stop promptly when the caller cancels.

  6. Sandbox is enforced by the host, declared by the driver. The driver's network/sandbox block is policy; the host enforces. A defineDriver body MUST NOT subvert host enforcement (e.g. by spawning child processes when the CLI subtype declares exec.allow: false).

  7. login / refresh honour signal. Browser-callback flows MUST abort cleanly when the caller cancels (e.g. user closes the prompt). Tools blocked on a hung login flow are a UX regression.

  8. parseOutput is pure. It consumes the raw output + exit code (CLI) and returns a structured result. It MUST NOT touch the network or filesystem; that's the runner's job.

  9. No I/O at module load. The module containing defineDriver MUST be safely importable as a side-effect-free unit. All I/O happens inside execute / login / refresh / parseOutput.

  10. Schema narrowing is non-extensive. schema_narrowing MAY drop optional contract inputs; MUST NOT add inputs the contract doesn't know about. To add driver-specific knobs, use metadata.<kind>.… in the implements entry.
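Rule 3 amounts to a set comparison at registration time. The sketch below assumes the `implements[]` refs have already been resolved to tool ids; the `checkExecuteKeys` helper and its result shape are hypothetical.

```typescript
interface ExecuteKeyCheck {
  missing: string[]; // implemented tools with no execute body
  extra: string[];   // execute bodies for tools not in implements[]
}

// Assumption: implements[] refs were resolved to tool ids beforehand.
function checkExecuteKeys(implementedToolIds: string[], executeKeys: string[]): ExecuteKeyCheck {
  const missing = implementedToolIds.filter((id) => executeKeys.indexOf(id) === -1);
  const extra = executeKeys.filter((key) => implementedToolIds.indexOf(key) === -1);
  return { missing, extra };
}

const check = checkExecuteKeys(
  ["image.create", "image.edit", "image.variation"],
  ["image.create", "image.edit", "image.upscale"],
);
console.log(check.missing); // ["image.variation"]
console.log(check.extra); // ["image.upscale"]
```

A host would refuse registration whenever either list is non-empty, and can name the offending tool ids in the error.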

Implementer's guide

For step-by-step guidance on building a defineDriver-conformant implementation, see ./resources/aip-30/draft/ADAPTER.md. The AIP only defines the contract; the resource doc walks an implementer through the projection.

Multi-driver routing

When a TOOL has N drivers, the resolver picks one per call. The algorithm runs in 6 phases — see AIP-14 § Driver resolution for the canonical description (TOOL is the layer that owns the resolver contract; DRIVER is what the resolver picks between).

Briefly:

Phase 1 — Candidate set
  candidates = drivers implementing(tool.id, tool.version)
  filter by tool.driver_constraints.forbid / require_kind
  filter by schema_narrowing compat with call inputs

Phase 2 — Capability gate
  drop drivers with failed install / version_check
  drop drivers in unauthed state (unless login-only invocation)
  drop drivers with stale failed health_check

Phase 3 — Policy filter
  drop drivers violating workspace policy_tags allowlist
  drop drivers without matching region (data residency)

Phase 4 — Pin override
  if context.pinnedProvider: return matching candidate or pinned_provider_unavailable

Phase 5 — Cost / preference rank
  prefer tool.default_implementation if surviving
  else rank by cost_units_per_call → kind preference (builtin > sdk > http > mcp > cli)
       → most-recent health_check pass → lex(id)

Phase 6 — Bind
  return { driver, mappedInput via implements[].mapping, driverCtx }

The kind preference order (builtin > sdk > http > mcp > cli) is heuristic, expressing "cheaper to dispatch and more reliable": builtin is always available, SDKs are in-process, HTTP is a network hop, MCP is a network hop with extra protocol overhead, CLI is a subprocess. Authors override per-tool with default_implementation or per-call with context.pinnedProvider.
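Phase 5 can be sketched as a comparator over the candidates that survive phases 1 to 4. The `Candidate` shape and `pickDriver` name are hypothetical. The sketch also assumes that lower `cost_units_per_call` ranks first, that a more recent health-check pass ranks first, and that every candidate carries both values.

```typescript
type Kind = "cli" | "http" | "mcp" | "sdk" | "builtin";

// Hypothetical flattened view of a surviving candidate.
interface Candidate {
  id: string;
  kind: Kind;
  costUnitsPerCall: number;
  lastHealthPassMs: number; // epoch millis of the latest passing health check
}

// builtin > sdk > http > mcp > cli, encoded as ascending rank.
const KIND_RANK: Record<Kind, number> = { builtin: 0, sdk: 1, http: 2, mcp: 3, cli: 4 };

function pickDriver(candidates: Candidate[], defaultImplementation?: string): Candidate | undefined {
  // Prefer the tool's default implementation when it survived earlier phases.
  if (defaultImplementation !== undefined) {
    const preferred = candidates.filter((c) => c.id === defaultImplementation)[0];
    if (preferred !== undefined) return preferred;
  }
  // Otherwise: cost, then kind preference, then freshest health pass, then id.
  const ranked = candidates.slice().sort(
    (a, b) =>
      a.costUnitsPerCall - b.costUnitsPerCall ||
      KIND_RANK[a.kind] - KIND_RANK[b.kind] ||
      b.lastHealthPassMs - a.lastHealthPassMs ||
      (a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
  );
  return ranked[0];
}

const httpDriver: Candidate = { id: "openai-images-http", kind: "http", costUnitsPerCall: 4, lastHealthPassMs: 100 };
const sdkDriver: Candidate = { id: "local-sdxl-sdk", kind: "sdk", costUnitsPerCall: 4, lastHealthPassMs: 50 };
const cliDriver: Candidate = { id: "cheap-cli", kind: "cli", costUnitsPerCall: 2, lastHealthPassMs: 10 };

console.log(pickDriver([httpDriver, sdkDriver, cliDriver])?.id); // cheap-cli
console.log(pickDriver([httpDriver, sdkDriver])?.id); // local-sdxl-sdk
```

Cost dominates kind preference in this ordering, so the cheaper CLI driver wins over the SDK driver even though CLI ranks last by kind.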

Authoring with SKILL.md

The canonical way to generate a DRIVER.md is via a paired SKILL.md — distributed at ./resources/aip-30/draft/skills/author-driver/SKILL.md — that an agent loads when asked to wrap a backend as a driver. The skill walks the agent through:

  1. Identify the kind (cli / http / mcp / sdk / builtin) and the corresponding subtype AIP.
  2. Identify which TOOLs the driver implements. For each: pick from existing TOOL.md, scaffold a new TOOL.md via the AIP-14 author-tool skill, or declare schema-narrowing for partial support.
  3. Map the install lifecycle (CLI/SDK only — install methods, version detection regex).
  4. Map the auth surface (env vars, login flow, refresh cadence, expiry signal).
  5. Author the sandbox profile (network egress, plus subtype-specific fs/exec/tty for CLI).
  6. Declare implements[] with mappings + per-tool overrides.
  7. Add region + policy_tags if relevant.
  8. Validate against ./resources/aip-30/draft/DRIVER.schema.json.

The agent MAY install the skill, follow the steps, and emit the final DRIVER.md (and optional driver.ts) without further instruction.

Example

---
name: OpenAI Images (HTTP)
id: openai-images-http
description:
  Image generation, edit, and variation via the OpenAI HTTP API. Implements
  three tools — image.create, image.edit, image.variation — sharing one
  API key, one rate limiter, and one egress allowlist.
version: 1.0.0
kind: http
auth:
  ref: ./SECRETS.md
  state:
    env: ["OPENAI_API_KEY"]
  expiry:
    detect: "http_status:401"
network:
  egress: ["api.openai.com"]
region: ["global"]
policy_tags: ["third-party-llm", "us-data-residency"]
implements:
  - tool: ./tools/image-create/TOOL.md
    version: "^1.0.0"
    schema_narrowing:
      drop_inputs: [seed, negative_prompt]
    mapping:
      prompt: prompt
      aspect: { from: aspect_ratio, transform: aspect_to_size }
    cost_override:
      cost_units_per_call: 4         # millicents (DALL-E 3 standard)
    metadata:
      http:
        endpoint: "/v1/images/generations"
        method: POST
  - tool: ./tools/image-edit/TOOL.md
    version: "^1.0.0"
    cost_override:
      cost_units_per_call: 6
    metadata:
      http:
        endpoint: "/v1/images/edits"
        method: POST
  - tool: ./tools/image-variation/TOOL.md
    version: "^1.0.0"
    cost_override:
      cost_units_per_call: 4
    metadata:
      http:
        endpoint: "/v1/images/variations"
        method: POST
health_check:
  method: http
  http: { method: GET, url: "https://api.openai.com/v1/models", expect_status: 200 }
  every: "PT5M"
tags: [openai, image-generation, third-party-api]
examples:
  - { goal: "create a 1024×1024 image", note: "uses DALL-E 3 default" }
  - { goal: "edit existing image with mask", note: "image.edit tool" }
---

## When to reach for this driver

Use OpenAI for general image generation when DALL-E quality
suffices and `policy_tags` allow third-party LLMs. Prefer the
`replicate-flux-http` driver for photorealistic output (where it
performs better) or the `gemini-imagen-http` driver for stylised
art (cheaper, comparable quality).

## Trade-offs

- **Cost**: $0.04/standard image — middle of the pack. Replicate
  Flux is $0.025; Stable Diffusion XL self-hosted is $0.005 +
  hardware amortisation.
- **Region**: US data residency only. Workspaces tagged
  `eu-data-residency` MUST exclude this driver.
- **Latency**: ~3–8s for 1024×1024. SDXL local is faster on warm
  GPU; Flux is comparable.
- **Reliability**: very high (OpenAI's status is the floor).

## Gotchas

- `seed` and `negative_prompt` not supported — declared in
  `schema_narrowing.drop_inputs`. Calls using these inputs will be
  refused by the resolver, not silently ignored.
- Rate limit: 50 RPM on the standard tier. Driver configures
  retry with exponential backoff; callers should NOT add their own.
- Expiry detection on `http_status:401`. After detected expiry,
  state transitions to `unauthed`; the host surfaces the login
  flow (re-bind the API key; this driver does not use OAuth).

Compatibility

With AIP-14 TOOL.md (revised)

DRIVER.md slots in below the abstract TOOL.md. The TOOL no longer carries code/run/runner/secrets/network (those moved here). A TOOL referenced from a DRIVER's implements[] entry still has its inputs validated, its mutates respected, and its approval enforced: the same contract as before, with the implementation hosted here.

With AIP-29 CLI.md

CLI.md is a kind: cli specialisation of DRIVER. Its bin/bin_args/sandbox.fs/exec/tty/output blocks live in the CLI specialisation; everything else (install, version_check, auth, runner, network, implements) lives at the DRIVER level. The existing AIP-29 spec adds a driver_kind: cli frontmatter declaration to make the relationship explicit.

defineCli() survives as thin sugar over defineDriver({ kind: "cli", ... }).

With AIP-19 SECRETS.md

The driver's auth.ref points at a SECRETS.md listing the driver's required env-var bindings. Tools no longer carry a secrets: block — they inherit from the resolved driver.

With AIP-17 RUNNER.md

The driver's runner block (per AIP-17) declares the process boundary shared across every tool the driver implements. CLI drivers typically declare subprocess; future containerised drivers declare docker/firecracker.

With AIP-28 INTENT.md

INTENT routes to TOOL (abstract). The runtime resolves driver per call. INTENT.md doesn't reference drivers directly — its implements: block points at TOOLs. Direct driver pinning happens at call time via context.pinnedProvider, not at intent authoring time (escape hatch only).

Security considerations

DRIVER.md is declarative: a malicious manifest can lie about its network, sandbox, policy_tags, or auth. Hosts MUST treat the manifest as untrusted input until verified. At minimum:

  • Verify install SHA-256 (when present, especially for curl and download methods).
  • Validate the binary's actual version against version_check.range AFTER install, BEFORE invocation.
  • Enforce the declared sandbox at the OS / network level. A driver that declares network.egress: ["api.foo.com"] and attempts to dial api.bar.com MUST fail closed.
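The fail-closed decision in the last bullet can be sketched as follows. Real enforcement belongs at the OS or network level; this only shows the allow/deny logic. The `isEgressAllowed` name is hypothetical, and matching allowlist entries as exact, case-insensitive hostnames (no wildcards, no ports) is an assumption, since the entry syntax is not spelled out here.

```typescript
// Fail-closed egress decision: anything not provably allowed is denied.
// Assumption: allowlist entries are bare hostnames, matched exactly.
function isEgressAllowed(egress: string[] | undefined, targetUrl: string): boolean {
  // Empty or missing allowlist means no network at all.
  if (egress === undefined || egress.length === 0) return false;
  let host: string;
  try {
    host = new URL(targetUrl).hostname.toLowerCase();
  } catch {
    return false; // unparseable target: deny
  }
  return egress.some((allowed) => allowed.toLowerCase() === host);
}

console.log(isEgressAllowed(["api.foo.com"], "https://api.foo.com/v1/things")); // true
console.log(isEgressAllowed(["api.foo.com"], "https://api.bar.com/")); // false
console.log(isEgressAllowed(undefined, "https://api.foo.com/")); // false
```

Exact matching means a subdomain such as `evil.api.foo.com` is denied unless it is listed explicitly.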

Browser-callback login flows (auth.login.requires_callback_url: true) expose the host. Hosts MUST allocate a single-use, time-bound callback URL and reject inbound requests from unexpected origins.

policy_tags are declarative. Hosts MUST enforce policy filters (workspace allowlists, region constraints) against them; they MUST NOT trust the driver's self-tagging without verification at registration time.

implements[].schema_narrowing is a security-relevant contract: a driver that drops the idempotency_key input from a payments TOOL silently turns retry-safe calls into double-charge opportunities. Reviewers MUST read narrowing diffs as carefully as schema diffs.

Open questions

These remain open until enough subtype implementations ship.

  1. Streaming and progressive outputs. v1 contracts are unary in/out. Drivers that natively stream (HTTP SSE, gRPC bidi, MCP notifications/progress) need a v2 mechanism. Candidate: streaming: { in?: false, out?: "events" | "tokens" | "partials" } on implements[] entries, plus an executeStream body in defineDriver. Defer to AIP-30 v2.

  2. Multi-account auth (one driver, two accounts). Today's auth state is per (driver.id, workspace.id, user.id). When a user has two OpenAI accounts (personal + workplace), the resolver can't distinguish; workaround is registering two DRIVER.md files (openai-personal-http, openai-work-http). v2 candidate: auth.accounts: object[] + per-call account selection. Document workaround now, formalise later.

  3. Per-call cost variability. cost_units_per_call assumes a fixed cost. Drivers whose cost varies with input size (LLM token counts, ffmpeg duration) need a function. Candidate: cost_estimate: (input) => units in defineDriver. Defer to v2.

  4. Health-check sufficiency. A 200 from /v1/models doesn't guarantee /v1/images/generations works. Per-tool health checks? Synthetic-call canaries? Defer to v2 with field-experience input.

  5. Cross-kind composition. A driver that's "CLI on macOS, Docker on Linux" — single DRIVER.md with conditional kind, or two DRIVER.md files with requires.os filtering? v1 prefers the latter (simpler); v2 may revisit.

Composition pattern (inline | ref | file)

Like every composable block in the AIP series (RUNNER, STORAGE, SANDBOX, SECRETS, IDENTITY-ref, CODE), a driver reference accepts three forms when consumed by other manifests (e.g. inside a CODE.md that wants to expose its files via a typed binding):

# Inline — driver definition embedded in the parent
driver:
  inline:
    kind: mcp
    implements: [{ tool: "./tools/foo/TOOL.md", version: "^1.0" }]

# Ref — registry-resolvable identifier
driver:
  ref: "@agentik/drivers/openai-image-create"

# File — workspace-relative path to a sibling DRIVER.md
driver:
  file: "./DRIVER.md"

A consumer MAY use any of the three. Validators MUST accept all three and resolve at use-time. Inline and the standalone DRIVER.md file form share the same frontmatter schema (this AIP).
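A validator that accepts all three forms might first classify the reference before resolving it. This is a sketch under assumptions: `classifyDriverRef` and its result type are hypothetical, and requiring exactly one of the three keys per reference is an assumption not stated above.

```typescript
type DriverForm = "inline" | "ref" | "file";

interface ClassifiedDriverRef {
  form: DriverForm;
  value: unknown;
}

// Assumption: a driver reference carries exactly one of inline | ref | file.
function classifyDriverRef(node: Record<string, unknown>): ClassifiedDriverRef {
  const forms: DriverForm[] = ["inline", "ref", "file"];
  const present = forms.filter((form) => node[form] !== undefined);
  if (present.length !== 1) {
    throw new Error("expected exactly one of inline | ref | file, found " + present.length);
  }
  const form = present[0] as DriverForm;
  const value = node[form];
  // ref and file are strings; inline is an embedded manifest object.
  if (form === "inline") {
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      throw new Error("inline must be an object");
    }
  } else if (typeof value !== "string") {
    throw new Error(form + " must be a string");
  }
  return { form, value };
}

console.log(classifyDriverRef({ ref: "@agentik/drivers/openai-image-create" }).form); // ref
console.log(classifyDriverRef({ file: "./DRIVER.md" }).form); // file
```

Resolution itself (registry lookup for `ref`, file read for `file`) happens at use-time and is outside this sketch.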
