AIP-12: PLAYBOOK.md — agentplaybooks/v1 (evolving prompt overlays)
A markdown format for prompt-overlay fragments that ride on top of an operator's persona, plus a contract for how runtimes evolve them via reflective deltas without violating locked persona traits.
| Field | Value |
|---|---|
| AIP | 12 |
| Title | PLAYBOOK.md — agentplaybooks/v1 (evolving prompt overlays) |
| Status | Draft |
| Type | Schema |
| Domain | playbooks.sh |
| Requires | AIP-1, AIP-2, AIP-9 |
| Reference Impl | TBD |
Abstract
agentplaybooks/v1 defines PLAYBOOK.md — a markdown file holding a
single prompt-overlay fragment, plus the metadata that controls when
the runtime weaves it into an operator's persona, what locked traits it
MUST NOT touch, and how an evolving reflection loop produces new
fragments via delta updates instead of monolithic rewrites. Playbooks
are the spec layer for self-improving agent prompts, with the same
filesystem-first portability as the rest of the agentproto family.
Motivation
When an operator gets better at its job, the improvement should be captured — not stuck in the head of whoever ran the last fine-tune. "Better" usually means subtle persona adjustments: be slower with crisis-flagged users; check token expiry before debugging auth flows; quote evidence inline when arguing against a teammate. Vendor frameworks encode these as monolithic system-prompt rewrites — fragile, opaque, and impossible to audit.
The ACE (Agentic Context Engineering) paper (ICLR 2026) showed that structured delta updates to a "playbook" — small additions and revisions instead of full rewrites — preserve detail and avoid context collapse, beating monolithic prompt optimization by ~10% on agent benchmarks. AIP-12 codifies that delta representation as a portable file format, with an explicit persona-lock mechanism so evolution can never override the non-negotiable parts of an operator's identity.
Prior art: ACE, GEPA (genetic-Pareto prompt evolution), the Council-of-Mentors persona overlay system used by Simone, and the persona-binding slot defined in AIP-9.
Specification
A conforming agentplaybooks/v1 package is a directory of PLAYBOOK.md
files, optionally nested by target:
playbooks/
├── _index.md
├── role/
│   └── librarian/
│       ├── prefer-page-supersession-over-deletion.md
│       └── flag-contradictions-explicitly.md
└── operator/
    └── alice/
        └── escalate-token-expiry-checks.md
PLAYBOOK.md shape
---
schema: playbooks/v1
slug: <kebab-case-playbook-id>
title: <one-sentence: what this overlay does>
targets:
  - kind: operator | role | skill | runtime
    ref: <slug-or-glob>               # e.g. "role/librarian", "operator/*", "skill/research"
kind: overlay | block-replacement     # overlay = additive; block-replacement = swap a named block
priority: 0 .. 100                    # higher = wins on ordering ties (default 50)
lock_check:                           # locked persona traits this overlay MUST NOT modify
  - <trait-id>                        # e.g. "warmth", "honesty", "voice-register"
ttl: <ISO 8601 duration>              # OPTIONAL; auto-archives at write_at + ttl
evidence:                             # provenance: what reflection produced this
  - kind: run | conversation | work-item | reflection
    ref: <id-or-path>
    note: <one-liner>
status: shadow | active | archived    # gate for promotion (see evolve contract)
supersedes: [<slug>]                  # OPTIONAL; playbooks this replaces
metadata:
  <vendor>:
    <field>: <value>
---
# <title>
<the prompt fragment itself: markdown that gets woven into the operator's
persona at the position implied by `kind`>
Apply contract
Before the agent generates a turn, the runtime MUST:
- Resolve the active operator's `targets` matches across all `PLAYBOOK.md` files with `status: active`.
- Sort matches by `priority` desc, `updated_at` desc.
- For each match, run `lock_check` against the operator's locked persona traits. If any locked trait would be modified by the overlay's body, the overlay MUST be skipped (and an audit event appended; see Security Considerations).
- Weave the overlay markdown into the operator's persona at the position implied by `kind` (overlay = appended; block-replacement = swaps the named block).
- Generate as normal.
Overlays are invisible to the operator's downstream output — there is no "playbook says" tag visible in the response. The overlay is prompt material, not chain-of-thought leakage.
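The apply steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the persona is modelled as a dict of named blocks, the `block` field (which block a `block-replacement` swaps) is hypothetical because the spec does not name it, and `touched_traits` stands in for whatever body linter or LLM judge the runtime uses to decide which traits an overlay would modify.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable


@dataclass
class Playbook:
    slug: str
    targets: list[str]        # target refs or globs, e.g. "operator/*"
    body: str                 # the prompt fragment
    kind: str = "overlay"     # "overlay" or "block-replacement"
    block: str = ""           # hypothetical: name of the block to swap
    priority: int = 50
    updated_at: str = ""      # assumed: UTC ISO 8601 in one fixed format, so strings sort by time
    status: str = "shadow"    # "shadow" | "active" | "archived"


def apply_playbooks(
    persona_blocks: dict[str, str],
    operator_refs: list[str],
    locked_traits: set[str],
    playbooks: list[Playbook],
    touched_traits: Callable[[str], set[str]],
    audit: list[str],
) -> str:
    # 1. Resolve: active playbooks whose targets match this operator.
    matches = [
        p for p in playbooks
        if p.status == "active"
        and any(fnmatchcase(ref, pat) for ref in operator_refs for pat in p.targets)
    ]
    # 2. Sort by priority desc, then updated_at desc.
    matches.sort(key=lambda p: (p.priority, p.updated_at), reverse=True)

    blocks = dict(persona_blocks)  # copy so the base persona is never mutated
    appended: list[str] = []
    for p in matches:
        # 3. Lock check against the runtime's own locked-trait list.
        hits = touched_traits(p.body) & locked_traits
        if hits:
            audit.append(f"skipped {p.slug}: touches locked {sorted(hits)}")
            continue
        # 4. Weave: swap the named block, or append the overlay.
        if p.kind == "block-replacement":
            blocks[p.block] = p.body
        else:
            appended.append(p.body)
    # 5. The caller generates from the woven persona as normal.
    return "\n\n".join([*blocks.values(), *appended])
```

Shadow and archived playbooks never reach the weave step because the first filter only admits `status: active`.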
Evolve contract
A reflection pass produces delta playbooks — never monolithic rewrites of an operator's base persona. The contract:
- Reflect. An LLM reads recent runs (success + failure), the current active playbooks, and any feedback signal (scorers, human ratings, council severity). It proposes 0..N candidate deltas.
- Curate. Each candidate MUST be expressible as a `PLAYBOOK.md` conforming to this AIP. Candidates that would violate `lock_check` on any in-scope operator MUST be discarded.
- Shadow. New playbooks SHOULD enter at `status: shadow`: the runtime computes them but does not weave them. Shadow runs accumulate evidence.
- Promote. Promotion to `status: active` SHOULD require a measurable improvement (A/B vs current, scorer delta, or human approval gated by AIP-7 governance for high-impact overlays).
- Archive. Playbooks superseded or expired transition to `status: archived`. Archived playbooks MUST NOT be applied but remain on disk for audit.
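The status transitions in this contract could be sketched as below. The sketch makes several assumptions: promotion uses a scorer delta (one of the three criteria the contract allows), `min_gain` is a hypothetical threshold, and `parse_duration` handles only the week/day/time subset of ISO 8601 durations because months and years have no fixed length.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

_DURATION = re.compile(
    r"^P(?:(\d+)W)?(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Parse a subset of ISO 8601 durations (weeks, days, hours, minutes, seconds)."""
    m = _DURATION.match(text)
    if not m or not any(m.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    w, d, h, mi, s = (int(g or 0) for g in m.groups())
    return timedelta(weeks=w, days=d, hours=h, minutes=mi, seconds=s)


@dataclass
class Playbook:
    slug: str
    status: str = "shadow"
    write_at: Optional[datetime] = None
    ttl: Optional[str] = None
    supersedes: list[str] = field(default_factory=list)


def promote(pb: Playbook, baseline_score: float, shadow_score: float,
            min_gain: float = 0.0) -> bool:
    """Move shadow -> active only when the shadow run beat the baseline."""
    if pb.status != "shadow":
        raise ValueError(f"{pb.slug} is {pb.status}, not shadow")
    if shadow_score - baseline_score > min_gain:
        pb.status = "active"
        return True
    return False


def archive_stale(playbooks: list[Playbook], now: datetime) -> list[str]:
    """Archive playbooks that expired or were superseded by an active playbook."""
    superseded = {s for p in playbooks if p.status == "active" for s in p.supersedes}
    archived = []
    for p in playbooks:
        if p.status == "archived":
            continue
        expired = (
            p.ttl is not None
            and p.write_at is not None
            and now >= p.write_at + parse_duration(p.ttl)
        )
        if expired or p.slug in superseded:
            p.status = "archived"  # the file stays on disk for audit
            archived.append(p.slug)
    return archived
```

Keeping the transitions in two small functions makes the gate easy to audit: nothing reaches `active` except through `promote`, and nothing leaves `archived`.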
Persona lock
Every runtime that hosts overlays MUST publish the set of locked
persona traits as named ids. A playbook's lock_check declares which
traits its author intended to leave alone; the runtime MUST also
enforce its own lock list independently — lock_check is intent, not
authority. Overlays that touch any locked trait — declared or not —
are non-conforming.
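One way to read the dual enforcement described above is the sketch below. It is illustrative only: `make_keyword_linter` is a hypothetical rule-based detector (a real runtime might use an LLM judge instead), and treating a touch on either the declared `lock_check` list or the runtime's lock list as a violation is a conservative assumption rather than something the spec mandates.

```python
from typing import Callable, Iterable


def make_keyword_linter(triggers: dict[str, list[str]]) -> Callable[[str], set[str]]:
    """Build a naive detector: a trait counts as touched if any trigger phrase appears."""
    def detect(body: str) -> set[str]:
        text = body.lower()
        return {
            trait
            for trait, phrases in triggers.items()
            if any(phrase.lower() in text for phrase in phrases)
        }
    return detect


def lock_violations(
    body: str,
    declared_lock_check: Iterable[str],
    runtime_locked: Iterable[str],
    detect_touched: Callable[[str], set[str]],
) -> set[str]:
    """Return protected traits the body touches, whether declared or runtime-locked."""
    protected = set(declared_lock_check) | set(runtime_locked)
    return detect_touched(body) & protected
```

For example, an overlay that declares only `honesty` in `lock_check` is still rejected when its body touches `warmth` and the runtime locks `warmth`, because the runtime list is checked independently of the author's declaration.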
_index.md
The index MUST list every playbook with its slug, title, targets,
status, priority, and lock_check. It is regenerated on every write.
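A regeneration step for the index might look like the following sketch. The spec lists the required fields but not a layout, so the markdown table, the sort by slug, and the `render_index` name are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Playbook:
    slug: str
    title: str
    targets: list[str]
    status: str
    priority: int = 50
    lock_check: list[str] = field(default_factory=list)


def _cell(text: str) -> str:
    # Escape pipes so a title cannot break the table layout.
    return text.replace("|", "\\|")


def render_index(playbooks: list[Playbook]) -> str:
    """Render every playbook as one table row, sorted by slug for stable diffs."""
    lines = [
        "| slug | title | targets | status | priority | lock_check |",
        "|---|---|---|---|---|---|",
    ]
    for p in sorted(playbooks, key=lambda p: p.slug):
        lines.append(
            f"| {_cell(p.slug)} | {_cell(p.title)} | {_cell(', '.join(p.targets))} "
            f"| {p.status} | {p.priority} | {_cell(', '.join(p.lock_check))} |"
        )
    return "\n".join(lines) + "\n"
```

Calling `render_index` after every write and overwriting `_index.md` with the result keeps the index a pure function of the playbook files.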
Vendor extensions
Vendor fields go under metadata.<vendor>. The Council-of-Mentors
system in Simone is the first concrete consumer — its mentor-driven
fragments are emitted as PLAYBOOK.md files with
metadata.simone.mentor set to the mentor id.
Rationale
Why deltas, not monolithic prompts. ACE's central finding: iterative full-prompt rewrites cause "context collapse" — detail erodes, brevity bias takes over, and the prompt drifts off-distribution. Forcing evolution through small, additive overlays preserves detail and gives each change a clean credit-assignment scope.
Why lock_check is explicit and runtime-redundant. Persona lock is
the safety surface that makes self-improvement deployable. Trusting a
single source — author intent or runtime enforcement — is fragile.
Requiring both means a faulty evolution loop and a lax runtime have to
both fail before a locked trait is touched.
Why shadow → active is a contract, not a convention. Without an
explicit gate, ACE-style loops degrade quietly when reflection is
noisy. Mandating shadow time + promotion criteria turns evolution into
something a human can audit in _index.md ("here are the 7 active
playbooks; here are the 23 in shadow; here are the 12 archived
because failure_count > success_count").
Why playbooks are separate from agentlearning/v1 lessons. A
lesson tells the agent "do X." A playbook is part of the agent — it
modifies persona. Lessons are retrieved per turn; playbooks are woven
once per session. Conflating them would break the apply pipeline.
Why this requires AIP-9. agentoperators/v1 defines the
persona-binding slot a playbook plugs into. Without that contract,
playbooks have nothing concrete to weave against.
Reference Implementation
packages/agent-framework/src/playbooks —
weave function (built on the Council-of-Mentors persona overlay code),
ACE-style evolve loop, lock-check enforcement, shadow → active
promotion gate. The Council framework in
packages/agent-framework/src/council
is the first conformant consumer — Council mentors emit
PLAYBOOK.md files instead of in-memory fragments, and the persona
lock is shared with the Simone base persona definition.
Backwards Compatibility
Not applicable — this AIP introduces a new spec.
Security Considerations
Playbooks modify the operator's persona on every turn. The spec is deliberately conservative because the failure modes are severe.
- Lock bypass: a playbook smuggles instructions that override a locked trait through indirect phrasing ("act as if warmth were irrelevant in this context"). Mitigation: `lock_check` is enforced by both author declaration and runtime; runtime SHOULD use an LLM-judge or rule-based linter on the body, not just the frontmatter.
- Reflection injection: the reflection step's input includes prior runs that may contain user content; an attacker plants text designed to produce harmful overlays. Mitigation: shadow → active promotion is mandatory; high-impact promotions go through AIP-7 governance.
- Identity drift: a sequence of small, individually-safe deltas cumulatively shifts the operator off-mission. Mitigation: every promotion is logged in `_log.md`; lint passes flag operators whose active overlay count grows unbounded; spec recommends a configurable cap on simultaneous active overlays per operator.
- Cross-operator contamination: a `targets` glob matches an unintended operator and weaves an inapplicable overlay. Mitigation: `lock_check` runs per-operator at apply time; runtime MUST evaluate match scope explicitly and surface unexpected matches in `_log.md`.
- Shadow-state leakage: shadow playbooks influence agent output through a misconfigured runtime. Mitigation: shadow status is enforced as "computed but not woven"; runtimes that weave shadow overlays are non-conforming.
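Two of the mitigations above (the active-overlay cap and surfacing glob match scope) lend themselves to a simple lint pass. The sketch below is hypothetical: the function name, the finding strings, and the default cap of 10 are assumptions, and it returns lines that a runtime could append to `_log.md` rather than writing the file itself.

```python
from fnmatch import fnmatchcase


def lint_active_overlays(
    active: dict[str, list[str]],   # slug -> target patterns of active playbooks
    operators: list[str],           # operator refs, e.g. "operator/alice"
    cap: int = 10,                  # assumed default; the spec only says "configurable"
) -> list[str]:
    findings: list[str] = []
    per_operator: dict[str, list[str]] = {op: [] for op in operators}

    for slug, patterns in sorted(active.items()):
        matched = [
            op for op in operators
            if any(fnmatchcase(op, pat) for pat in patterns)
        ]
        for op in matched:
            per_operator[op].append(slug)
        # Surface every glob that reaches more than one operator for review.
        is_glob = any(ch in pat for pat in patterns for ch in "*?[")
        if is_glob and len(matched) > 1:
            findings.append(
                f"glob-scope: {slug} matches {len(matched)} operators: "
                + ", ".join(matched)
            )

    # Flag operators whose active overlay count exceeds the cap.
    for op, slugs in per_operator.items():
        if len(slugs) > cap:
            findings.append(
                f"overlay-cap: {op} has {len(slugs)} active overlays (cap {cap})"
            )
    return findings
```

The lint only reports; deciding whether a wide glob match is intended is left to the operator's owner or to governance.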