Build AI agents as folders, not code.
A filesystem-native framework. An agent is a directory of markdown; the LLM does the work, and a thin ~400-LOC CLI coordinates. No engine, no database, no vendor lock-in: one definition runs on Claude, Codex, or Gemini, locally or in the cloud.
The bet
Most frameworks ship an engine. SCROLL ships a convention.
Heavy agent frameworks make you import a runtime (graphs, schedulers, vector DBs) and lock you to one model vendor. SCROLL bets the opposite way, and the 2026 evidence backs it.
| | Heavy frameworks | SCROLL |
|---|---|---|
| An agent is… | an object in code | a folder of markdown |
| State lives in… | a database / session | the filesystem + git |
| Orchestration | the framework runtime | a declarative DAG you advance deterministically |
| Model | locked to one vendor | vendor-neutral: one definition, any runtime |
| The whole framework | a dependency tree | a convention + one ~400-LOC CLI |
Why a folder
Why split it up? Isn't a prompt enough?
Fair question. A SCROLL agent is the same shape as a Claude sub-agent: markdown + frontmatter. The split into IDENTITY · SOUL · TOOLS · hard-rules exists only to buy three things a single blob can't:
Portability (the real reason)
A plain sub-agent runs only in Claude Code. Separating the machine-readable IDENTITY (model, capabilities, runtimes) from the prose SOUL is what lets one definition transpile to Cowork, Codex, Gemini, and A2A, with no rewrite.
Auditable & gradeable
hard-rules as a versioned, shareable file and evals/ as gold cases are what let scroll audit enforce and scroll eval grade. A single blob can't be checked.
Tools wired per runtime
Logical capabilities vs per-runtime tool_bindings: the same web.search maps to a different real tool on each host. A hardcoded prompt can't express that mapping.
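To make the capability-versus-binding split concrete, here is a minimal Python sketch of a lookup from a logical capability to a per-runtime tool. The runtime keys and concrete tool names below are illustrative placeholders, not SCROLL's actual tool_bindings schema; only web.search comes from the text above.

```python
# Hypothetical binding table: logical capability -> concrete tool per runtime.
# Runtime keys and tool names are placeholders, not SCROLL's real schema.
TOOL_BINDINGS = {
    "web.search": {
        "claude-subagent": "WebSearch",
        "codex": "web_search",
        "gemini": "google_search",
    },
}


def resolve_tool(capability: str, runtime: str) -> str:
    """Return the concrete tool name a runtime uses for a logical capability."""
    try:
        return TOOL_BINDINGS[capability][runtime]
    except KeyError:
        # Fail loudly instead of silently running an agent without its tool.
        raise LookupError(
            f"no binding for {capability!r} on runtime {runtime!r}"
        ) from None


print(resolve_tool("web.search", "codex"))
```

The agent definition only ever names web.search; the table is the single place that changes when a host renames or replaces its search tool.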
How it works
An agent is exactly these files.
Scaffold a compliant agent, write its soul in plain language, and build it once: one source renders to every runtime. No engine to wire, no model SDK to import.
    # scaffold → write → render, audit, run
    scroll new atlas
    scroll build atlas    # → Cowork · Codex · Gemini · Claude-subagent · A2A
    scroll audit          # compliance gate (CI)
    scroll run atlas --task "…"

    agents/atlas/
    ├── IDENTITY.md      # who it is (machine-readable frontmatter)
    ├── SOUL.md          # how it thinks (you write this)
    ├── TOOLS.md         # what it can use
    ├── hard-rules.md    # rules it must never break
    ├── memory/          # what it remembers
    └── evals/           # how you grade it
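As a rough illustration of what a structural check over that folder could look like, the Python sketch below reports which entries from the listing are missing. It is not the logic of scroll check itself; treating all six entries as required is an assumption, and missing_entries and demo are hypothetical helper names.

```python
import tempfile
from pathlib import Path
from typing import List

# Entries taken from the folder listing above; treating every one of them
# as required is an assumption of this sketch.
REQUIRED_FILES = ["IDENTITY.md", "SOUL.md", "TOOLS.md", "hard-rules.md"]
REQUIRED_DIRS = ["memory", "evals"]


def missing_entries(agent_dir: Path) -> List[str]:
    """List the expected files and directories that are absent from agent_dir."""
    missing = [name for name in REQUIRED_FILES if not (agent_dir / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (agent_dir / name).is_dir()]
    return missing


def demo() -> List[str]:
    """Build an incomplete agent folder in a temp dir and check it."""
    with tempfile.TemporaryDirectory() as tmp:
        agent = Path(tmp) / "agents" / "atlas"
        agent.mkdir(parents=True)
        for name in ("IDENTITY.md", "SOUL.md", "TOOLS.md"):
            (agent / name).write_text("")
        (agent / "memory").mkdir()
        return missing_entries(agent)


print(demo())
```

Because the agent is only files, a check like this needs nothing beyond the filesystem, which is what makes it cheap to run in pre-commit or CI.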
One source → every runtime
The same folder transpiles to a Cowork role, a Codex/Gemini prompt, a Claude sub-agent, or an A2A card. Fix one place, rebuild, no drift.
State is files + git
No DB, no session store. Runs are folders; checkpoints are commits; a crash resumes by re-reading the directory.
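The resume-by-rereading idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: each finished step leaves a `<step>.done` marker in the run folder (a hypothetical convention), the step names are borrowed from the long-research example on this page, and the git commit per checkpoint is left out.

```python
import tempfile
from pathlib import Path
from typing import List, Optional

# Step names borrowed from this page's long-research example; the
# "<step>.done" marker convention is an assumption of this sketch.
STEPS = ["search", "plan", "build", "verify", "revise", "report"]


def next_step(run_dir: Path, steps: List[str]) -> Optional[str]:
    """Return the first step with no '<step>.done' marker, or None if all are done."""
    for step in steps:
        if not (run_dir / f"{step}.done").exists():
            return step
    return None


def run(run_dir: Path, steps: List[str], crash_after: Optional[str] = None) -> List[str]:
    """Execute pending steps in order; optionally stop early to simulate a crash."""
    executed = []
    while True:
        step = next_step(run_dir, steps)
        if step is None:
            break
        executed.append(step)  # the model call for this step would happen here
        (run_dir / f"{step}.done").write_text("ok\n")  # checkpoint on disk
        if step == crash_after:
            break  # simulated crash: nothing survives except the files
    return executed


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp)
        first = run(run_dir, STEPS, crash_after="plan")
        second = run(run_dir, STEPS)  # fresh call; state comes only from disk
        return first, second


print(demo())
```

The second call holds no in-memory state from the first; it picks up at build purely because the directory says search and plan are done.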
The LLM works inside a step
A deterministic runner advances the task DAG; the model only acts within a step. The reliability comes from the structure, not luck.
Install
Up and running in 2 minutes.
Needs Node ≥ 20. Pulls in no other dependencies.
    # Option 1 - install straight from GitHub (public repo, works today)
    npm i -g github:LeHungViet/scroll

    # Option 2 - install the downloaded .tgz (no accounts needed)
    npm i -g ./agentpro-scroll-0.6.0.tgz

    # Option 3 - once published:
    npm i -g @agentpro/scroll

    # Try it
    scroll new researcher
    scroll check researcher
    scroll audit
    scroll run researcher --task "Summarize topic X into one short conclusion"
What you get
A compliant agents/researcher/ folder, a hash-bound .scroll/audit.json report, and an agent that just ran for real with a per-step log.
From that one folder, scroll build renders prompts for Cowork, Codex, Gemini, or an A2A card, with no rewrite.
The commands
Each command does one clear job.
The whole framework is one thin CLI. Here's what each command is for:
| Command | What it does |
|---|---|
| scroll new <name> | Scaffold a compliant agent folder so you don't have to memorize the structure. |
| scroll check <name> | Validate an agent's structure (valid frontmatter, required files). Runs in pre-commit / CI. |
| scroll audit | Deep compliance scan: catches prompts left in code, hardcoded models, banned infra, agents shipped without evals, then writes a hash-bound report. |
| scroll build <name> | Render one agent to every runtime: Cowork · Codex · Gemini · Claude sub-agent · A2A card. |
| scroll run --work WORK.md | Drive a real multi-agent workflow. (Single job: scroll run <agent> --task "…".) |
| scroll cost "<task>" | Estimate single- vs multi-agent token cost before you spawn anything. |
| scroll registry | Scan all agents into one overview (model, tools, runtimes). |
Multi-agent, the safe way
Coordinate through files, not a message bus.
One controller owns WORK.md
The task chain is a declarative DAG with a single owner, which is the fix for the #1 multi-agent failure: agents handing off in infinite loops.
Blackboard, not telephone
Agents post findings to append-only files; others read what's relevant. A handoff is a file-state change that passes a summary, not the whole history.
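A blackboard of this kind can be approximated with one append-only file. The Python sketch below assumes a JSON Lines file and hypothetical field names (agent, topic, summary); the page does not specify SCROLL's on-disk format, and the findings are placeholder text.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def post_finding(board: Path, agent: str, topic: str, summary: str) -> None:
    """Append one finding as a JSON line; existing lines are never rewritten."""
    record = {"agent": agent, "topic": topic, "summary": summary}
    with board.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def read_findings(board: Path, topic: str) -> List[Dict[str, str]]:
    """Return only the findings relevant to one topic, in posting order."""
    if not board.exists():
        return []
    with board.open(encoding="utf-8") as fh:
        entries = [json.loads(line) for line in fh if line.strip()]
    return [entry for entry in entries if entry["topic"] == topic]


def demo() -> List[str]:
    with tempfile.TemporaryDirectory() as tmp:
        board = Path(tmp) / "findings.jsonl"  # hypothetical file name
        post_finding(board, "researcher", "market", "Market is growing.")
        post_finding(board, "analyst", "financials", "Margins are thin.")
        post_finding(board, "researcher", "market", "Three key players.")
        return [entry["summary"] for entry in read_findings(board, "market")]


print(demo())
```

Each reader filters for its own topic and receives short summaries, so no agent has to replay another agent's full history.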
Cost gate before spawning
Default is single-agent. SCROLL estimates single-vs-multi cost before it fans out, and prefers single when it's enough.
Examples you can build today
Decision memo
One agent researches the market, one reads the financials (in parallel); the controller merges them into a go/no-go memo.
Code-review crew
One agent reads the diff, one runs the tests; the controller blocks the merge until format and findings are clean.
Long research run
Six chained steps (search → plan → build → verify → revise → report). Interrupt it and it resumes from the checkpoint.
    # WORK.md - "decision memo" example (trimmed)
    controller: lead

    [task]
    id: market-research
    owner: researcher
    parallel: true
    objective: Summarize market size, growth, key players.

    [task]
    id: financials
    owner: analyst
    parallel: true
    objective: Summarize unit economics and cost risks.

    [task]
    id: synthesize
    owner: lead
    blockedBy: [market-research, financials]
    objective: Merge into a go/no-go memo.
    final: true
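One way to picture how a deterministic runner might advance this DAG is to compute, at each round, the tasks whose blockedBy entries are all finished. The Python sketch below hand-copies the three tasks from the example into a dict (it does not parse WORK.md); ready_tasks and run_order are hypothetical names, not part of the SCROLL CLI.

```python
from typing import Dict, List, Set

# The three tasks from the WORK.md example, copied by hand into a dict.
TASKS: Dict[str, Dict] = {
    "market-research": {"owner": "researcher", "blocked_by": []},
    "financials": {"owner": "analyst", "blocked_by": []},
    "synthesize": {"owner": "lead", "blocked_by": ["market-research", "financials"]},
}


def ready_tasks(tasks: Dict[str, Dict], done: Set[str]) -> List[str]:
    """Tasks not yet done whose dependencies are all done, in a stable order."""
    return sorted(
        task_id
        for task_id, task in tasks.items()
        if task_id not in done and all(dep in done for dep in task["blocked_by"])
    )


def run_order(tasks: Dict[str, Dict]) -> List[List[str]]:
    """Group tasks into waves; tasks within a wave could run in parallel."""
    done: Set[str] = set()
    waves: List[List[str]] = []
    while len(done) < len(tasks):
        wave = ready_tasks(tasks, done)
        if not wave:
            # Nothing is runnable but work remains: a cycle or a bad reference.
            raise ValueError("cycle or unknown dependency in task graph")
        waves.append(wave)
        done.update(wave)
    return waves


print(run_order(TASKS))
```

The two research tasks land in the first wave and synthesize in the second. A graph in which tasks block each other raises an error instead of looping, which is the failure mode a single-owner DAG is meant to rule out.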
Positioning
Not a rival to the model SDKs: it sits above them.
The Claude Agent SDK, OpenAI Agents SDK, and Google ADK are runtimes bound to one vendor. SCROLL is a vendor-neutral convention that transpiles down to them, so your agents aren't hostage to any one provider.
| Dimension | Vendor agent SDKs | SCROLL |
|---|---|---|
| What it is | an engine (code / library) | a convention + thin CLI, sitting above |
| An agent is | code / framework objects | markdown prose in a folder |
| Orchestration | the SDK runtime | a declarative DAG on the filesystem |
| Vendor | locked to one | neutral: one definition, any runtime |
| It is… | a destination (lock-in) | a router (freedom) |
The token niche
Cheap by construction, even through a CLI.
SCROLL's cost wins come from context engineering and the filesystem, not a vector store. A stack of levers, most of which work whether you call the model via API or a CLI:
Skip the call entirely
Mechanical steps (merge, format) run in code as deterministic, 0-token tasks. This is the biggest lever, because every avoided call avoids the model's whole system-prompt cost.
Cache the stable prefix
Context is ordered prefix-first so the agent definition caches; cached reads bill at ~0.1×.
Output discipline
Output is billed ~5× input, so sub-tasks return terse findings; only the synthesis writes full prose.
Route + trim + lean
Cheap model for sub-tasks, strong for synthesis; trimmed intermediate output; a lean prefix for one-shot tasks.
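The arithmetic behind the caching and output levers can be worked through with the two multipliers quoted above. The Python sketch below measures cost in input-token equivalents rather than currency; the token counts are placeholders chosen for illustration, not measurements, and call_cost is a hypothetical helper.

```python
CACHED_READ_FACTOR = 0.1  # "~0.1×" for cached reads, as quoted in the text
OUTPUT_FACTOR = 5.0       # "~5×" input for output tokens, as quoted in the text


def call_cost(prefix_tokens: int, fresh_tokens: int, output_tokens: int,
              prefix_cached: bool) -> float:
    """Cost of one model call, expressed in input-token equivalents."""
    prefix_factor = CACHED_READ_FACTOR if prefix_cached else 1.0
    return (
        prefix_tokens * prefix_factor      # stable agent definition
        + fresh_tokens                     # task-specific input
        + output_tokens * OUTPUT_FACTOR    # generated text
    )


# Placeholder sizes: a 2000-token agent definition and 500 tokens of task input.
verbose_uncached = call_cost(2000, 500, 300, prefix_cached=False)
verbose_cached = call_cost(2000, 500, 300, prefix_cached=True)
terse_cached = call_cost(2000, 500, 100, prefix_cached=True)

print(verbose_uncached, verbose_cached, terse_cached)
```

With these placeholder numbers the same call drops from about 4000 to about 2200 input-token equivalents once the prefix is cached, and to about 1200 when the sub-task also returns a terse 100-token finding. A step that runs in code and skips the call costs 0.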
Compliance
The correct path is the only one that works.
A framework is worthless if people, and AI coders, don't follow it. SCROLL makes compliance structural, then verifies it instead of trusting a claim.
Scaffold → schema → linter → build
scroll new starts you compliant; a JSON Schema flags frontmatter as you type; scroll check runs in CI; scroll build refuses a broken agent.
AI coders follow it by default
A shipped AGENTS.md teaches Claude Code / Cursor / Cowork the rules from your repo, so you get compliant agents without memorizing docs.
Status: honest
Proven where it counts; clear about the rest.
A full case-gated live A/B eval (3 repeats) passes 26/26 gates: token, speed, universal compliance, quality non-regression, checkpoint-resume, HITL approval, drift-free portability, and the cost gate. Reliability is engineered (caps, checkpoints, verify-before-done, eval by end-state), not hoped for.
What runs today
Roadmap
What the field is finding
Not a hunch: where the field is already heading.
SCROLL's bets line up with what credible voices and adoption data already show. These are findings about the approach, cited and linkable, not endorsements of SCROLL.
“The most successful implementations weren't using complex frameworks… they were building with simple, composable patterns.”
Multi-agent swarms are fragile from poor context sharing; the pattern that works = many agents contribute, writes stay single-threaded.
60,000+ projects use AGENTS.md; agents-as-markdown is now an open standard under the Linux Foundation's Agentic AI Foundation.