📜

Build AI agents as folders, not code.

A filesystem-native framework. An agent is a directory of markdown, the LLM does the work, and a thin ~400-LOC CLI coordinates. No engine, no database, no vendor lock-in: one definition runs on Claude, Codex, or Gemini, locally or in the cloud.

$ npm i -g github:LeHungViet/scroll
MIT · Node ≥ 20 · zero-dependency · Claude · OpenAI/Codex · Gemini · eval: 26/26 gates

The bet

Most frameworks ship an engine. SCROLL ships a convention.

Heavy agent frameworks make you import a runtime (graphs, schedulers, vector DBs) and lock you to one model vendor. SCROLL bets the opposite way, and the 2026 evidence backs it.

|  | Heavy frameworks | SCROLL |
| --- | --- | --- |
| An agent is… | an object in code | a folder of markdown |
| State lives in… | a database / session | the filesystem + git |
| Orchestration | the framework runtime | a declarative DAG you advance deterministically |
| Model | locked to one vendor | vendor-neutral: one definition, any runtime |
| The whole framework | a dependency tree | a convention + one ~400-LOC CLI |

Why a folder

Why split it up? Isn't a prompt enough?

Fair question. A SCROLL agent is the same shape as a Claude sub-agent: markdown + frontmatter. The split into IDENTITY · SOUL · TOOLS · hard-rules exists only to buy three things a single blob can't:

🔌 Portability (the real reason)

A plain sub-agent runs only in Claude Code. Separating the machine-readable IDENTITY (model, capabilities, runtimes) from the prose SOUL is what lets one definition transpile to Cowork, Codex, Gemini, and A2A with no rewrite.

🔍 Auditable & gradeable

Keeping hard-rules as a versioned, shareable file and evals/ as gold cases is what lets scroll audit enforce and scroll eval grade. A single blob can't be checked.

🧰 Tools wired per runtime

Logical capabilities are kept separate from per-runtime tool_bindings: the same web.search maps to a different real tool on each host. A hardcoded prompt can't express that mapping.
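
The capability-versus-binding split can be pictured as a lookup table. The sketch below is illustrative Python, not SCROLL's implementation: the runtime keys and the tool names on the right-hand side are hypothetical placeholders, and only web.search and the tool_bindings idea come from the text above.

```python
# Hypothetical binding table: logical capability -> real tool name per runtime.
# Runtime keys and tool names are placeholders, not SCROLL's actual bindings.
TOOL_BINDINGS = {
    "web.search": {
        "claude": "WebSearch",
        "codex": "web_search",
        "gemini": "google_search",
    },
}


def resolve_tools(capabilities, runtime, bindings=TOOL_BINDINGS):
    """Map logical capabilities to the tool names one runtime exposes."""
    resolved = {}
    for capability in capabilities:
        per_runtime = bindings.get(capability, {})
        if runtime not in per_runtime:
            raise KeyError(f"{capability!r} has no binding for runtime {runtime!r}")
        resolved[capability] = per_runtime[runtime]
    return resolved


print(resolve_tools(["web.search"], "codex"))  # {'web.search': 'web_search'}
```

The agent definition only ever names web.search; swapping the runtime argument is the only thing that changes which concrete tool gets wired in.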

Is it over-engineering? Yes, if you run one agent on one runtime with no governance; in that case a single prompt is better (SCROLL even has a lean single-call path for that). SCROLL earns its keep only when you need portability, governance, and multi-agent coordination at scale: many agents, many runtimes, kept consistent, audited, and graded. Use it when that's your problem, not before.

How it works

An agent is exactly these files.

Scaffold a compliant agent, write its soul in plain language, and build it once: one source renders to every runtime. No engine to wire, no model SDK to import.

1. Create the agent. scroll new atlas scaffolds a correct folder; you just fill it in.
2. Write the soul. Describe the agent in plain prose in SOUL.md; fill a few frontmatter lines in IDENTITY.md (name, model, capabilities).
3. Build & audit. scroll build renders to every runtime; scroll audit confirms it's compliant.
4. Run it. scroll run drives a real loop: cost caps, checkpoints, a full event log.

# scaffold → write → render, audit, run
scroll new atlas
scroll build atlas      # → Cowork · Codex · Gemini · Claude-subagent · A2A
scroll audit           # compliance gate (CI)
scroll run atlas --task "…"

agents/atlas/
├── IDENTITY.md   # who it is (machine-readable frontmatter)
├── SOUL.md       # how it thinks (you write this)
├── TOOLS.md      # what it can use
├── hard-rules.md # rules it must never break
├── memory/       # what it remembers
└── evals/        # how you grade it

One source → every runtime

The same folder transpiles to a Cowork role, a Codex/Gemini prompt, a Claude sub-agent, or an A2A card. Fix one place, rebuild, no drift.

🧩 State is files + git

No DB, no session store. Runs are folders; checkpoints are commits; a crash resumes by re-reading the directory.

⚙️ The LLM works inside a step

A deterministic runner advances the task DAG; the model only acts within a step. The reliability comes from the structure, not luck.
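
A minimal Python sketch of the "state is files" idea, assuming a hypothetical layout in which each finished step leaves one JSON file in the run folder. SCROLL's real run layout and its git-commit checkpoints are not shown here; the step names and the run_steps helper are invented for illustration.

```python
import json
import tempfile
from pathlib import Path

STEPS = ["search", "plan", "report"]  # hypothetical step names


def run_steps(run_dir, steps, do_step):
    """Run steps in order, skipping any whose result file already exists."""
    run_dir.mkdir(parents=True, exist_ok=True)
    executed = []
    for step in steps:
        marker = run_dir / f"{step}.json"
        if marker.exists():
            continue  # checkpointed by an earlier run: do not redo it
        result = do_step(step)
        tmp = run_dir / f"{step}.json.tmp"
        tmp.write_text(json.dumps({"step": step, "result": result}))
        tmp.replace(marker)  # rename so a half-written file never counts as done
        executed.append(step)
    return executed


def crash_on_plan(step):
    """Stand-in for real work that dies partway through the run."""
    if step == "plan":
        raise RuntimeError("simulated crash")
    return step.upper()


with tempfile.TemporaryDirectory() as tmp_root:
    run_dir = Path(tmp_root) / "run-001"
    try:
        run_steps(run_dir, STEPS, crash_on_plan)
    except RuntimeError:
        pass  # the process "died" after finishing only the search step
    resumed = run_steps(run_dir, STEPS, str.upper)
    print(resumed)  # ['plan', 'report']
```

Nothing is held in memory between the two calls: the second run learns what is done purely by re-reading the directory.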

Install

Up and running in 2 minutes.

Needs Node ≥ 20. Pulls in no other dependencies.

# Option 1: install straight from GitHub (public repo, works today)
npm i -g github:LeHungViet/scroll

# Option 2: install the downloaded .tgz (no accounts needed)
npm i -g ./agentpro-scroll-0.6.0.tgz

# Option 3 (once published): npm i -g @agentpro/scroll

# Try it
scroll new researcher
scroll check researcher
scroll audit
scroll run researcher --task "Summarize topic X into one short conclusion"

What you get

A compliant agents/researcher/ folder, a hash-bound .scroll/audit.json report, and an agent that just ran for real with a per-step log.

From that one folder, scroll build renders prompts for Cowork, Codex, Gemini, or an A2A card, with no rewrite.


The commands

Each command does one clear job.

The whole framework is one thin CLI. Here's what each command is for:

| Command | What it does |
| --- | --- |
| scroll new <name> | Scaffold a compliant agent folder so you don't have to memorize the structure. |
| scroll check <name> | Validate an agent's structure (valid frontmatter, required files). Runs in pre-commit / CI. |
| scroll audit | Deep compliance scan: catches prompts left in code, hardcoded models, banned infra, and agents shipped without evals, then writes a hash-bound report. |
| scroll build <name> | Render one agent to every runtime: Cowork · Codex · Gemini · Claude sub-agent · A2A card. |
| scroll run --work WORK.md | Drive a real multi-agent workflow. (Single job: scroll run <agent> --task "…".) |
| scroll cost "<task>" | Estimate single- vs multi-agent token cost before you spawn anything. |
| scroll registry | Scan all agents into one overview (model, tools, runtimes). |
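
To make the structural check concrete, here is a rough Python sketch of the kind of test scroll check describes (required files present, frontmatter delimited). It rests on assumptions: the file names are taken from the folder layout shown earlier, while treating all four as required, the find_problems helper, and the '---' frontmatter delimiter are guesses, not SCROLL's actual rules.

```python
import tempfile
from pathlib import Path

# File names from the agent folder layout; treating all four as required is an assumption.
REQUIRED_FILES = ["IDENTITY.md", "SOUL.md", "TOOLS.md", "hard-rules.md"]


def has_frontmatter(text):
    """True if the text opens with a '---' line and closes it with another."""
    lines = text.splitlines()
    return len(lines) >= 2 and lines[0].strip() == "---" and any(
        line.strip() == "---" for line in lines[1:]
    )


def find_problems(agent_dir):
    """Return a list of human-readable structural problems (empty means OK)."""
    problems = [f"missing {name}" for name in REQUIRED_FILES
                if not (agent_dir / name).is_file()]
    identity = agent_dir / "IDENTITY.md"
    if identity.is_file() and not has_frontmatter(identity.read_text()):
        problems.append("IDENTITY.md has no frontmatter block")
    return problems


with tempfile.TemporaryDirectory() as tmp_root:
    agent = Path(tmp_root) / "atlas"
    agent.mkdir()
    (agent / "IDENTITY.md").write_text("---\nname: atlas\n---\n")
    (agent / "SOUL.md").write_text("Plain-prose description.\n")
    print(find_problems(agent))  # ['missing TOOLS.md', 'missing hard-rules.md']
```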

Multi-agent, the safe way

Coordinate through files, not a message bus.

📋 One controller owns WORK.md

The task chain is a declarative DAG with a single owner. That is the fix for the #1 multi-agent failure: agents handing off in infinite loops.

🗒️ Blackboard, not telephone

Agents post findings to append-only files; others read what's relevant. A handoff is a file-state change that passes a summary, not the whole history.

💰 Cost gate before spawning

Default is single-agent. SCROLL estimates single-vs-multi cost before it fans out, and prefers single when it's enough.

Examples you can build today

📊 Decision memo

One agent researches the market, one reads the financials (in parallel); the controller merges them into a go/no-go memo.

🛠️ Code-review crew

One agent reads the diff, one runs the tests; the controller blocks the merge until format and findings are clean.

📚 Long research run

Six chained steps (search → plan → build → verify → revise → report). Interrupt it and it resumes from the checkpoint.

# WORK.md: "decision memo" example (trimmed)
controller: lead

[task] id: market-research   owner: researcher   parallel: true
       objective: Summarize market size, growth, key players.

[task] id: financials        owner: analyst      parallel: true
       objective: Summarize unit economics and cost risks.

[task] id: synthesize        owner: lead         blockedBy: [market-research, financials]
       objective: Merge into a go/no-go memo.   final: true

Long, unattended runs survive restarts: hard caps and a circuit breaker stop runaway spend, human gates hold irreversible actions for approval, and every run emits an append-only events.jsonl you can watch and steer.
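
The scheduling implied by blockedBy in the WORK.md example above can be sketched in a few lines of Python. This is not SCROLL's runner, and it does not parse WORK.md: the task ids and owners are copied by hand into a dict, and the ready_tasks and run_order helpers are hypothetical names.

```python
# Task ids, owners, and blockers copied by hand from the WORK.md example.
TASKS = {
    "market-research": {"owner": "researcher", "blocked_by": []},
    "financials": {"owner": "analyst", "blocked_by": []},
    "synthesize": {"owner": "lead", "blocked_by": ["market-research", "financials"]},
}


def ready_tasks(tasks, done):
    """Ids whose blockers are all done, sorted so the order is deterministic."""
    return sorted(
        task_id for task_id, task in tasks.items()
        if task_id not in done and all(b in done for b in task["blocked_by"])
    )


def run_order(tasks):
    """Advance the DAG wave by wave until every task is done."""
    done, waves = set(), []
    while len(done) < len(tasks):
        wave = ready_tasks(tasks, done)
        if not wave:
            raise ValueError("cycle or missing dependency in task graph")
        waves.append(wave)
        done.update(wave)
    return waves


print(run_order(TASKS))  # [['financials', 'market-research'], ['synthesize']]
```

The first wave holds the two parallel tasks; synthesize only becomes ready once both of its blockers are done, and a cycle is reported instead of looping forever.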

Positioning

Not a rival to the model SDKs: it sits above them.

The Claude Agent SDK, OpenAI Agents SDK, and Google ADK are runtimes bound to one vendor. SCROLL is a vendor-neutral convention that transpiles down to them, so your agents aren't hostage to any one provider.

| Dimension | Vendor agent SDKs | SCROLL |
| --- | --- | --- |
| What it is | an engine (code / library) | a convention + thin CLI, sitting above |
| An agent is | code / framework objects | markdown prose in a folder |
| Orchestration | the SDK runtime | a declarative DAG on the filesystem |
| Vendor | locked to one | neutral: one definition, any runtime |
| It is… | a destination (lock-in) | a router (freedom) |

SCROLL emits a Claude sub-agent, an OpenAI/Codex prompt, a Gemini prompt, or an A2A card, so you can still deploy on a vendor's runtime when you want. They build SDKs to be the destination; SCROLL is the router that keeps you free to move.

The token niche

Cheap by construction, even through a CLI.

SCROLL's cost wins come from context engineering and the filesystem, not a vector store. A stack of levers, most of which work whether you call the model via API or a CLI:

⚡ Skip the call entirely

Mechanical steps (merge, format) run in code as deterministic, 0-token tasks. This is the biggest lever, because every avoided call avoids the model's whole system-prompt cost.

♻️ Cache the stable prefix

Context is ordered prefix-first so the agent definition caches; cached reads bill at ~0.1×.

✂️ Output discipline

Output is billed at ~5× input, so sub-tasks return terse findings; only the synthesis writes full prose.

🎯 Route + trim + lean

Cheap model for sub-tasks, strong for synthesis; trimmed intermediate output; a lean prefix for one-shot tasks.
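
The caching and output levers reduce to simple arithmetic. The Python sketch below uses only the two approximate multipliers quoted above (cached reads at ~0.1× and output at ~5× the input price) and expresses cost in input-token equivalents. The token counts are invented for illustration and are not SCROLL measurements.

```python
CACHED_READ_FACTOR = 0.1  # "cached reads bill at ~0.1x" (from the text)
OUTPUT_FACTOR = 5.0       # "output is billed ~5x input" (from the text)


def billable_units(fresh_input, cached_input, output):
    """Cost of one call in input-token equivalents."""
    return fresh_input + CACHED_READ_FACTOR * cached_input + OUTPUT_FACTOR * output


# Hypothetical call: a 4,000-token agent definition plus a 500-token task.
# Baseline: nothing cached, 800 tokens of verbose output.
baseline = billable_units(fresh_input=4500, cached_input=0, output=800)
# Tuned: the definition is served from cache and the output is trimmed to 300 tokens.
tuned = billable_units(fresh_input=500, cached_input=4000, output=300)

print(round(baseline), round(tuned))              # 8500 2400
print(f"{1 - tuned / baseline:.0%} fewer units")  # 72% fewer units
```

Because output carries the larger multiplier, trimming 500 output tokens saves 2,500 units here, while caching the 4,000-token prefix saves 3,600.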

On the multi-agent and long-run cases where its mechanisms apply, SCROLL cut billable tokens 50–88% and latency 50–75% versus an unstructured baseline, at equal-or-better output quality. (Per-case gated A/B eval; trivial single-call tasks are near-parity by design, since SCROLL doesn't add ceremony where it can't help.)

Compliance

The correct path is the only one that works.

A framework is worthless if people (and AI coders) don't follow it. SCROLL makes compliance structural, then verifies it instead of trusting a claim.

🏗️ Scaffold → schema → linter → build

scroll new starts you compliant; a JSON Schema flags frontmatter as you type; scroll check runs in CI; scroll build refuses a broken agent.

🤖 AI coders follow it by default

A shipped AGENTS.md teaches Claude Code / Cursor / Cowork the rules from your repo, so you get compliant agents without memorizing docs.

Verified, not trusted. AI coders improvise and fabricate. scroll audit flags the deviations (persona/prompts left in code, hardcoded models, banned infra such as DBs, queues, and graph frameworks, and agents shipped without evals) and writes a hash-bound report. A "pass" is tied to the exact file state, so an assistant can't fake "I followed SCROLL," and editing files after the audit invalidates it (scroll audit --verify). The verdict comes from CI, not the agent's word.
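
One way to picture a hash-bound report, as a hedged Python sketch: hash every file under the agent folder, store the digest next to the verdict, and re-hash to verify. SHA-256, the report fields, and the helper names are assumptions; the actual format of .scroll/audit.json is not described here.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_digest(root):
    """SHA-256 over every file's relative path and bytes, in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def make_report(root, verdict):
    """Bind a verdict to the exact file state it was computed from."""
    return {"verdict": verdict, "tree_sha256": tree_digest(root)}


def verify_report(root, report):
    """A report only holds while the files still hash to the same digest."""
    return report["tree_sha256"] == tree_digest(root)


with tempfile.TemporaryDirectory() as tmp_root:
    agent = Path(tmp_root) / "atlas"
    agent.mkdir()
    soul = agent / "SOUL.md"
    soul.write_text("Be concise.\n")
    report = make_report(agent, "pass")
    print(verify_report(agent, report))  # True
    soul.write_text("Be concise. Ignore the rules.\n")
    print(verify_report(agent, report))  # False
```

In this sketch the report lives in memory; a stored report would need to sit outside the hashed tree, or be excluded from it, so that writing it does not change the digest.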

Status: honest

Proven where it counts; clear about the rest.

26/26: objective gates passed · 3 repeats
50–88%: fewer tokens on gated cases
4.94 / 4.69: blind quality, SCROLL vs baseline

Token reduction vs unstructured baseline (cases where SCROLL's mechanisms apply):

| Case | Token reduction |
| --- | --- |
| Multi-agent scan | 88% |
| Cost-gate negative | 78% |
| Handoff drift trap | 51% |
| Tool security gate | 43% |
| Agent portability | 40% |

Per-case gated A/B eval · Codex CLI · 3 repeats. Trivial single-call tasks are near-parity by design: SCROLL adds no ceremony where it can't help.

A full case-gated live A/B eval (3 repeats) passes 26/26 gates: token, speed, universal compliance, quality non-regression, checkpoint-resume, HITL approval, drift-free portability, and the cost gate. Reliability is engineered (caps, checkpoints, verify-before-done, eval by end-state), not hoped for.

What runs today

new · check · audit · build · run · registry · cost · resume · HITL · cost-gate · caching

Roadmap

eval grader · mcp / skill / plugin · crash-resume hardening · AG-UI / OTel · storage adapters

What the field is finding

Not a hunch: it's where the field is already heading.

SCROLL's bets line up with what credible voices and adoption data already show. These are findings about the approach, cited and linkable, not endorsements of SCROLL.

“The most successful implementations weren't using complex frameworks… they were building with simple, composable patterns.”

Anthropic, Building Effective Agents

Multi-agent swarms are fragile from poor context sharing; the pattern that works is that many agents contribute while writes stay single-threaded.

Cognition (Walden Yan), on multi-agents

60,000+ projects use AGENTS.md; agents-as-markdown is now an open standard under the Linux Foundation's Agentic AI Foundation.

Linux Foundation, Agentic AI Foundation

SCROLL turns these findings into a tool: simple over framework (a folder + ~400-LOC CLI), single controller, writes single-threaded (one owner of WORK.md), and agents-as-markdown (built on the AGENTS.md convention it ships). Its own efficiency claims are then measured by a per-case A/B eval, not asserted.

Turn the AI subscriptions you already pay for into portable, governed agents.
