
Build AI agents as folders, not code.

A filesystem-native framework. An agent is a directory of markdown, the LLM does the work, a thin ~400-LOC CLI coordinates. No engine, no database, no vendor lock-in — one definition runs on Claude, Codex, or Gemini, locally or in the cloud.

$ npm i -g github:LeHungViet/scroll


MIT · Node ≥ 20 · zero-dependency · Claude · OpenAI/Codex · Gemini · eval: 26/26 gates

The bet

Most frameworks ship an engine. SCROLL ships a convention.

Heavy agent frameworks make you import a runtime (graphs, schedulers, vector DBs) and lock you to one model vendor. SCROLL bets the opposite way — and the 2026 evidence backs it.

Dimension	Heavy frameworks	SCROLL
An agent is…	an object in code	a folder of markdown
State lives in…	a database / session	the filesystem + git
Orchestration	the framework runtime	a declarative DAG you advance deterministically
Model	locked to one vendor	vendor-neutral — one definition, any runtime
The whole framework	a dependency tree	a convention + one ~400-LOC CLI

Why a folder

Why split it up — isn't a prompt enough?

Fair question. A SCROLL agent is the same shape as a Claude sub-agent — markdown + frontmatter. The split into IDENTITY · SOUL · TOOLS · hard-rules exists only to buy three things a single blob can't:

🔌 Portability (the real reason)

A plain sub-agent runs only in Claude Code. Separating the machine-readable IDENTITY (model, capabilities, runtimes) from the prose SOUL is what lets one definition transpile to Cowork, Codex, Gemini, and A2A — with no rewrite.

🔍 Auditable & gradeable

Keeping hard-rules in a versioned, shareable file and evals/ as gold cases is what lets scroll audit enforce the rules and scroll eval grade the agent. A single blob can't be checked.

🧰 Tools wired per runtime

Logical capabilities vs per-runtime tool_bindings: the same web.search maps to a different real tool on each host. A prompt with hardcoded tool names can't do that.
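To make the capability/binding split concrete, here is a minimal sketch. Only the capability name `web.search` comes from the text; the runtime keys, the `fs.read` capability and the concrete tool names are hypothetical placeholders, not SCROLL's actual `tool_bindings` schema.

```typescript
// Hypothetical shapes: a logical capability resolves to a concrete tool per runtime.
type Runtime = "claude" | "codex" | "gemini";
type ToolBindings = Record<string, Partial<Record<Runtime, string>>>;

// Illustrative bindings only; real tool names depend on each host.
const toolBindings: ToolBindings = {
  "web.search": { claude: "WebSearch", codex: "web_search", gemini: "google_web_search" },
  "fs.read": { claude: "Read", codex: "read_file", gemini: "read_file" },
};

// Resolve an agent's logical capabilities for one runtime, failing loudly
// when a capability has no binding there instead of silently dropping it.
function resolveTools(capabilities: string[], runtime: Runtime): string[] {
  return capabilities.map((cap) => {
    const tool = toolBindings[cap]?.[runtime];
    if (tool === undefined) {
      throw new Error(`capability "${cap}" has no binding for runtime "${runtime}"`);
    }
    return tool;
  });
}

console.log(resolveTools(["web.search", "fs.read"], "gemini"));
// → [ 'google_web_search', 'read_file' ]
```

The agent definition names only `web.search`; swapping runtimes changes the lookup, not the prompt.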

Is it over-engineering? Yes — if you run one agent, on one runtime, with no governance, a single prompt is better (SCROLL even has a lean single-call path for that). SCROLL earns its keep only at portability + governance + multi-agent at scale: many agents, many runtimes, kept consistent, audited, and graded. Use it when that's your problem — not before.

How it works

An agent is exactly these files.

Scaffold a compliant agent, write its soul in plain language, and build it once — one source renders to every runtime. No engine to wire, no model SDK to import.

Create the agent. scroll new atlas scaffolds a correct folder — you just fill it in.

Write the soul. Describe the agent in plain prose in SOUL.md; fill a few frontmatter lines in IDENTITY.md (name, model, capabilities).

Build & audit. scroll build renders to every runtime; scroll audit confirms it's compliant.

Run it. scroll run drives a real loop — cost caps, checkpoints, a full event log.

# scaffold → write → render, audit, run
scroll new atlas
scroll build atlas      # → Cowork · Codex · Gemini · Claude-subagent · A2A
scroll audit           # compliance gate (CI)
scroll run atlas --task "…"

agents/atlas/
├── IDENTITY.md   # who it is (machine-readable frontmatter)
├── SOUL.md       # how it thinks (you write this)
├── TOOLS.md      # what it can use
├── hard-rules.md # rules it must never break
├── memory/       # what it remembers
└── evals/        # how you grade it
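A structure check in the spirit of `scroll check` could look like the sketch below. It is not SCROLL's code: it assumes the required files are the four markdown files in the tree above, that IDENTITY.md frontmatter must carry `name`, `model` and `capabilities` (the fields named in the steps above), and it takes the folder as a map of file name to contents so it runs without touching disk. The flat `key: value` reader is a stand-in for a real YAML parser.

```typescript
// Hypothetical folder-structure check; not SCROLL's implementation.
const REQUIRED_FILES = ["IDENTITY.md", "SOUL.md", "TOOLS.md", "hard-rules.md"];
const REQUIRED_KEYS = ["name", "model", "capabilities"];

// Naive frontmatter reader: flat "key: value" lines between the "---" fences.
function readFrontmatter(text: string): Map<string, string> {
  const lines = text.split("\n");
  const fields = new Map<string, string>();
  if (lines[0]?.trim() !== "---") return fields;
  for (const line of lines.slice(1)) {
    if (line.trim() === "---") break;
    const idx = line.indexOf(":");
    if (idx > 0) fields.set(line.slice(0, idx).trim(), line.slice(idx + 1).trim());
  }
  return fields;
}

// Returns a list of problems; an empty list means the folder passes.
function checkAgent(files: Record<string, string>): string[] {
  const problems: string[] = [];
  for (const f of REQUIRED_FILES) {
    if (!(f in files)) problems.push(`missing ${f}`);
  }
  const identity = files["IDENTITY.md"];
  if (identity !== undefined) {
    const fm = readFrontmatter(identity);
    for (const k of REQUIRED_KEYS) {
      if (!fm.get(k)) problems.push(`IDENTITY.md frontmatter lacks "${k}"`);
    }
  }
  return problems;
}

const atlas: Record<string, string> = {
  "IDENTITY.md": "---\nname: atlas\nmodel: some-model\ncapabilities: [web.search]\n---\n",
  "SOUL.md": "Atlas is a careful researcher.",
  "TOOLS.md": "",
  "hard-rules.md": "- Never invent sources.",
};
console.log(checkAgent(atlas)); // → []
```

Returning a list of problems rather than a boolean is what makes a check like this usable in pre-commit or CI output.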

📁 One source → every runtime

The same folder transpiles to a Cowork role, a Codex/Gemini prompt, a Claude sub-agent, or an A2A card. Fix one place, rebuild, no drift.
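One way to picture "one source, many outputs" is a set of renderers that all read the same in-memory agent. This is a hypothetical sketch: the `Agent` fields and both targets are assumptions. Only the sub-agent output follows a real shape (Claude Code sub-agents are markdown files with YAML frontmatter such as `name` and `description`, followed by the prompt body); SCROLL's actual Codex, Gemini and A2A outputs are not reproduced here.

```typescript
// Hypothetical agent model, as if assembled from IDENTITY.md, SOUL.md and hard-rules.md.
interface Agent {
  name: string;
  description: string;
  soul: string;
  hardRules: string[];
}

type Target = "claude-subagent" | "plain-prompt";

// Every target renders from the same source, so fixing the folder and
// rebuilding updates all outputs at once (no per-runtime copies to drift).
const renderers: Record<Target, (a: Agent, body: string) => string> = {
  // Markdown with YAML frontmatter, the general shape of a Claude Code sub-agent file.
  "claude-subagent": (a, body) =>
    `---\nname: ${a.name}\ndescription: ${a.description}\n---\n\n${body}`,
  // Simplified stand-in for a Codex/Gemini-style system prompt.
  "plain-prompt": (a, body) => `You are ${a.name}. ${body}`,
};

function render(agent: Agent, target: Target): string {
  const rules = agent.hardRules.map((r) => `- ${r}`).join("\n");
  const body = `${agent.soul}\n\nHard rules:\n${rules}\n`;
  return renderers[target](agent, body);
}

const atlas: Agent = {
  name: "atlas",
  description: "Researches a topic and returns a short, sourced conclusion.",
  soul: "You research carefully and cite every source.",
  hardRules: ["Never invent sources."],
};
console.log(render(atlas, "claude-subagent"));
```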

🧩 State is files + git

No DB, no session store. Runs are folders; checkpoints are commits; a crash resumes by re-reading the directory.
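Resume-by-re-reading can be sketched in a few lines: if each finished step leaves an output file in the run folder, the next step to run is simply the first one whose output is missing. The `<step>.out.md` naming is a hypothetical convention, the step list borrows the six-step research example further down, and git commits for checkpoints are left out.

```typescript
import { mkdtempSync, writeFileSync, existsSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical convention: a finished step leaves "<step>.out.md" in the run folder.
const STEPS = ["search", "plan", "build", "verify", "revise", "report"];

// No session store: progress is recovered purely from what is on disk.
function nextStep(runDir: string): string | null {
  for (const step of STEPS) {
    if (!existsSync(join(runDir, `${step}.out.md`))) return step;
  }
  return null; // every step has output, so the run is complete
}

// Simulate a run that crashed after finishing two steps.
const runDir = mkdtempSync(join(tmpdir(), "scroll-run-"));
writeFileSync(join(runDir, "search.out.md"), "findings…");
writeFileSync(join(runDir, "plan.out.md"), "plan…");
console.log(nextStep(runDir)); // → build
```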

⚙️ The LLM works inside a step

A deterministic runner advances the task DAG; the model only acts within a step. The reliability comes from the structure, not luck.
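A deterministic runner of this kind can be sketched as a topological loop: plain code decides which task runs next, and the model would only be invoked inside the step callback. The `Task` shape and the stubbed `fakeModel` (standing in for an LLM call) are hypothetical, and this sketch runs ready tasks one at a time where a real runner could run them in parallel.

```typescript
// Hypothetical task shape, loosely modeled on the WORK.md fields shown later.
interface Task {
  id: string;
  blockedBy: string[];
}

// The only place a model would act; injected so the loop itself stays deterministic.
type StepFn = (task: Task, inputs: Record<string, string>) => string;

function runDag(tasks: Task[], runStep: StepFn): string[] {
  const outputs: Record<string, string> = {};
  const order: string[] = [];
  const pending = [...tasks];
  while (pending.length > 0) {
    // Deterministic choice: first pending task (in file order) whose blockers are done.
    const i = pending.findIndex((t) => t.blockedBy.every((b) => b in outputs));
    if (i === -1) throw new Error("cycle or missing dependency in task graph");
    const [task] = pending.splice(i, 1);
    const inputs: Record<string, string> = {};
    for (const b of task.blockedBy) inputs[b] = outputs[b];
    outputs[task.id] = runStep(task, inputs);
    order.push(task.id);
  }
  return order;
}

// Stub standing in for an LLM call: records which upstream outputs it received.
const fakeModel: StepFn = (task, inputs) => `${task.id}(${Object.keys(inputs).join("+")})`;

const order = runDag(
  [
    { id: "synthesize", blockedBy: ["market-research", "financials"] },
    { id: "market-research", blockedBy: [] },
    { id: "financials", blockedBy: [] },
  ],
  fakeModel,
);
console.log(order); // → [ 'market-research', 'financials', 'synthesize' ]
```

Because ordering and cycle detection live in code, the same WORK.md always advances the same way; only the content of each step varies with the model.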

Install

Up and running in 2 minutes.

Needs Node ≥ 20. Pulls in no other dependencies.

# Option 1 — install straight from GitHub (public repo, works today)
npm i -g github:LeHungViet/scroll

# Option 2 — install the downloaded .tgz (no accounts needed)
npm i -g ./agentpro-scroll-0.6.0.tgz

# Option 3 — once published: npm i -g @agentpro/scroll

# Try it
scroll new researcher
scroll check researcher
scroll audit
scroll run researcher --task "Summarize topic X into one short conclusion"

What you get

A compliant agents/researcher/ folder, a hash-bound .scroll/audit.json report, and an agent that just ran for real with a per-step log.

From that one folder, scroll build renders prompts for Cowork, Codex, Gemini, or an A2A card — no rewrite.

⬇ Download agentpro-scroll-0.6.0.tgz

The commands

Each command does one clear job.

The whole framework is one thin CLI. Here's what each command is for:

Command	Purpose
scroll new <name>	Scaffold a compliant agent folder so you don't have to memorize the structure.
scroll check <name>	Validate an agent's structure (valid frontmatter, required files). Runs in pre-commit / CI.
scroll audit	Deep compliance scan: catches prompts left in code, hardcoded models, banned infra, agents shipped without evals — then writes a hash-bound report.
scroll build <name>	Render one agent to every runtime: Cowork · Codex · Gemini · Claude sub-agent · A2A card.
scroll run --work WORK.md	Drive a real multi-agent workflow. (Single job: scroll run <agent> --task "…".)
scroll cost "<task>"	Estimate single- vs multi-agent token cost before you spawn anything.
scroll registry	Scan all agents into one overview (model, tools, runtimes).

Multi-agent, the safe way

Coordinate through files — not a message bus.

📋 One controller owns WORK.md

The task chain is a declarative DAG with a single owner — the fix for the #1 multi-agent failure: agents handing off in infinite loops.

🗒️ Blackboard, not telephone

Agents post findings to append-only files; others read what's relevant. A handoff is a file-state change, passing a summary — not the whole history.
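A blackboard like this needs nothing more than append-only JSON Lines files. In the sketch below, the `blackboard.jsonl` file name, the entry fields and the topic filter are hypothetical; the point is that writers only ever append and readers pull just the entries they need.

```typescript
import { appendFileSync, readFileSync, mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical entry shape: who posted, about what, and a terse summary.
interface Finding {
  agent: string;
  topic: string;
  summary: string;
}

// Writers only append; existing lines are never rewritten.
function post(board: string, f: Finding): void {
  appendFileSync(board, JSON.stringify(f) + "\n");
}

// Readers take only the topic they care about, not the full history.
function readTopic(board: string, topic: string): Finding[] {
  return readFileSync(board, "utf8")
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as Finding)
    .filter((f) => f.topic === topic);
}

const board = join(mkdtempSync(join(tmpdir(), "scroll-")), "blackboard.jsonl");
post(board, { agent: "researcher", topic: "market", summary: "market summary (placeholder)" });
post(board, { agent: "analyst", topic: "financials", summary: "financials summary (placeholder)" });
console.log(readTopic(board, "market").map((f) => f.summary)); // → [ 'market summary (placeholder)' ]
```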

💰 Cost gate before spawning

Default is single-agent. SCROLL estimates single-vs-multi cost before it fans out, and prefers single when it's enough.
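A cost gate can be as simple as comparing two token estimates before fanning out. The estimator below is a hypothetical model, not what `scroll cost` computes: it assumes a single agent re-reads its growing history on every turn, each spawned agent re-pays the system prompt, a synthesis call reads the sub-agents' summaries, and multi-agent must win by a margin (20% here, an arbitrary choice). It ignores output tokens and caching.

```typescript
// Hypothetical cost model, for illustration only.
interface CostInput {
  systemPrompt: number;     // tokens every model call re-pays
  subtaskContext: number[]; // tokens each subtask needs to read
  summary: number;          // tokens a sub-agent returns to the controller
}

// Single agent: one conversation whose history grows, so turn k re-reads turns 1..k.
function singleAgentTokens(c: CostInput): number {
  let total = 0;
  let history = 0;
  for (const ctx of c.subtaskContext) {
    history += ctx;
    total += c.systemPrompt + history;
  }
  return total;
}

// Multi-agent: each worker sees only its own slice; synthesis reads terse summaries.
function multiAgentTokens(c: CostInput): number {
  const workers = c.subtaskContext.reduce((sum, ctx) => sum + c.systemPrompt + ctx, 0);
  const synthesis = c.systemPrompt + c.summary * c.subtaskContext.length;
  return workers + synthesis;
}

// Default to single; fan out only if multi-agent is cheaper by a clear margin.
function chooseMode(c: CostInput, margin = 0.2): "single" | "multi" {
  return multiAgentTokens(c) < singleAgentTokens(c) * (1 - margin) ? "multi" : "single";
}

const big = { systemPrompt: 2000, subtaskContext: [10000, 10000, 10000], summary: 500 };
console.log(singleAgentTokens(big), multiAgentTokens(big), chooseMode(big)); // → 66000 39500 multi
```

With a single small subtask the synthesis overhead dominates and the gate keeps the default single-agent path.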

Examples you can build today

📊 Decision memo

One agent researches the market, one reads the financials (in parallel); the controller merges them into a go/no-go memo.

🛠️ Code-review crew

One agent reads the diff, one runs the tests; the controller blocks the merge until format and findings are clean.

📚 Long research run

Six chained steps (search → plan → build → verify → revise → report). Interrupt it and it resumes from the checkpoint.

# WORK.md — "decision memo" example (trimmed)
controller: lead

[task] id: market-research   owner: researcher   parallel: true
       objective: Summarize market size, growth, key players.

[task] id: financials        owner: analyst      parallel: true
       objective: Summarize unit economics and cost risks.

[task] id: synthesize        owner: lead         blockedBy: [market-research, financials]
       objective: Merge into a go/no-go memo.   final: true
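To show how little machinery the trimmed WORK.md notation above needs, here is a hypothetical parser for exactly that shape (`[task]` lines of `key: value` pairs, indented continuation lines, `#` comments). It is not SCROLL's parser and the real file format may differ; the usage lists the tasks that can start immediately.

```typescript
// Hypothetical parser for the trimmed WORK.md notation; not SCROLL's.
interface WorkTask {
  id: string;
  owner: string;
  blockedBy: string[];
  fields: Record<string, string>; // every key: value pair (objective, parallel, final, ...)
}

// Split "k1: v1   k2: v2" into pairs; a value runs until the next "word:".
// Limitation: a value that itself contains "word:" would be split there.
function parsePairs(text: string, into: Record<string, string>): void {
  const keyRe = /(\w+):\s*/g;
  const hits: { key: string; start: number; valueStart: number }[] = [];
  let m: RegExpExecArray | null;
  while ((m = keyRe.exec(text)) !== null) {
    hits.push({ key: m[1], start: m.index, valueStart: m.index + m[0].length });
  }
  hits.forEach((h, i) => {
    const end = i + 1 < hits.length ? hits[i + 1].start : text.length;
    into[h.key] = text.slice(h.valueStart, end).trim();
  });
}

function parseWork(src: string): { controller: string; tasks: WorkTask[] } {
  let controller = "";
  const raw: Record<string, string>[] = [];
  for (const line of src.split("\n")) {
    if (line.trim() === "" || line.startsWith("#")) continue; // blanks and comments
    if (line.startsWith("[task]")) {
      const fields: Record<string, string> = {};
      parsePairs(line.slice("[task]".length), fields);
      raw.push(fields);
    } else if (/^\s/.test(line) && raw.length > 0) {
      parsePairs(line, raw[raw.length - 1]); // indented: continues the current task
    } else {
      const top: Record<string, string> = {};
      parsePairs(line, top);
      if (top.controller) controller = top.controller;
    }
  }
  const tasks = raw.map((f) => ({
    id: f.id,
    owner: f.owner,
    blockedBy: (f.blockedBy ?? "")
      .replace(/[\[\]]/g, "")
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0),
    fields: f,
  }));
  return { controller, tasks };
}

const work = parseWork(`controller: lead

[task] id: market-research   owner: researcher   parallel: true
       objective: Summarize market size, growth, key players.

[task] id: financials        owner: analyst      parallel: true
       objective: Summarize unit economics and cost risks.

[task] id: synthesize        owner: lead         blockedBy: [market-research, financials]
       objective: Merge into a go/no-go memo.   final: true`);

console.log(work.tasks.filter((t) => t.blockedBy.length === 0).map((t) => t.id));
// → [ 'market-research', 'financials' ]
```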

Long, unattended runs survive restarts: hard caps + a circuit-breaker stop runaway spend, human gates hold irreversible actions for approval, and every run emits an append-only events.jsonl you can watch and steer.
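The caps and the circuit-breaker compose naturally. Below is a hypothetical sketch of a hard token cap plus a breaker that trips after N consecutive failed steps, with every decision appended to an `events.jsonl`-style log (kept in memory so it runs anywhere). The limits, event types and fields are illustrative, not SCROLL's; human approval gates are not modeled.

```typescript
// Hypothetical event shape; one JSON object per line in events.jsonl.
interface RunEvent {
  type: "step_ok" | "step_failed" | "halted";
  step: string;
  tokens: number;
  reason?: string;
}

class RunGuard {
  private spent = 0;
  private consecutiveFailures = 0;
  private readonly tokenCap: number;
  private readonly maxFailures: number;
  readonly log: string[] = []; // stand-in for appending to events.jsonl

  constructor(tokenCap: number, maxFailures: number) {
    this.tokenCap = tokenCap;
    this.maxFailures = maxFailures;
  }

  private emit(e: RunEvent): void {
    this.log.push(JSON.stringify(e));
  }

  // Record a finished step; returns false once the run must stop.
  record(step: string, tokens: number, ok: boolean): boolean {
    this.spent += tokens;
    this.consecutiveFailures = ok ? 0 : this.consecutiveFailures + 1;
    this.emit({ type: ok ? "step_ok" : "step_failed", step, tokens });
    if (this.spent > this.tokenCap) {
      this.emit({ type: "halted", step, tokens: this.spent, reason: "token cap exceeded" });
      return false;
    }
    if (this.consecutiveFailures >= this.maxFailures) {
      this.emit({ type: "halted", step, tokens: this.spent, reason: "circuit breaker tripped" });
      return false;
    }
    return true;
  }
}

const guard = new RunGuard(10_000, 2);
console.log(guard.record("search", 4_000, true)); // → true
console.log(guard.record("plan", 1_000, false)); // → true
console.log(guard.record("plan", 1_000, false)); // → false (two failures in a row)
```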

Positioning

Not a rival to the model SDKs — it sits above them.

The Claude Agent SDK, OpenAI Agents SDK, and Google ADK are runtimes bound to one vendor. SCROLL is a vendor-neutral convention that transpiles down to them — your agents aren't hostage to any one provider.

Dimension	Vendor agent SDKs	SCROLL
What it is	an engine (code / library)	a convention + thin CLI, sitting above
An agent is	code / framework objects	markdown prose in a folder
Orchestration	the SDK runtime	a declarative DAG on the filesystem
Vendor	locked to one	neutral — one definition, any runtime
It is…	a destination (lock-in)	a router (freedom)

SCROLL emits a Claude sub-agent, an OpenAI/Codex prompt, a Gemini prompt, or an A2A card, so you can still deploy on a vendor's runtime when you want. Vendors build their SDKs to be the destination; SCROLL is the router that keeps you free to move.

The token niche

Cheap by construction — even through a CLI.

SCROLL's cost wins come from context engineering and the filesystem, not a vector store. A stack of levers, most of which work whether you call the model via API or a CLI:

⚡ Skip the call entirely

Mechanical steps (merge, format) run in code as deterministic, 0-token tasks — the biggest lever, because every avoided call avoids the model's whole system-prompt cost.

♻️ Cache the stable prefix

Context is ordered prefix-first so the agent definition caches; cached reads bill at ~0.1× the normal input rate.

✂️ Output discipline

Output tokens bill at ~5× the input rate, so sub-tasks return terse findings and only the synthesis writes full prose.

🎯 Route + trim + lean

Cheap model for sub-tasks, strong for synthesis; trimmed intermediate output; a lean prefix for one-shot tasks.
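A quick worked calculation shows how the caching and output levers stack, using only the ratios quoted above (cached reads ~0.1× input, output ~5× input). The token counts are made-up placeholders, and costs are in input-token equivalents rather than any vendor's actual prices.

```typescript
// Relative billing weights from the text: cached input ~0.1x, output ~5x input.
const CACHED_READ = 0.1;
const OUTPUT = 5;

interface Call {
  prefix: number;        // stable agent definition (cacheable)
  fresh: number;         // task-specific input
  output: number;        // generated tokens
  prefixCached: boolean;
}

// Cost of one call in input-token equivalents.
function cost(c: Call): number {
  const prefixCost = c.prefix * (c.prefixCached ? CACHED_READ : 1);
  return prefixCost + c.fresh + c.output * OUTPUT;
}

// Same sub-task twice: uncached and verbose vs cached prefix and terse findings.
const verbose: Call = { prefix: 8000, fresh: 2000, output: 1500, prefixCached: false };
const lean: Call = { prefix: 8000, fresh: 2000, output: 300, prefixCached: true };
console.log(cost(verbose), cost(lean)); // → 17500 4300
```

In this made-up call, output discipline saves more than caching does (6,000 vs 7,200 units), which is why terse sub-task output matters even when the prefix is already cached.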

On the multi-agent and long-run cases where its mechanisms apply, SCROLL cut billable tokens 50–88% and latency 50–75% versus an unstructured baseline — at equal-or-better output quality. (Per-case gated A/B eval; trivial single-call tasks are near-parity by design — SCROLL doesn't add ceremony where it can't help.)

Compliance

The correct path is the only one that works.

A framework is worthless if people — and AI coders — don't follow it. SCROLL makes compliance structural, then verifies it instead of trusting a claim.

🏗️ Scaffold → schema → linter → build

scroll new starts you compliant; a JSON Schema flags frontmatter as you type; scroll check runs in CI; scroll build refuses a broken agent.

🤖 AI coders follow it by default

A shipped AGENTS.md teaches Claude Code / Cursor / Cowork the rules from your repo — compliant agents without memorizing docs.

Verified, not trusted. AI coders improvise and fabricate. scroll audit flags the deviations — persona/prompts left in code, hardcoded models, banned infra (DBs/queues/graph frameworks), agents shipped without evals — and writes a hash-bound report. A "pass" is tied to the exact file state, so an assistant can't fake "I followed SCROLL," and editing files after the audit invalidates it (scroll audit --verify). The verdict comes from CI — not the agent's word.
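The hash-binding idea can be sketched with Node's built-in `crypto`: hash every file's path and contents in a fixed order, store the digest next to the verdict, and recompute it on verify. The report fields and the `audit`/`verify` helpers are assumptions modeled on the description above, not the actual format of `.scroll/audit.json` or the behavior of `scroll audit --verify`.

```typescript
import { createHash } from "node:crypto";

// Hash path + contents of every file in sorted order, so the digest pins
// the exact file state the audit saw. "\0" separators avoid ambiguous joins.
function treeDigest(files: Record<string, string>): string {
  const h = createHash("sha256");
  for (const path of Object.keys(files).sort()) {
    h.update(path).update("\0").update(files[path]).update("\0");
  }
  return h.digest("hex");
}

// Hypothetical report shape, loosely modeled on .scroll/audit.json.
interface AuditReport {
  verdict: "pass" | "fail";
  digest: string;
}

function audit(files: Record<string, string>, findings: string[]): AuditReport {
  return { verdict: findings.length === 0 ? "pass" : "fail", digest: treeDigest(files) };
}

// A "pass" only counts if the files still hash to what was audited.
function verify(report: AuditReport, files: Record<string, string>): boolean {
  return report.verdict === "pass" && report.digest === treeDigest(files);
}

const tree = {
  "agents/atlas/SOUL.md": "Careful researcher.",
  "agents/atlas/IDENTITY.md": "---\nname: atlas\n---",
};
const report = audit(tree, []);
console.log(verify(report, tree)); // → true
console.log(verify(report, { ...tree, "agents/atlas/SOUL.md": "Edited after audit." })); // → false
```

A digest alone can't stop someone from re-running the audit after editing; it is trustworthy because CI recomputes it from the committed files rather than accepting the stored report.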

Status — honest

Proven where it counts; clear about the rest.

26/26 objective gates passed · 3 repeats

50–88% fewer tokens on gated cases

4.94 / 4.69 blind quality rating, SCROLL vs baseline

[Chart: token reduction vs unstructured baseline, on cases where SCROLL's mechanisms apply]

Per-case gated A/B eval · Codex CLI · 3 repeats. Trivial single-call tasks are near-parity by design — SCROLL adds no ceremony where it can't help.

A full case-gated live A/B eval (3 repeats) passes 26/26 gates: token, speed, universal compliance, quality non-regression, checkpoint-resume, HITL approval, drift-free portability, and the cost gate. Reliability is engineered — caps, checkpoints, verify-before-done, eval by end-state — not hoped for.

What runs today

new · check · audit · build · run · registry · cost · resume · HITL · cost-gate · caching

Roadmap

eval grader · mcp / skill / plugin · crash-resume hardening · AG-UI / OTel · storage adapters

What the field is finding

Not a hunch — where the field is already heading.

SCROLL's bets line up with what credible voices and adoption data already show. These are findings about the approach, cited and linkable — not endorsements of SCROLL.

“The most successful implementations weren't using complex frameworks… they were building with simple, composable patterns.”

Anthropic, Building Effective Agents

Multi-agent swarms are fragile because of poor context sharing; the pattern that works is many agents contributing while writes stay single-threaded.

Cognition (Walden Yan), on multi-agents

60,000+ projects use AGENTS.md — agents-as-markdown is now an open standard under the Linux Foundation's Agentic AI Foundation.

Linux Foundation, Agentic AI Foundation

SCROLL turns these findings into a tool: simple over framework (a folder + ~400-LOC CLI), single controller, writes single-threaded (one owner of WORK.md), and agents-as-markdown (built on the AGENTS.md convention it ships). Its own efficiency claims are then measured by a per-case A/B eval — not asserted.

Turn the AI subscriptions you already pay for into portable, governed agents.

$ npm i -g github:LeHungViet/scroll
