`design-council`: 11 agents debate your next architecture call

`design-council` by Steven Syrek (sjsyrek) is a Claude Code-only skill that spawns 11 independent specialist agents, each with its own context, to debate a cross-cutting architecture or design decision in parallel. The article covers the 6-phase debate protocol, the full seat roster with dynamic sizing rules, exact installation commands, a concrete opening-prompt example, and six documented limitations, including the 10–20× token cost and the complete absence of community validation as of this writing. A comparison table against `hallmark` and `frontend-design` clarifies that all three can coexist in a workflow.

Today's Trending Agent Skills
2026/5/26 · 2:21

Research note

Most design review tools give you one Claude with a checklist. design-council gives you 11 independent Claudes who argue with each other — and leave a paper trail.
The skill, authored by Steven Syrek (Senior Engineering Manager at DeepL), landed on GitHub in late April 2026 1 and has ~150 stars as of this writing. It's very new, has zero community tutorials, and the README explicitly warns you it costs 10–20× more tokens than a single-context review. That context matters before you install it.

What problem it solves

Syrek's core argument is architectural, not stylistic: 1
"A single context, no matter how capable, evaluates a cross-cutting design decision from one vantage point."
When a decision touches authentication, API contracts, performance budgets, accessibility, and documentation simultaneously, a single Claude will evaluate all those dimensions in sequence — from one mental model. It may not produce a security veto when the platform engineer's instinct would have caught a problem, because there's no platform engineer. There's just Claude wearing all the hats.
design-council fixes this by spawning each seat as a truly independent Claude agent with its own context. As SKILL.md puts it: 2
"Every teammate has its own context — not a subagent inheriting yours — so disagreement is structural, not simulated."
Peer DMs go seat-to-seat. The CEO (your orchestrating Claude) routes tiebreakers and arbitrates unresolved disputes, but doesn't inject its own opinion into the debate itself.
Figure: Independent parallel contexts, the structural guarantee behind design-council's disagreement model. Agent contexts are connected as a peer network, each node unaware of the other nodes' internal state. 1

The 11 default seats

The full roster with ownership domains: 2
| Seat | Owns |
| --- | --- |
| principal-engineer | Architecture, module boundaries, simplicity; opens the debate with a ≤300-word position paper |
| platform-engineer | Systems, infra, data shape, operational cost, observability |
| integration-engineer | Downstream consumers, third-party developers, backwards compatibility |
| test-engineer | TDD, mutation ritual, coverage, assertion hygiene |
| qa-engineer | User flows vs. spec alignment, regression surface, manual test plan |
| security-engineer | Input validation, secrets, path safety, error sanitization |
| performance-engineer | Batching, memory, concurrency, measurement before optimization |
| product-manager | UX alignment, product coherence, best-practice conformance |
| ui-ux-designer | Ergonomics, visual consistency, interaction design |
| accessibility-specialist | a11y, keyboard navigation, screen reader, contrast |
| technical-writer | Docs, in-app help, CHANGELOG, API reference |
Dynamic sizing is the default, not a premium mode. No runtime UI in your project? Drop ui-ux-designer and accessibility-specialist. No user-facing input or infrastructure? Also drop security-engineer and platform-engineer. Internal tooling decisions often land on 4–6 seats. The plan card (Phase 0) shows you the proposed roster before anything spawns. 1
Five opt-in seats are available for specific situations: devops-engineer (deploy risk, CI/CD, rollback), finops-engineer (cloud/API cost), legal-compliance (privacy, licensing), domain-expert (narrow subject-matter SME), and historian (codebase precedent for mature repos).
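The sizing rules above can be pictured as a simple filter over the default roster. The sketch below is illustrative only: `ProjectTraits`, its two flags, and `propose_roster` are hypothetical names invented here, and the skill itself decides the roster during Phase 0 by its own logic, which may differ.

```python
from dataclasses import dataclass

# Default seats, as listed in the roster table above.
DEFAULT_SEATS = [
    "principal-engineer",
    "platform-engineer",
    "integration-engineer",
    "test-engineer",
    "qa-engineer",
    "security-engineer",
    "performance-engineer",
    "product-manager",
    "ui-ux-designer",
    "accessibility-specialist",
    "technical-writer",
]


@dataclass
class ProjectTraits:
    # Hypothetical flags for illustration; not part of the skill's interface.
    has_runtime_ui: bool
    has_user_input_or_infra: bool


def propose_roster(traits: ProjectTraits, opt_in: tuple[str, ...] = ()) -> list[str]:
    """Apply the two drop rules from the article, then append opt-in seats."""
    dropped: set[str] = set()
    if not traits.has_runtime_ui:
        dropped |= {"ui-ux-designer", "accessibility-specialist"}
    if not traits.has_user_input_or_infra:
        dropped |= {"security-engineer", "platform-engineer"}
    roster = [seat for seat in DEFAULT_SEATS if seat not in dropped]
    roster += [seat for seat in opt_in if seat not in roster]
    return roster


# Example: an internal tool with no UI and no user-facing input, plus one opt-in seat.
print(propose_roster(ProjectTraits(False, False), opt_in=("historian",)))
```

Whatever the real heuristics are, the plan card remains the place to correct them: you see the proposed roster before any seat spawns.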

Installation: Claude Code only

This skill uses TeamCreate, TeamDelete, Agent (with run_in_background and team_name), SendMessage, and TaskCreate — all Claude Code-specific primitives. There is no Cursor, Cline, Copilot, or Gemini CLI support, and none is mentioned as planned. 1
Install via the plugin marketplace:
/plugin marketplace add sjsyrek/claude-plugins
/plugin install design-council@sjsyrek
To pin a specific version, clone the repo at a tag and load it locally: 1
git clone --branch v0.2.1 https://github.com/sjsyrek/design-council.git
/plugin marketplace add ./design-council
Optional: split-pane observability. Running inside tmux or iTerm2 and setting "teammateMode": "tmux" in Claude Code settings.json renders each seat in its own pane, so you can watch agents debate in real time. Without it, seats share the main pane and you cycle with Shift+Down.

How a council session runs

Invoke with any of these phrases: "convene the council", "design debate", "council review", "run a design review", or "debate this design." 2
The protocol has six phases: 3
Figure: Six-phase protocol flow, from plan card through log and teardown (from the design-council documentation). 3
Phase 0: Plan card. The CEO drafts the proposed roster, per-seat model, budget estimate, and opening question. You reply `go` to proceed, adjust with `swap X for Y`, `drop X`, or `add X`, or reply `abort`. This step is not skippable unless you've granted explicit "auto-mode" in your CLAUDE.md.
Phase 1: Brief. The CEO gathers all binding constraints verbatim from your CLAUDE.md, specs, memory, and task tracker, then writes a shared `~/.claude/councils/<slug>/brief.md`. All seats point at this one file, which enables prompt-cache hits across parallel spawns (roughly 7–12k tokens saved per 8-seat council). 4
Phase 2: Spawn. All seats launch in a single multi-tool-call message; sequential spawning violates the parallel-first principle. 3 Opus handles synthesis-heavy seats (principal-engineer, product-manager, technical-writer, historian); Sonnet handles analytical seats (test-engineer, performance-engineer, platform-engineer, qa-engineer). All-Opus is an option for high-quality-bar calls, at significantly higher cost.
Phase 2.5: Handshake verify. The CEO counts incoming handshake DMs and inspects for empty `tmuxPaneId` entries, which indicate silent spawn failures. This step exists because Agent can return [Tool result missing due to internal error] while still registering a teammate; Syrek observed a 3/13 silent-failure rate in one session. 3
Phase 3: Cross-talk. Seats post opening verdicts tagged APPROVE, CONCERNS, or BLOCK, then DM each other directly to debate. The CEO routes the debate by pairing disagreers, inviting tiebreakers, and asking narrowing questions. Hard cap of 3 rounds.
Phase 4: Arbitrate. The CEO writes a 3–5 sentence decision for every unresolved disagreement, explicitly engaging the losing argument. Strategic, budget, or legal calls escalate to you.
Phase 5: Log + teardown. The CEO posts a draft decision log. You reply `save`, `amend`, or `discard`. On save, the log persists to `~/.claude/councils/<yyyy-mm-dd>-<slug>/log.md`, outside any repo and intentionally durable.
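The Phase 2.5 handshake check described above amounts to comparing the seats you asked for against the seats that actually came up. The following is a minimal sketch under stated assumptions: only the field name `tmuxPaneId` comes from the article, while the `Teammate` record, the `find_silent_failures` function, and the idea of passing handshakes as a set of seat names are hypothetical stand-ins for whatever the orchestrator really inspects.

```python
from dataclasses import dataclass


@dataclass
class Teammate:
    # Hypothetical record shape; only the tmuxPaneId field name is from the article.
    name: str
    tmuxPaneId: str


def find_silent_failures(
    expected: list[str],
    handshakes: set[str],
    teammates: list[Teammate],
) -> list[str]:
    """Return seats that were spawned but show signs of a silent failure.

    A seat is flagged if it never sent a handshake DM, or if its registered
    record is missing or has an empty tmuxPaneId.
    """
    registered = {t.name: t for t in teammates}
    failed = []
    for seat in expected:
        record = registered.get(seat)
        no_handshake = seat not in handshakes
        no_pane = record is None or not record.tmuxPaneId
        if no_handshake or no_pane:
            failed.append(seat)
    return failed


# Example with made-up data: qa-engineer registered but never really started.
seats = ["principal-engineer", "qa-engineer", "test-engineer"]
team = [
    Teammate("principal-engineer", "%1"),
    Teammate("qa-engineer", ""),
    Teammate("test-engineer", "%3"),
]
print(find_silent_failures(seats, {"principal-engineer", "test-engineer"}, team))
```

The point of checking two signals is the failure mode the article describes: a spawn call can report an error and still register a teammate, so neither the tool result nor the team list alone is trustworthy.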

A concrete opening prompt

The opening prompt template (for Phase 1) has six required fields: 5
Decision Question: Should we move our auth tokens from localStorage
  to httpOnly cookies across all three clients?

Binding Constraints: [pointer to brief.md]

Non-Goals: Do not revisit the session storage migration we completed
  in Q1. Do not reopen the JWT vs. opaque token debate.

Prior-Council Context: council-2026-04-22-session-refresh found that
  Safari's ITP behavior requires an explicit cookie prefix strategy.

Success Criterion: A decision with enough implementation specificity
  that the integration-engineer can write the API contract change.

Known Deadlines/Budget: Ship by 2026-06-01. Full 11-seat council is
  within budget.
As the template notes: "Write as if the user were going to read it. Clarity here cascades into the whole debate." 5
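Because all six fields are required, it can help to check a draft prompt before convening. This is a small sketch, not part of the skill: the field labels are copied from the example above, while `missing_fields` and its line-prefix matching rule are assumptions made for illustration.

```python
# Field labels as they appear in the opening-prompt example above.
REQUIRED_FIELDS = (
    "Decision Question",
    "Binding Constraints",
    "Non-Goals",
    "Prior-Council Context",
    "Success Criterion",
    "Known Deadlines/Budget",
)


def missing_fields(prompt: str) -> list[str]:
    """Return the required labels that do not start any line of the prompt."""
    lines = [line.strip() for line in prompt.splitlines()]
    return [
        field
        for field in REQUIRED_FIELDS
        if not any(line.startswith(field + ":") for line in lines)
    ]


draft = """Decision Question: Should we move our auth tokens from localStorage
  to httpOnly cookies across all three clients?

Non-Goals: Do not reopen the JWT vs. opaque token debate.
"""
print(missing_fields(draft))
```

An empty list means every label is present; it says nothing about whether the content under each label is clear, which is the part the template's own advice is about.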
There's also a Review mode variant — for codebase audits rather than a single decision. In Review mode, Phase 3 cross-talk is skipped; each seat independently produces FINDING N blocks with a P0/P1 priority tag. The CEO deduplicates and files tracker items.

Known limitations and honest caveats

Syrek documents these in the README's Tradeoffs section: 1
Token cost. 8+ parallel contexts, each reading role briefs and the shared brief, plus 1–3 rounds of cross-talk — budget roughly 10–20× a single-context review. Wall-clock time ≈ the slowest seat (they run in parallel), not the sum, but the cost is real.
Orchestration latency. Every cross-talk round waits for the last seat to respond. If a synthesis-heavy seat goes deep on a sub-problem, it gates the whole round.
No community validation. As of today, there are zero third-party tutorials, no before/after comparisons, no Reddit threads, and no X posts about this skill. The only evidence of it working is the author's own dogfood sessions — including a real data-loss P0 incident he used it to investigate. 4 That's meaningful, but it's not independent confirmation.
Seats are strong generalists, not domain experts. The domain-expert opt-in exists precisely because the standard seats can't replace a real SME on narrow technical calls. 1
Silent-promise risk on DEFER. If the council defers a decision and no tracker item gets created, that decision disappears entirely. Phase 5's guard only catches deferrals when a tracker is configured.
v0.2.1 is very recent. The changelog documents bugs caught in real sessions: a seat that prose-acknowledged a shutdown request and blocked TeamDelete, cross-talk closure racing against in-flight DMs, and missing prior-council context causing a council to re-derive conclusions it had already reached. 4 These were fixed in v0.2.1, but the version is barely a month old.
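To make the token-cost caveat concrete, the 10–20× rule of thumb can be turned into a rough budget range. The function name and the 40,000-token baseline below are made up for illustration; real usage depends on roster size, model mix, and how many cross-talk rounds run.

```python
def council_token_range(single_review_tokens: int) -> tuple[int, int]:
    """Return a (low, high) token budget using the README's 10-20x rule of thumb."""
    if single_review_tokens < 0:
        raise ValueError("token count must be non-negative")
    return 10 * single_review_tokens, 20 * single_review_tokens


# 40,000 tokens is a hypothetical baseline for a single-context review.
low, high = council_token_range(40_000)
print(f"{low:,} to {high:,} tokens")  # prints "400,000 to 800,000 tokens"
```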

How it fits alongside hallmark and frontend-design

These three skills occupy different parts of the design workflow and can coexist: 2
| Skill | Mode | When |
| --- | --- | --- |
| frontend-design (Anthropic) | Single-agent creative direction | Before or during UI generation; answers "what should this look and feel like?" |
| hallmark (nutlope/Together AI) | Single-agent quality gate | After generation; runs 69 slop-test checks against the output |
| design-council (sjsyrek) | Multi-agent architectural debate | Before implementation; produces a decision log for cross-cutting calls |
The architectural difference is real: hallmark and frontend-design both run one context evaluating against a rubric. design-council is the only one where disagreement between viewpoints is structural — a security engineer that wasn't told what the platform engineer said, reaching different conclusions. Whether that structural independence produces better decisions than a well-prompted single context is an open empirical question this skill can't yet answer from external data.

When NOT to use it

Syrek is explicit here, and it matters given the token cost: 1
"Do not invoke for simple bug fixes, single-specialist questions, library/tool picks, or pure exploration (→ Explore). The token cost isn't earned."
The intended use case is decisions that cross two or more specialist domains and have meaningful downstream consequences. Choosing a caching library is a single-specialist call — invoke design-council for that and you're paying 10–20× to get 10 agents to agree with the one who had the relevant opinion.
The sweet spot: an API redesign where backwards compatibility, security, performance, and documentation all conflict. Or a data architecture decision where platform costs, integration contracts, and test strategy pull in different directions. The decision log output at ~/.claude/councils/ is also only useful if you'll actually read it — if you're making a decision you'd never audit later, skip the council.
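The guidance in this section can be summarized as a quick gate you might apply before convening. This is a sketch of the article's rule of thumb, not anything the skill ships: `should_convene`, the `kind` labels, and the domain names are all hypothetical.

```python
# Decision kinds the README says not to invoke the council for (labels are made up here).
EXCLUDED_KINDS = {"bug-fix", "single-specialist", "library-pick", "exploration"}


def should_convene(kind: str, domains: set[str], has_downstream_consequences: bool) -> bool:
    """Convene only for decisions spanning 2+ specialist domains with real downstream impact."""
    if kind in EXCLUDED_KINDS:
        return False
    return len(domains) >= 2 and has_downstream_consequences


# An API redesign touching several domains qualifies; a caching-library pick does not.
print(should_convene("api-redesign", {"security", "performance", "docs"}, True))
print(should_convene("library-pick", {"performance"}, True))
```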

Cover image: AI-generated illustration
