Advisors

A panel of role-specific skills that argue from focused perspectives and disagree on purpose.

Problem

Assistants default to agreement. Most strategic decisions do not need another voice confirming the user's instinct; they need a panel that will push back. Advisors is a set of five Claude skills modeled on Lincoln's cabinet pattern: appoint your sharpest opponents, not loyalists, and let them disagree in the room.

Four role-specific advisors (Visionary, Skeptic, Operator, Ethicist) each analyze the same decision through a fixed lens. An orchestrator runs them in parallel and compresses their outputs into a single recommendation with explicit areas of agreement and disagreement.

Architecture

Five skills conforming to the agentskills.io specification: four individual advisors plus an orchestrator (advisory-panel). Each advisor is a self-contained skill with its own system prompt, evaluation rubric, and example set. They are portable across any harness that implements the spec.

             [user prompt]
                   │
                   ▼
          ┌──advisory-panel──┐
          │   orchestrator   │
          └────────┬─────────┘
                   │
        ┌──────┬───┴───┬───────┐
        ▼      ▼       ▼       ▼
   visionary skeptic operator ethicist
        │      │       │       │
        └──────┴───┬───┴───────┘
                   ▼
            synthesis pass
                   │
                   ▼
      [agreement]  [disagreement]
    [risk-adjusted recommendation]

The panel launches the four advisors as independent subagents, waits for all of them to finish, then runs a synthesis pass over the collected outputs. The synthesis pass is where the hard work happens: compressing four long analyses into the three output sections without flattening the disagreements that make the panel useful.
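The fan-out and fan-in described above can be sketched in a few lines of Python. This is an illustration only, not the panel's actual implementation: `stub_advisor`, `run_panel`, and `synthesis_input` are hypothetical names, and the stub stands in for whatever mechanism the harness uses to launch a subagent.

```python
import asyncio
from typing import Awaitable, Callable, Dict

ADVISORS = ("visionary", "skeptic", "operator", "ethicist")

# Hypothetical signature: (role, decision) -> that advisor's analysis text.
Advisor = Callable[[str, str], Awaitable[str]]


async def stub_advisor(role: str, decision: str) -> str:
    # Assumption: stands in for a real subagent launch.
    await asyncio.sleep(0)
    return f"{role}: analysis of {decision!r}"


async def run_panel(decision: str, advisor: Advisor = stub_advisor) -> Dict[str, str]:
    # Fan out: one coroutine per role; gather schedules them concurrently.
    calls = [advisor(role, decision) for role in ADVISORS]
    # Fan in: gather waits for all of them and preserves argument order.
    outputs = await asyncio.gather(*calls)
    return dict(zip(ADVISORS, outputs))


def synthesis_input(outputs: Dict[str, str]) -> str:
    # Keep every analysis labeled by role so disagreements stay attributable.
    return "\n\n".join(f"## {role}\n{text}" for role, text in outputs.items())
```

Calling `asyncio.run(run_panel("..."))` returns one entry per advisor. The synthesis pass would then receive the labeled text from `synthesis_input` rather than an unlabeled concatenation.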

Each advisor can also be invoked on its own (/advisor-skeptic, /advisor-operator, and so on) for focused analysis from a single perspective.

Evaluation

Every skill ships with an evals.json defining per-assertion pass/fail criteria against three shared scenarios (personal career decision, small-business technology adoption, B2B SaaS strategic pivot). An LLM-as-judge grades each output and returns its verdicts as structured JSON.
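A minimal sketch of how a runner might consume the judge's JSON and count parse failures. The schema is an assumption made for illustration (an `assertions` list whose items carry `id`, `passed`, and optional `evidence`); the real evals.json and judge format may differ, and `Verdict`, `parse_judge_output`, and `pass_rate` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Verdict:
    assertion_id: str
    passed: bool
    evidence: str


def parse_judge_output(raw: str) -> Optional[List[Verdict]]:
    """Return verdicts, or None if the reply does not fit the assumed schema."""
    try:
        data = json.loads(raw)
        verdicts = []
        for item in data["assertions"]:
            if not isinstance(item["passed"], bool):
                raise TypeError("passed must be a JSON boolean")
            verdicts.append(
                Verdict(str(item["id"]), item["passed"], str(item.get("evidence", "")))
            )
        return verdicts
    except (json.JSONDecodeError, KeyError, TypeError):
        # Counted as a parse failure by the caller, not as a failed assertion.
        return None


def pass_rate(verdicts: List[Verdict]) -> float:
    # Fraction of assertions that passed; 0.0 for an empty list.
    if not verdicts:
        return 0.0
    return sum(v.passed for v in verdicts) / len(verdicts)
```

Treating a malformed reply as a separate outcome keeps judge errors from being mistaken for skill failures.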

Every trial runs twice per scenario: once with the skill injected as the system prompt, once against a baseline prompt. Both run through claude --bare so hooks, plugins, and CLAUDE.md auto-discovery do not leak into the measurement. Iterations are keyed by git shorthash, with a -dirty suffix when skill files have uncommitted changes.
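The iteration key can be sketched as follows. This is an assumption about how such a key might be derived, not the project's runner: the `skills` path and both function names are hypothetical, and this version treats untracked files under that path as dirty because `git status --porcelain` lists them.

```python
import subprocess


def iteration_key(shorthash: str, porcelain_status: str) -> str:
    # Any non-empty porcelain output means there are uncommitted changes.
    dirty = bool(porcelain_status.strip())
    return f"{shorthash}-dirty" if dirty else shorthash


def current_iteration_key(skill_dir: str = "skills") -> str:
    # skill_dir is a hypothetical location for the skill files.
    shorthash = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    status = subprocess.run(
        ["git", "status", "--porcelain", "--", skill_dir],
        capture_output=True, text=True, check=True,
    ).stdout
    return iteration_key(shorthash, status)
```

Splitting the pure string logic from the git calls lets the suffix rule be checked without a repository.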

Published result

The advisory panel scored 96% against the rubric; the bare-prompt baseline scored 45%. Each release runs 60 graded trials, with zero parse failures. The runner, rubric, and per-assertion evidence are in the public repo.

Evaluation framework and method (README)

Three learnings