Brad Stevens reference guide: Sterling / Hermes / GPT-5.5 and Cyndra / Anthropic comparison

Division of labor for agent execution, strategy, and evaluation.

This page defines where Sterling on Hermes should be used, where Cyndra on Anthropic is likely the stronger lane, and how Brad should route work by urgency, complexity, risk, and proof requirements.

Sterling / Hermes best lane

Execution infrastructure. Use Sterling when the work needs tools, artifacts, accounts, files, pages, calendars, Gmail, Drive, proof, tables, or repeatable workflows.

Strong: execution | Strong: Google Workspace (GWS) | Strong: Cloudflare | Best with bounded scope

[Fit meters: execution fit, proof discipline, research fit, open-ended judgment]

Cyndra / Anthropic likely best lane

Strategic judgment and long-form synthesis. Use Cyndra when the work depends on nuance, relationship sensitivity, broad executive intuition, or strategic narrative more than direct tool execution.

Strong: judgment | Strong: synthesis | Best for final tone | Useful as reviewer

[Fit meters: strategy fit, narrative fit, relationship nuance, tool execution]

Core scorecard

Practical routing by work type. Sterling is the operating system for completion and proof. Cyndra is the executive reasoning layer when ambiguity, strategy, and tone matter most.

Work category	Sterling / Hermes fit	Cyndra / Anthropic fit	Recommended owner	Evaluation standard
Gmail open-loop audits	High. Search, batch, classify, table output.	Medium. Useful for judgment review.	Sterling primary, Cyndra optional reviewer.	Thread evidence, confidence level, owner, next action.
Calendar, reminders, invites	High. Direct execution and proof.	Low. Not needed unless wording matters.	Sterling.	Event exists on correct calendar with correct time, invitees, reminders.
Cloudflare Pages and microsites	High. Build, deploy, verify.	Medium. Design/copy critique.	Sterling primary.	Live URL, production verification, no unintended site changes.
Vendor/product research	High. Structured options and links.	Medium. Strategic recommendation.	Sterling for research, Cyndra for final executive read.	Sources, pricing/constraints, recommendation, uncertainty flagged.
Operational dashboards and KPI packets	High if data access is available.	Medium for interpretation.	Sterling data pull, Cyndra narrative if needed.	Accurate data, clear exceptions, action list.
Recurring SOP functions	High. Repeatable workflows improve over time.	Medium. Can help design the SOP.	Sterling runs, Cyndra helps design.	Documented process, repeatability, proof artifacts.
Sensitive client communication	Medium. Good for draft structure.	High. Better for nuance and final tone.	Cyndra final, Sterling can draft options.	Relationship-safe, clear, aligned with Brad's voice.
Strategic planning	Medium. Good as research support.	High. Better primary strategist.	Cyndra primary, Sterling support.	Insight quality, tradeoff clarity, decision usefulness.
Social/media approval workflows	High for intake, packets, queue pages.	High for voice and creative judgment.	Shared. Sterling builds system, Cyndra polishes voice.	Approval-gated, organized, platform-specific, no unauthorized posting.
Technical troubleshooting	High when subsystem is known.	Medium for second opinion.	Sterling primary.	Root cause, exact fix, proof of recovery.
Unbounded deep dives	Medium unless converted to batches/workers.	High for broad synthesis.	Split: Cyndra frames, Sterling executes batches.	Clear scope, batch proof, no vague “still looking.”

Routing rules

Use these rules to decide where to send work first, and when to use one agent as a reviewer for the other.

Route to Sterling first when...

The work needs a real action, file, deploy, event, doc, sheet, or proof.
The output should be a table, checklist, dashboard, page, packet, or audit.
The task can be defined by account, date range, source, and deliverable.
Google Workspace, Cloudflare, browser automation, or local files are involved.
Brad wants execution rather than strategy discussion.

Route to Cyndra first when...

The work is high-context, strategic, and not clearly bounded.
The answer depends on executive judgment or relationship sensitivity.
The final voice, persuasion, diplomacy, or tone matters more than artifact production.
Brad wants a thought partner before deciding what to execute.
The question is “what should we do?” more than “go do this.”

Use both when...

There is strategy plus execution.
There is a sensitive message that also requires Gmail/Docs action.
There is a client-facing deliverable where structure and tone both matter.
There is a large audit where Cyndra should frame criteria and Sterling should run the searches.
The stakes are high enough that a second-agent review is worth it.
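
The three lists above can be read as a small decision procedure. The following is a minimal sketch in Python, not an existing tool: the `Task` fields and the `route` function are hypothetical names that compress the rules into four yes/no questions, and the mapping from answers to owners is an assumption about how the rules combine.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a request; field names are illustrative."""
    needs_tool_action: bool = False  # real action, file, deploy, event, doc, sheet, or proof
    bounded: bool = False            # defined by account, date range, source, deliverable
    needs_judgment: bool = False     # strategy, relationship sensitivity, or tone
    high_stakes: bool = False        # a second-agent review is worth it


def route(task: Task) -> str:
    """Return a routing label following the rules in this section (assumed mapping)."""
    # Strategy plus execution: use both agents.
    if task.needs_tool_action and task.needs_judgment:
        return "both: Cyndra frames, Sterling executes"
    # Judgment-heavy or unbounded work starts with Cyndra.
    if task.needs_judgment or not task.bounded:
        owner = "Cyndra first"
    else:
        owner = "Sterling first"
    # High stakes add the other agent as reviewer.
    if task.high_stakes:
        other = "Sterling" if owner.startswith("Cyndra") else "Cyndra"
        return f"{owner}, {other} reviews"
    return owner
```

For example, `route(Task(needs_tool_action=True, bounded=True))` returns `"Sterling first"`, while a request flagged only with `needs_judgment=True` returns `"Cyndra first"`.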

Sterling operating strengths in detail

What Sterling should be evaluated on, based on current history and tool access.

1. Tool-backed execution

Sterling is best when the work has an external system to inspect or change. Examples include Gmail, Google Calendar, Drive, local files, Cloudflare Pages, browser pages, and operational dashboards.

Create the artifact.
Verify it exists.
Report proof concisely.
Flag blockers honestly.

2. Structured audits

Sterling is strong at turning a messy search space into a classification table. The key is to define the scope up front and to include uncertain findings, flagged as uncertain, rather than filter them out.

Gmail open loops.
Unpaid bills and unresolved vendor threads.
Client issues waiting on reply.
Drive/document inventory.
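
One way to picture an audit table that keeps uncertain findings is sketched below. This is an illustration only: `AuditRow` and `build_table` are hypothetical names, and the three confidence labels are an assumed scale rather than one defined on this page.

```python
from dataclasses import dataclass


@dataclass
class AuditRow:
    """One row of a hypothetical open-loop audit table."""
    thread: str       # thread evidence, e.g. a subject line or message ID
    owner: str
    next_action: str
    confidence: str   # assumed scale: "high", "medium", or "low"


def build_table(rows: list[AuditRow]) -> list[AuditRow]:
    """Keep every finding, including uncertain ones; most certain rows come first."""
    order = {"high": 0, "medium": 1, "low": 2}
    return sorted(rows, key=lambda row: order[row.confidence])
```

Nothing is dropped: a low-confidence row stays in the table and simply sorts to the bottom, where Brad can confirm or dismiss it.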

3. Preview-first web work

Sterling can build separate Cloudflare preview pages so Brad can review a real surface without touching a production site. This is a strong use case because success is visible and testable.

Microsites.
Review pages.
Dashboards.
Internal operating references like this one.

4. Repeatable operations

Sterling improves when work becomes a recurring operating rhythm. Recurring scans, scorecards, media review packets, and weekly operational summaries are better fits than one-off vague prompts.

Define cadence.
Define source systems.
Define output table.
Define approval gates.

Known Sterling weaknesses

These are the places where Brad should either constrain the task tightly or use Cyndra as the primary agent.

Open-ended depth

Large, undefined searches can hit tool-call or context limits unless converted into batches, background jobs, or proof files.

Mitigation: batch scope
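
As an illustration of batching scope, a long date range can be split into fixed-size windows so each search stays small and produces its own proof. The sketch below assumes a hypothetical helper named `date_batches`; the seven-day default is an arbitrary choice, not a documented limit.

```python
from datetime import date, timedelta


def date_batches(start: date, end: date, days_per_batch: int = 7):
    """Yield (batch_start, batch_end) pairs covering start..end inclusive."""
    if days_per_batch < 1:
        raise ValueError("days_per_batch must be at least 1")
    cursor = start
    while cursor <= end:
        # The last batch is clipped so it never runs past the end date.
        batch_end = min(cursor + timedelta(days=days_per_batch - 1), end)
        yield cursor, batch_end
        cursor = batch_end + timedelta(days=1)
```

Each yielded pair can then be handed to one bounded search, with a result recorded per batch.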

Relationship nuance

For delicate client, partner, or internal messaging, Sterling can structure the draft, but Cyndra may be better for final tone.

Mitigation: Cyndra final pass

Autonomous continuity

Sterling should not imply overnight work unless a real cron, worker, or background process exists and is verified.

Mitigation: durable worker

Web access friction

Some vendor/search sites block automated access. Sterling should pivot to direct sources or browser paths first, and ask for manual verification only when necessary.

Mitigation: fallback sources

Evaluation rubric

How Brad should judge whether Sterling did the job well.

Evaluate Sterling on execution

Real output

Did Sterling create, change, search, deploy, schedule, or verify something real?

Proof

Did Sterling provide a URL, event detail, file path, message ID, table, or verification statement?

Scope control

Did Sterling stay inside the approved scope and respect public/email/deploy/finance gates?

Business usefulness

Did the output make Brad’s next decision easier?

Evaluate Cyndra on judgment

Strategic clarity

Did Cyndra identify the actual business issue behind the request?

Nuance

Did Cyndra handle people, tone, politics, or timing better than a purely operational answer would?

Prioritization

Did Cyndra tell Brad what matters most and what to ignore?

Final voice

Did the final communication sound like the right executive-level message?

Sample task routing

Concrete examples of what Brad should hand to each agent.

Brad request	Best routing	Why	Ideal prompt shape
“Find every unresolved business email from the last 60 days.”	Sterling primary.	Search, classification, evidence, table.	“Search business Gmail for the last 60 days, include uncertain items, return owner/status/next action.”
“What should I say to this upset client?”	Cyndra primary, Sterling optional draft table.	Relationship nuance matters.	“Give me 3 response options: warm, firm, and executive concise.”
“Build a review page for these social posts.”	Sterling primary, Cyndra voice review.	Sterling builds system, Cyndra improves copy.	“Create Cloudflare review page, no posting, include approval controls.”
“Compare tools and tell me which to buy.”	Sterling research, Cyndra decision review.	Needs facts plus judgment.	“Make a vendor matrix with price, constraints, recommendation, and confidence.”
“Create reminders and invite Rob.”	Sterling.	Direct calendar action.	“Put this on business calendar, invite X, add email reminder.”
“Think through our AI operations model.”	Cyndra first, Sterling converts to SOP.	Strategy first, execution second.	“Cyndra: design model. Sterling: turn approved model into dashboard/SOP.”

Recommended handoff model

The best operating model is not either/or. It is strategist plus executor, with explicit proof checkpoints.

1. Cyndra frames

Use Cyndra to clarify the real problem, business stakes, sensitive context, and strategic options when the work is ambiguous.

2. Sterling executes

Use Sterling to run the bounded work: search, build, deploy, schedule, organize, document, verify, and create reference artifacts.

3. Cyndra reviews

For high-stakes messaging or strategy, send Sterling’s output back to Cyndra for final executive tone and judgment.

Recommended default split

Sterling is the execution and operating infrastructure agent. Cyndra is the strategic judgment and narrative agent. For the highest-leverage workflow, let Cyndra decide what should happen, then let Sterling build the page, search the mailbox, create the calendar item, deploy the preview, or produce the proof-backed operating artifact.

Guardrails

Public, financial, credential, and live-customer actions remain approval-gated regardless of which agent is used.

Actions Sterling should not take without explicit approval

Sending email or Google Chat messages.
Posting, commenting, liking, or publishing publicly.
Deploying to a production site unless Brad approved that deployment scope.
Changing credentials, billing, accounting, payments, or finance records.
Changing live customer data.
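
The approval list above could be enforced with a simple gate check. The sketch below is hypothetical: the action labels in `GATED_ACTIONS` and the `is_allowed` function are invented for illustration and do not correspond to any real Sterling or Hermes setting.

```python
# Hypothetical labels for the gated actions listed in this section.
GATED_ACTIONS = {
    "send_email",
    "send_chat",
    "post_public",
    "deploy_production",
    "change_credentials",
    "change_finance",
    "change_customer_data",
}


def is_allowed(action: str, approved: set[str]) -> bool:
    """Gated actions need explicit approval; all other actions pass."""
    return action not in GATED_ACTIONS or action in approved
```

With no approvals recorded, `is_allowed("send_email", set())` is `False`, while an ungated action such as a mailbox search passes.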

Best proof format from Sterling

Target account or project.
Action taken.
Verification result.
URL, file path, event details, message ID, or table.
Clear statement of any blocker or uncertainty.
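
The five proof elements above map naturally onto a small record. The following is a sketch under the assumption that a proof report is plain text; `ProofReport` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProofReport:
    """Hypothetical container for the five proof elements listed above."""
    target: str         # target account or project
    action: str         # action taken
    verification: str   # verification result
    evidence: str       # URL, file path, event details, message ID, or table
    blockers: str = "none"  # clear statement of any blocker or uncertainty

    def render(self) -> str:
        """Format the report as five labeled lines, one per proof element."""
        return "\n".join([
            f"Target: {self.target}",
            f"Action: {self.action}",
            f"Verification: {self.verification}",
            f"Evidence: {self.evidence}",
            f"Blockers: {self.blockers}",
        ])
```

Keeping the blocker line even when it reads "none" makes the absence of blockers an explicit statement rather than an omission.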