2026-06-10 · 9 pages + 69 GitHub docs · 84 agents · full corpus sweep

Claude Deep Research
Why GitHub Ops Failed as an Agent OS

Plain-English verdict: the architecture is right — GitHub as control plane, VPS as disposable runtime, Hermes as operator shell. But the execution loop (issue → branch → artifact → evidence → PR → merge → close) completed zero times ever. Sessions die cold — 488–522 of them in 30 days, each re-deriving state from scratch. The rules existed as documents agents were asked to read, never as gates that forced them to. Voluntary reading was measured at 0% compliance. This page maps what exists, what was intended, what broke, and what the enforced bootstrap can build on.

Architecture · right | Execution loop · 0× ever | Rules = documents, not gates | Ground-truth for the bootstrap decision

Research scope

Date: 2026-06-10
Site pages read: 9 (viewport.llc/migration)
GitHub docs read: 69
Agents deployed: 84
Repos covered: viewport-ops · viewport-os · viewport-kb · fork-hermes-agent
Purpose: Ground-truth for the enforced-bootstrap decision (ACTIVE_TASK.json + lease + scope-fence)
Not in scope: Designing the bootstrap itself

Read this first · no jargon

What this page is, in plain English

What was investigated: all 9 public pages at viewport.llc/migration, plus all 69 GitHub documents, issues, PRs and files they link to — read in full by 84 research agents on 2026-06-10.
The question: you built a system where GitHub was supposed to run your AI agents. It didn't work, and you said "I still don't know what you are talking about." This page explains exactly why, with every number sourced.
The single biggest finding: the rules for agents existed only as documents agents were asked to read. Nothing ever forced them to. Measured compliance with "please read this first" files: zero percent.
Result: of 139 planned tasks, exactly 1 was ever finished. The full work cycle (open task → do work → prove it → merge → close) completed zero times ever.
Why agents "forgot" everything: 488–522 separate sessions in 30 days, each starting with no memory. A hidden root cron job killed every agent every 6 hours, mid-task, with no handoff saved.
Why agents never saw the rules: the main rule file was 18,870 characters; the runtime cuts injected files at 12,000. The bottom of the rulebook was silently thrown away.
Why status reports lied: pages said "P0 complete" while the actual work sat on a local branch, 6 commits never pushed. The push token was broken (403) for weeks and nobody noticed.
The good news: your architecture is right, and one week (issue #196, PRs #197–#205) proved the loop CAN close — when a CI validator failed loudly instead of relying on agents' goodwill.
What you have to decide: 16 specific questions (bottom of this page) before building the enforced bootstrap — mainly: which file is THE single source of the active task, which branch is canonical, and what to do about the leaked admin token and the kill-cron.
What this page is not: it does not design the new bootstrap. It is the ground-truth map you asked for, so the design is built on facts, not memory.

Headline numbers

The failure, in eight numbers

Every figure below comes from the system's own committed surfaces — task board, audit, forensics, session database.

139 / 1: tasks total vs tasks DONE (throughput ≈ 0)
0×: times the closure loop (issue→branch→artifact→evidence→PR→merge→close) ever completed
10 / 13: audit sections FAIL (2 PASS, 1 UNKNOWN)
488–522: cold-start sessions in 30 days, 29,009–34,330 messages
18,870: chars in AGENTS.md vs 12,000-char injection limit, silently truncated
every 6h: root kill-cron pkills all agents at 00/06/12/18 UTC
6: commits unpushed while pages claimed “P0 complete”
0: secret rotations done; admin:enterprise PAT unrevoked since 2026-05-10

Audit 2026-06-05 · 13 sections

FAIL: 10 · PASS: 2 · UNKNOWN: 1

Only Cloudflare pages (S0) and S12 recommended-architecture pass. Detail in §1.6.

Task board · 139 tasks

NOW: 58 · NEXT: 73 · BLOCKED: 4 · DONE: 1 · WATCH: 3

DONE = 1 of 139 — throughput ≈ 0. The closure loop never completed outside the #196 stretch.

AGENTS.md vs injection limit

[Bar chart: AGENTS.md length on a scale from 0 to 18,870 chars, with the 12,000-char injection limit marked by a white line]

Everything right of the white line was silently truncated — agents never saw the bottom of the rulebook. Fresh rebuild later cut it to 7,759, then 3,525 chars.

Session volume · 30 days

488–522 cold-start sessions · 29,009–34,330 messages

One cell per day. Every session re-derives state from scratch; a root kill-cron pkills all agents at 00/06/12/18 UTC.

Root causes

5 smoking guns

The five mechanical reasons the system could never have worked — independent of how good the plans and prompts were.

A root cron kills every agent every 6 hours

/etc/cron.d/claude-cleanup runs as root: pkill -u openclaw claude + pkill -f "claude --dangerously" at 00/06/12/18 UTC. Every agent hard-killed mid-task. Handoffs were only written at completion — so a killed agent left nothing. This is forensics root cause #1 and explains "agents keep dying".
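
Because the kill slots are fixed, the time an agent has left before the next kill is computable. The following is a minimal sketch, assuming hypothetical helpers `next_kill` and `seconds_until_next_kill`; nothing like them is described in the corpus, and only the 00/06/12/18 UTC schedule comes from the forensics.

```python
from datetime import datetime, timedelta, timezone

# Kill slots reported for /etc/cron.d/claude-cleanup (hours, UTC).
KILL_HOURS_UTC = (0, 6, 12, 18)


def next_kill(now: datetime) -> datetime:
    """Return the first kill slot strictly after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    day_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    for hour in KILL_HOURS_UTC:
        slot = day_start + timedelta(hours=hour)
        if slot > now:
            return slot
    # Past 18:00 UTC: the next slot is 00:00 on the following day.
    return day_start + timedelta(days=1)


def seconds_until_next_kill(now: datetime) -> float:
    """Seconds remaining before the next scheduled pkill."""
    return (next_kill(now) - now.astimezone(timezone.utc)).total_seconds()
```

An agent could, for example, write a partial handoff whenever the remaining time drops below a chosen threshold, so a kill no longer leaves nothing behind. That policy is an illustration, not something the source proposes.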

The rulebook was silently cut off

The old OpenClaw workspace AGENTS.md was 18,870 chars against a 12,000-char injection limit. Runtime log: "truncating in injected context". Agents literally never saw the bottom of the rule file. Truncated silently instead of failing loudly. (Fresh rebuild later cut it to 7,759, then 3,525 chars.)
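
The difference between truncating silently and failing loudly can be shown in a few lines. This is a sketch only: `check_injectable` and `RulebookTooLarge` are hypothetical names, and the real runtime's injection code is not shown in the corpus. The 12,000-char limit and the file sizes are the figures reported above.

```python
INJECTION_LIMIT_CHARS = 12_000  # injection limit reported in the forensics


class RulebookTooLarge(Exception):
    """Raised when a bootstrap file would not fit into injected context."""


def check_injectable(text: str, limit: int = INJECTION_LIMIT_CHARS) -> str:
    """Return the text unchanged, or raise instead of cutting it off."""
    if len(text) > limit:
        raise RulebookTooLarge(
            f"{len(text)} chars exceeds the {limit}-char injection limit "
            f"by {len(text) - limit}"
        )
    return text
```

With this kind of check, an 18,870-char file stops the session at startup, while the later 7,759-char and 3,525-char versions pass through untouched.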

Voluntary reads = measured 0% compliance

Every bootstrap contract was a "please read this first" convention with no gate. Measured: council STATE.md frozen at round 000 for 29 days on one unread flag; tracker.json holds exactly 1 event; LIVE_HANDOFF.md went stale within ~24h. Forensics verbatim: "state files no one is FORCED to read are dead state."

Broken plumbing under the contract layer

GitHub push 403 (commit fbd75bc stranded); the PAT was never wired into the agents' exec environment; gh CLI missing on the box; and the repo has no main branch (default is council/bootstrap-20260510) — so every /blob/main/ "canonical" URL 404s and sessions improvise.

No lease + five competing “what do I do” surfaces

Task state lived in a 139-row board, 4 flat HANDOFF bullets, 3 status.json active_tasks, 24 open issues, and a "today_focus" list — none authoritative, none with an owner or lease. Result: duplicate issues (#193/#194, #179–#181), parallel tracks (PR #83 vs #56), and wrong-task pickup.

Section 1

What the migration system actually is today

The published surface, the repos, the machine contract, the runtime, the task board, the audit, the secrets — as found on 2026-06-10.

1.1 The published surface (viewport.llc/migration)

Nine public pages, all generated from the private repo viewport-corp/viewport-ops (branch ops/openclaw-github-flow-44) and served by a lightweight Cloudflare Worker proxying committed GitHub files (the prior embedded Worker exceeded the 3 MiB limit — recorded as a resolved failure in status.json):

Page	Role	Freshness at fetch (2026-06-10)
`/migration/`	Command Center hub; "GitHub is source of truth. VPS is disposable runtime. Hermes is the operator shell. Everything is evidence-backed or it does not count."	status.json generated 2026-06-08
`/migration/restart` (+ /forensics, /transcript, /brain, /ideas, /plan)	Featured 5-section forensic rebuild suite, built 2026-06-09 by 5 read-only agents	2026-06-09
`/migration/plan`	"Master Operating Plan V2 / Reality-audited V3", 37 phases, 17 departments	body 2026-06-05, live bar 2026-06-09
`/migration/task`	139-task execution board, declared "the active handoff board" (T-080, Sam's explicit request)	generated 2026-06-04
`/migration/audit`	Full system audit, 13 sections, ~3.5 MB incl. inline evidence	audit run 2026-06-05
`/migration/status` + `/migration/status.json`	Human + machine status surfaces; ui_contract: "React UI (Sam builds) fetches status.json; Hermes updates JSON only"	status.json generated 2026-06-05, rendered 2026-06-09 — 4-day stale at render
`/migration/public/slack`	Slack Command Room (top approval layer)	2026-06-05
`/migration/public/odoo`	Odoo Command Room (record-first rule)	2026-06-05

Canonical doctrine (status.json + restart/plan): Viewport Corporation = parent holding, Viewport OS = operating system, PlatformX = product layer, GitHub (viewport-corp) = control plane and brain, agents = workforce, VPS (vmi3130827 / 194.163.153.171) = disposable runtime rebuildable in <20 min. This matches Sam's target-architecture diagram exactly — the restart page's own verdict is "Your architecture is right… 100% of the failure is execution."

1.2 Repos (audit Section 1)

9 repos in viewport-corp: fork-hermes-agent, viewport-ops, fork-openclaw, fork-hermes-bccl, tenant-bccl-laowise-website, demo-repository, modern-lao-homes-client-portal, product-tradex, migration-dashboard-internal. Write access proven via throwaway test issue #177 (created+closed same second). Only 3 repos have runtime-contract files; 2 repos unmappable to tenant; the PAT has near-total admin scope (admin:enterprise, admin:org, delete_repo, repo, workflow…) with unknown expiry (S01-PAT UNKNOWN).

Critical structural fact (forensics page): repo name inversion — viewport-corp/viewport-os ("the obvious name") is an 8-file stub on main; viewport-corp/viewport-ops is the real 522-file control plane (later tree: 734 paths, 69 commits), and its default branch is council/bootstrap-20260510, not main — no main branch exists. Every /blob/main/ URL 404s. The repo contains three generations of agent OS side by side (Migration/council v3 harness; root AGENTS.md + agent-entry-protocol; companyos runtime layer) plus duplicate dirs company-os/ vs companyos/ and Migration/ vs migration/.

1.3 The status.json contract

Schema viewport-status-v1 (~514 lines). Contains: system_health (Hermes ONLINE v0.15.2, gateway restart still needed for live Telegram intake hook), task_board, today_focus (3), active_tasks (3: setup4, accept, status-react), blocked (1), completed_today (1), recent_failures (2), agent_handoff_pack ("New agent? Read this first": HANDOFF.md → /migration/audit → /migration/task → viewport-kb INDEX.md, plus do_not_touch list and next_priority), instruction_files registry (6 canonical docs), gsd_ralphloop block, runtime-contract policy, migration_execution ledger pointer, 3 structured blockers, 5 approval gates, secrets register summary, plain-English update format, and a not_done_claim ("Website/status reporting is not the migration").

Internal inconsistency inside the dashboard's own truth: system_health/vps_runtime_reconciliation say 66 containers / 66 running / 3 unhealthy / 56 no-repo; page copy + migration_execution blockers say 72 / 65 / 3; the priority-queue count is 72; all_container_names lists ~73 names; forensics (2026-06-09) counts 66. The canonical machine surface disagrees with itself.
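
This kind of self-disagreement is mechanically detectable. The sketch below groups surfaces by the count they report and raises when more than one value appears. The function names and surface keys are hypothetical; the numbers are the ones quoted above, with the "~73" name list entered as 73.

```python
def group_by_value(surfaces: dict) -> dict:
    """Map each reported count to the list of surfaces that report it."""
    groups: dict = {}
    for name, value in surfaces.items():
        groups.setdefault(value, []).append(name)
    return groups


def assert_consistent(surfaces: dict) -> None:
    """Raise ValueError when the surfaces report more than one distinct count."""
    groups = group_by_value(surfaces)
    if len(groups) > 1:
        raise ValueError(f"count drift across surfaces: {groups}")


# Container counts as quoted in this section (keys are illustrative labels).
container_counts = {
    "system_health": 66,
    "page_copy": 72,
    "priority_queue": 72,
    "all_container_names": 73,  # the source says "~73"
    "forensics": 66,
}
```

Calling `assert_consistent(container_counts)` raises, which is the point: a generator that ran such a check would have refused to publish a status.json that contradicts itself.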

1.4 Hermes and the runtime

Hermes v0.15.2 (2026.5.29.2), model gpt-5.5 via OpenAI Codex OAuth; 22 toolsets enabled / 4 disabled (noise: homeassistant, spotify, yuanbao, moa still on per S03); MCP: filesystem, memory, sequential_thinking, time, fetch, github, git_plugins, context7; Telegram is the only messaging platform (home chat 6596211381); 0 of 7 scheduled jobs active; session DB /opt/data/state.db with 488 sessions / 29,009 messages in 30 days (restart page counts 522 sessions / 34,330 messages), each session a cold start.
VPS: 72 containers (65 running), 3 unhealthy for 4–5 weeks (saathi-app-1, origin-backend, platformx-nextcloud); 56 without repo label; 50 without domain; 49 "ghost" containers with neither (S08 FAIL). Dokploy + Coolify + NPM/nginx + Traefik coexist (dual-orchestrator conflict; manual file /etc/dokploy/traefik/dynamic/modernlao-transition.yml owns key routes). Operational debris: 4 stale mlh-clients-portal clones, 3 frozen mlh-api-handler rollback copies, sandbox container openclaw-sbx-agent-bizdev-134566cd, 2 hash-named unidentifiable containers.
OpenClaw: old config 24 active seats / 25 workspace folders / 50 legacy crons (47 active in a second file; prior memory said 26/48 — never reconciled, S03-OLD-COUNTS FAIL). Fresh OpenClaw: 26 config agents / 25 folders / exactly 1 cron (disabled). Forensics: 26 agents in openclaw.json but 0 cron_jobs_attached per agent.
Mount blocker: /srv/viewport/migration exists on VPS; /opt/data/migration MISSING inside active Hermes; restart "can kill running agents" — approval packet drafted, no apply (P0-1, PR #202).

1.5 Task board and execution state

139 tasks total — NOW:58, NEXT:73, BLOCKED:4, DONE:1, WATCH:3. Throughput ≈ 0 (restart verdict #1). The full closure loop (issue → branch → artifact → evidence → PR → merge → close) has completed zero times ever (Surgical Origin Audit v2; forensics).
Ledger drift: live parse 139 tasks / 33 phases / 58 NOW vs Sam's remembered canonical 127 / 36 / 52 (S10-COUNTS FAIL); only ~70.7% of NOW tasks have evidence text, none runtime-proven (S10-REALITY FAIL).
4 BLOCKED: T-002 GitHub push 403 after commit fbd75bc; T-122 source-of-truth push repair (branch ahead 6 commits unpushed, gh CLI missing on box, GitHub MCP get_file_contents Not Found); T-018K old-Docker cleanup queue; T-024 Modern Manager live identity unverified.
The only DONE (T-054): Odoo/Slack validation freshness — and even that fixed a stale board claim (13/2 vs actual 15/0).

1.6 Audit state (2026-06-05)

13 sections: PASS 2, FAIL 10, UNKNOWN 1. Only Cloudflare pages (S0 build + working routes) and S12 recommended-architecture pass.

FAILs: GitHub inventory, VPS runtime, agent fleet, KB/brain (no unified brain — Hermes/old OpenClaw/fresh OpenClaw use 3 different stores, S04), Domain+DNS (21/61 ghost zones, bccl.la UNKNOWN), observability (no unified dashboard), security (sk- ×970, ghp_ ×138, TELEGRAM_BOT_TOKEN ×276, CF_API_KEY ×61, AIza ×32, xoxb- ×14 in session DB; no secret manager), old Docker reference (49 ghosts), CompanyOS schema (10/10 files exist, zero runtime/CI enforcement, no authority gateway — S09), plan-vs-reality (ledger drift). S11 UNKNOWN (Telegram export access blocked).

Published as 36 redacted evidence files with 1,187 redactions (~886 in 5 weekly Hermes transcripts alone — live credentials routinely leaked into session logs). Per-section remediation issues #182–#191 filed — all flat, unowned, 0 comments, several with evidence pointers that 404 on the default branch (evidence never committed: #182, #183, #184, #186, #187).

1.7 Secrets and approval gates

Secrets exposure register (category-only, P0-2/PR #203): openai_sk 179, secret_value 99, password_value 29, google_key 13, telegram_env 6, cf_key 5, github_pat 2, ip_non_public 854. Zero rotations complete, no owners assigned; automation_gate: no expanded production autonomy until rotated/scoped. The leaked admin:enterprise PAT remains unrevoked since 2026-05-10 (pat_revoked:false in Migration/council/STATE.md) — this single flag froze the council at round 000 for 29 days. Five approval gates (Hermes/gateway/container restart; Docker mutation; DNS/billing/legal; production Odoo/Slack writes; service-breaking secret rotation) remain in force; 0 runtime mutations have ever been applied under the GSD loop.

1.8 What actually works

The Cloudflare-published pages (all routes 200).
The GSD/RalphLoop GitHubOps stretch, issue #196 → PRs #197–#205 (2026-06-05): operating contract + active queue + activation proof, validate_company_os.py hard-fail validator, migration execution ledger, tasks/current-active-task.yaml, secrets register, 72 RuntimeContracts, 48-entry agent registry, authority matrix, enforcement-gate spec, Slack/Odoo policy, 15-min script-only status cron (job 781cf3aa1cad). This is the only stretch where the loop demonstrably closed, and it did so precisely because a CI validator failed loudly when state artifacts were missing.
Telegram intake → Hermes (RECEIVE is "the only working joint" of the 6-joint pipeline, per forensics).
validate_odoo_slack_integration.py 15/15.

Section 2

The intended agent operating model

How GitHub Ops was designed to drive agents, by layer, with sources.

2.1 Doctrine and bootstrap-read contract

status.json agent_handoff_pack ("New agent? Read this first"): ordered reads — viewport-os/HANDOFF.md → /migration/audit → /migration/task → viewport-kb INDEX.md ("anti-amnesia knowledge base"); do_not_touch list (old Docker/OpenClaw, secrets/raw Telegram sessions, DNS/billing/legal/destructive, production Slack/Odoo writes); single next_priority field. Plus instruction_files registry of 6 canonical docs.
viewport-ops root AGENTS.md: "GitHub is the control plane. Do not do local-only work… issue → branch → commit → PR → verification → evidence." "Claude/Claude Code must not rely on chat memory alone."
docs/agent-entry-protocol.md (the closest thing to an enforced gate, on paper): 10-step preflight; hard rule — "If an agent cannot prove it read the current repo rules, current open issue, current open PR state, and current repo boundary, it must not write files." Agents work ONLY state:active issues; exactly one issue is state:active (#15, Odoo production install).
HANDOFF.md (viewport-os): ~40 lines — system state, tenants, 4 active tasks, last session summary, end-of-session rule "Update this HANDOFF.md at the end of every significant session." Write-side convention only.
Modern/CLAUDE.md (local precedent, found en route): mandatory "★ SESSION BOOTSTRAP — READ ON EVERY NEW SESSION" (SESSION_HANDOFF.md → MEMORY.md → MASTER_STATUS.md → DECISIONS.md → handoffs/) + mandatory end-of-session archival protocol, with the rationale documenting exactly Sam's failure ("sessions before 2026-05-03 left work undocumented… future sessions then operated on stale facts").
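
The agent-entry-protocol's hard rule ("cannot prove it read ... must not write files") implies some form of proof of read. One possible shape, offered as an assumption rather than the documented design, is to require the agent to present the current content digest of every required document before any write is allowed. The names `digest` and `may_write` are hypothetical.

```python
import hashlib


def digest(content: str) -> str:
    """SHA-256 hex digest of a document's current content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def may_write(required_docs: dict, presented: dict) -> bool:
    """Allow writes only if every required doc's current digest is presented.

    required_docs maps doc name -> current content.
    presented maps doc name -> digest the agent claims to have read.
    """
    return all(
        presented.get(name) == digest(body)
        for name, body in required_docs.items()
    )
```

A digest from an older version of a file fails the check, so a session that read stale rules is treated the same as one that read nothing. A digest proves the agent fetched the current bytes, not that it followed them; that part still needs validators downstream.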

2.2 Issue/PR machinery

Issue label state machine (root AGENTS.md): state:active / blocked / protected / not-now / stale / superseded; only active/blocked/protected may stay open. Stale tasks get killed; resurrection requires a FRESH issue (#3 closure, #2 closure: "the old trail is no longer an active work selector").
PR contract: exactly one issue relationship (Closes / Supersedes / Part of); "No issue" forbidden except approved hotfix. Issue close requires what-changed, paths/PR/commit, test output, live URL checks, rollback path, remaining risk.
Issue templates: .github/ISSUE_TEMPLATE/01-office-intake, 02-task-packet, 03-incident, 04-runtime-change, 05-research-source-ingestion; label-driven intake pipeline (anti-amnesia / chat-capture / dept-* labels on #195).
[AUDIT-FIND] convention: stable check IDs, closed PASS/FAIL/UNKNOWN vocabulary, evidence pointers, explicit "Read-only audit" scope (#182–#191). [BLOCKER] convention: Blocker / Impact / Evidence / Required action with explicit unblock condition (viewport-os#2).
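
Both the label rule and the PR contract are simple enough to check mechanically. The sketch below assumes every state label carries a `state:` prefix and that an issue holds exactly one state label; the source writes only `state:active` in full, so the other label names and the function names here are assumptions.

```python
import re

# States that may remain on an open issue, per the root AGENTS.md rule.
OPEN_ALLOWED = {"state:active", "state:blocked", "state:protected"}

# Issue relationships named in the PR contract.
RELATION = re.compile(r"\b(Closes|Supersedes|Part of)\s+#(\d+)", re.IGNORECASE)


def issue_may_stay_open(labels: set) -> bool:
    """True if the issue has exactly one state label and it is an allowed one."""
    states = {label for label in labels if label.startswith("state:")}
    return len(states) == 1 and states <= OPEN_ALLOWED


def pr_has_exactly_one_relation(body: str) -> bool:
    """True if the PR body declares exactly one issue relationship."""
    return len(RELATION.findall(body)) == 1
```

A PR body reading "No issue" has zero matches and is rejected, which matches the contract; the approved-hotfix exception is not modelled here.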

2.3 The execution loop: GSD + RalphLoop

8-step GSD loop (operating contract, PR #197; restart/ideas): Goal (issue/packet with exact outcome, owner, acceptance criteria, forbidden actions) → Setup (branch, artifact paths, validator, rollback boundary BEFORE touching runtime) → Do → Verify → Diagnose → Fix → Repeat → Evidence. failure_policy: max 3 fix attempts before mandatory architecture review; "failed verification becomes evidence; do not hide retries."
Ralph review loop (control-plane/workflows/gsd-ralph-proof-loop.md, issue #139): 4 markers before any promotion — review risk class, verify sources (GitHub / approved read-only runtime observations / committed artifacts ONLY), audit artifacts (no secrets / no scheduler binding / no customer sends / no prod mutation / no cross-tenant leakage), log next action. "A failed Ralph loop blocks promotion even when the GSD work artifact exists."
github_ops_truth (PR #197): durable_truth = issues, branches, commits, PRs, validators, evidence files, live status pages; not_truth = chat-only claims, uncommitted VPS edits, hidden cron/runtime state, old Docker labels without fresh verification. Source-of-truth ORDER (active-queue.yaml): 1) GitHub issue state, 2) git branch + committed artifacts, 3) validator output, 4) live pages, 5) read-only VPS probes.
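
The Do → Verify → Diagnose → Fix → Repeat portion of the loop, with its three-attempt failure policy, can be sketched as a small driver. The function `run_gsd` and its callback signatures are hypothetical; only the limit of 3 fix attempts and the rule that failed verifications are kept as evidence come from the operating contract as quoted.

```python
from typing import Callable

MAX_FIX_ATTEMPTS = 3  # failure_policy: 3 fix attempts, then architecture review


def run_gsd(
    do: Callable[[], None],
    verify: Callable[[], bool],
    fix: Callable[[int], None],
) -> dict:
    """Run Do once, then Verify; on failure, Fix and re-Verify up to 3 times.

    Every verification result is appended to the evidence list, so retries
    are never hidden.
    """
    evidence = []
    do()
    for attempt in range(MAX_FIX_ATTEMPTS + 1):
        ok = verify()
        evidence.append({"attempt": attempt, "verified": ok})
        if ok:
            return {"status": "verified", "evidence": evidence}
        if attempt < MAX_FIX_ATTEMPTS:
            fix(attempt + 1)
    return {"status": "architecture_review", "evidence": evidence}
```

In the worst case the evidence list holds four entries: the initial failed verification plus one per fix attempt, after which the task escalates instead of looping further.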

2.4 Evidence rules

Plain-English update contract after every task: Phase / Task / Done / Proof / Blocker / Next / Live status URL (migration-execution-ledger.yaml; used verbatim in all 7 issue-#196 comments, each adding a "Safety boundary" negative list of mutations NOT performed).
Proof index machine-checkable (PR #201): each proof has kind (local_validator | live_http_json | live_http_html | verified_runtime_fact), exact command/URL, expected_markers.
9-field evidence object (gsd-ralph-proof-loop.md): github_issue_ref, github_pr_ref, risk_class, scope_boundary, gsd_proof, ralph_review, validation_commands, rollback_or_noop_path, protected_boundary_attestation.
Evidence-per-agent-run contract: evidence/agent-runs/<date>/<task>/{evidence.json, summary.md}, dozens of Phase 4a–4v records.
Handoff block per council round (Migration/council/AGENTS.md v3): to/from, phase, verdict (PASS|REVISE|BLOCK|QUESTION), summary, plain_english_block, open_questions, changed_files, evidence, risks, next_action, git{branch, commit_sha, pushed_to_remote}. "No handoff block = invalid round." Rounds append-only; errors fixed only by NEW correction rounds.
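
The plain-English update contract is a fixed field list, so a completeness check is straightforward. The sketch below assumes each field is written as a `Name: value` line, which the source does not specify, and the function names are hypothetical.

```python
# The seven fields of the plain-English update contract.
REQUIRED_FIELDS = (
    "Phase", "Task", "Done", "Proof", "Blocker", "Next", "Live status URL",
)


def parse_update(text: str) -> dict:
    """Collect 'Name: value' lines whose name is one of the contract fields."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in REQUIRED_FIELDS:
            fields[key.strip()] = value.strip()
    return fields


def missing_fields(text: str) -> list:
    """Return contract fields that are absent or empty, in contract order."""
    fields = parse_update(text)
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]
```

An update with an empty `Proof:` line is reported as missing Proof, so a claim of "done" without evidence cannot pass as a complete update.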

2.5 Governance: authority, risk, approvals

Risk/autonomy tiers (root AGENTS.md): Tier 0 observe → Tier 4 prohibited autonomous; human approval ALWAYS for DNS, prod runtime mutation, Docker stop/recreate, secrets, billing/legal, destructive ops. Tier-0 systems (OpenClaw, Hermes): "no fast path, no urgency override, no exceptions."
RuntimeContract policy: 8 fields required before ANY mutation (owner, tenant, repo/source, domain/routes, backup, healthcheck, rollback, approval class); every priority-queue item mutation_allowed:false; "No agent gets to say 'done' on a domain/app until the contract row exists, the live route is verified from public DNS, and the rollback file is named" (/plan Section 15).
Agent registry + readiness ladder (restart/ideas, P0-5): 48 seats, ALL seed_only_not_production; R0 draft → R5 proven, gated per level; forbidden status claims ("100% sure", "fully autonomous", "production-ready") until evidence.
Slack = TOP command layer (/slack, /odoo): record-first rule (Odoo record + evidence BEFORE Slack approval request), 5-step gated protocol, hard wait for explicit in-thread human approval; "No production action without Slack approval on record." Honestly badged PARTIAL FOUNDATION.
Sam = approval guardrail (/plan Section 3) for finance, legal, customer-facing, DNS, destructive runtime, billing, payroll, security.
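
The RuntimeContract policy reads naturally as a gate in front of every mutation. In the sketch below, the snake_case keys are an assumed encoding of the eight fields listed above and `mutation_permitted` is a hypothetical name. Human approval itself is not modelled; the gate only checks that the contract row is complete and that `mutation_allowed` has been set explicitly.

```python
# Assumed key names for the eight required RuntimeContract fields.
REQUIRED_CONTRACT_FIELDS = (
    "owner", "tenant", "repo_source", "domain_routes",
    "backup", "healthcheck", "rollback", "approval_class",
)


def mutation_permitted(contract: dict) -> tuple:
    """Return (allowed, missing_fields) for a RuntimeContract row.

    A mutation is allowed only when all eight fields are filled in and
    mutation_allowed is explicitly True.
    """
    missing = [f for f in REQUIRED_CONTRACT_FIELDS if not contract.get(f)]
    allowed = not missing and contract.get("mutation_allowed") is True
    return allowed, missing
```

Since every priority-queue item currently carries `mutation_allowed:false`, a gate of this shape would refuse all of them today, which is consistent with the policy as written.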

2.6 Designed-but-unbuilt: the closed loop v0.1 and task/lease schemas

viewport-kb Closed Operating Loop v0.1 (2026-06-08): 9-stage loop (Capture→…→Learn); full task-packet schema with out_of_scope explicit NOT-do list, machine-checkable acceptance criteria, lease fields, append-only state_history; label state machine (exactly one label per group); lease protocol via single committed Migration/council/leases/active-leases.json with git-merge-conflict as race detection; mandatory session bootstrap in exact order BEFORE first tool call (STATE.md → the single state:active issue → tracker.json last 10 → weekly digest → policy.yaml; inaccessible ⇒ state:blocked + Telegram alert); handoff = issue comment with fixed format + label swap; authority-gateway policy.yaml with sam_only_actions and verifier ≠ executor; evidence bundle YAML schema; memory tiering; 2 watchers (morning brief, stale-loop detector); v0.1 smallest system = ONE state:active issue + Hermes with scoped PAT + one daily cron; loop invariant "a session that ends without closing the issue is a partial run that must be resumed, not restarted"; hard-stop list with scripted refusal.
companyos/runtime/task-ledger-and-fallback-policy.yaml (committed, default branch): required ledger fields incl. lease_owner, lease_expires_at, policy_version_git_sha, last_checkpoint_ref, resume_instruction; split-brain rules (one_lease_owner_per_task; check_lease_before_write; renew_lease_before_side_effect; stop_if_policy_git_sha_changed_without_reload; stop_if_conflicting_runtime_active); 8-step fallback takeover protocol.
Restart rebuild plan v1.0: 7-field mandatory handoff schema (Phase/Task/Done/Proof/Blocker/Next/Status-URL — missing field ⇒ handoff REJECTED); task-packet issues without DoD auto-labelled needs-dod and NOT picked up; truth-label state machine (truth:unverified → confirmed → superseded); dedup gate; "Agent restarts read the brain first — always"; AGENTS.md per repo = "the harness boundary"; daily heartbeat ("if the heartbeat goes silent, that IS the alert").
Task board encodes the same intent as future work: T-026 Role/Seat/Lease registry ("one executor per task"); T-029 lease schema exists but board does NOT enforce it; T-115 upgrade task-packet/task-lease/runtime-seats schemas (currently lack reviewer/verifier/tests/rollback/heartbeat/takeover/backup-seat); T-124 lease validator must require a VALID HANDOFF before a lease is granted; T-109 agents start from a defined bootstrap task packet; T-140 "do not claim autonomy until the loop is ENFORCED, not just described."
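
Two of the split-brain rules above, one_lease_owner_per_task and check_lease_before_write, can be illustrated with an in-memory table. This is a sketch under stated assumptions: the class names, the 30-minute default TTL, and the in-memory storage are all hypothetical. The designed protocol stores leases in a committed JSON file and uses git merge conflicts for race detection, which this example does not model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Lease:
    task_id: str
    lease_owner: str
    lease_expires_at: datetime


class LeaseTable:
    """In-memory lease table: at most one unexpired owner per task."""

    def __init__(self) -> None:
        self._leases = {}

    def acquire(self, task_id: str, owner: str, now: datetime,
                ttl: timedelta = timedelta(minutes=30)) -> bool:
        """Grant or renew a lease unless another owner holds an unexpired one."""
        current = self._leases.get(task_id)
        if (current is not None and current.lease_expires_at > now
                and current.lease_owner != owner):
            return False
        self._leases[task_id] = Lease(task_id, owner, now + ttl)
        return True

    def check_before_write(self, task_id: str, owner: str, now: datetime) -> bool:
        """True only if `owner` holds an unexpired lease on the task."""
        current = self._leases.get(task_id)
        return (current is not None and current.lease_owner == owner
                and current.lease_expires_at > now)
```

A second agent asking for the same task is refused while the lease is live, and the original owner loses write permission once the lease expires or is taken over.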

Section 3

Where it actually broke

Every failure below maps to one of Sam's three described failure modes — (A) context loss after /clear or exit/reopen, (B) wrong-task pickup, (C) harness drift (agents stop following the rules) — plus the cross-cutting (D) state went stale/false.

The closure loop as executed — completed 0× ever

[Diagram: issue → branch → ✗ push 403 → artifact (commit fbd75bc stranded) → ✗ evidence 404 → evidence (404 on default branch) → ✗ never merged → merge]

The three red joints are the breaks the research found: commits stranded behind the 403'd push token, evidence pointers that 404 on the default branch, and PRs that were never merged. Details below and in §3.4. The green counterexample — the #196 stretch — is drawn in “The one pattern that worked”.

A. Context loss: sessions start cold, plans evaporate

488–522 sessions in 30 days, each starting cold, re-deriving state from scratch (Surgical Origin Audit v2; restart forensics: 522 sessions / 34,330 messages). Re-entry cost ~30 min, so reorientation was skipped (audit's "why Sam loses track" list).
AGENTS.md silently truncated: the old OpenClaw workspace bootstrap file was 18,870 chars against a 12,000-char injection limit — runtime log: "truncating in injected context." Agents literally never saw the bottom of the rule file (S11 smoking gun). Fresh rebuild cut it to 7,759 then 3,525 chars.
Memory never written back: mem0 up but no read/write hooks; 26 sessions.json files mostly empty; brain writeback depended on tasks completing — which they didn't (forensics REMEMBER joint). Three unbridged memory stores (Hermes sqlite / old OpenClaw markdown / fresh OpenClaw sqlite, S04 FAIL).
Stale pinned sessions: after the Codex OAuth repair, agent:main:main stayed pinned to the dead provider route — Telegram traffic routed through a dead session until explicit reset (transcript P3). Same class: intake_persistence plugin installed+enabled but the running gateway never loaded it (config ≠ runtime, viewport-os#2).
6-hour kill-cron: /etc/cron.d/claude-cleanup as root: pkill -u openclaw claude; pkill -f "claude --dangerously" every 6h — every agent hard-killed at 00/06/12/18 UTC, mid-task, with no handoff written (forensics root cause #1; "agents keep dying" explained). Handoffs were written at completion, so a killed agent left nothing.
The "fresh rebuild" (May 11) wiped the operational config: openclaw.json → near-default stub, 47 per-agent crons → 0, 26 agent wirings → empty. 50→1 crons.
Sam's own observed symptom, captured in the corpus: viewport-kb reference note 2026-06-05 — Sam asking the system "what are we working on"; the pipeline produced a contentless capture stub (url empty, tenant unknown).

B. Wrong-task pickup: nothing binds a session to one task

Duplicate task creation: the same Neo4j company-brain task opened TWICE in one week (viewport-ops #193 and #194) "not because Sam forgot, but because there was no canonical open issue to find" (viewport-kb weekly digest). Audit section-2 fired 3× creating duplicates #179–#181 (session disruption mid-audit). Chat-to-task automation double-fired.
Parallel duplicate work tracks: PR #83 (Dokploy admin, issue #73) closed as "duplicate/superseded by #56 protected Dokploy route track" — two tracks, same goal, no single owner/lease.
Stale issues as wrong work selectors: issue #2 (Fresh OpenClaw Activation) drifted 3 weeks with two owners, one silent, force-closed with the explicit doctrine "the old trail is no longer an active work selector… create a fresh issue with current evidence." Issue #3 sat untouched 2026-05-10→06-01, bulk-closed stale.
HANDOFF.md lists 4 concurrent unowned tasks — flat bullets, no owner/lease/priority/scope/next-step — exactly the ambiguity that lets a session pick the wrong one.
Routing joint broken: all tasks landed on the main agent; 26 specialists got nothing (forensics ROUTE). Dispatcher-level single-task pinning (the upstream Hermes Kanban pattern in fork-hermes-agent) was never adopted on the Viewport side.
Phantom canonical paths: linked files tasks/current-active-task.yaml, tasks/gsd-ralphloop-active-queue.yaml (at the cited path), evidence/current-proof-index.yaml (at the cited path), runtime/p0-3-runtimecontracts-first-pass.yaml, plans/migration-phases.yaml (at the cited path) do not exist on any default-branch path — they live only on feature branches or nowhere. A session told "read the active task file" 404s and improvises. viewport-os#194 is a hallucinated/stale reference (repo max issue = 2).
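
The duplicate-issue failures above come down to a missing lookup before creation. A dedup check can be sketched as a normalized-title match against open issues. The function names and the example titles are hypothetical; real titles for #193 and #194 are not quoted in the corpus.

```python
import re
from typing import Optional


def normalize(title: str) -> str:
    """Lowercase, drop [TAG] prefixes and punctuation, collapse whitespace."""
    title = re.sub(r"\[[^\]]*\]", " ", title.lower())
    title = re.sub(r"[^a-z0-9]+", " ", title)
    return " ".join(title.split())


def find_duplicate(new_title: str, open_issues: dict) -> Optional[int]:
    """Return the lowest open issue number with the same normalized title."""
    key = normalize(new_title)
    for number, title in sorted(open_issues.items()):
        if normalize(title) == key:
            return number
    return None
```

An intake step that called `find_duplicate` first would link to the existing issue instead of opening a second one. Exact normalized matching is deliberately crude; near-duplicate wording would need fuzzier comparison.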

C. Harness drift: rules existed as documents, not gates

S09-ENFORCEMENT FAIL (issue #189): all 10/10 CompanyOS schema files exist but are "reference/partial" — no runtime/CI gates, no API-level authority gateway; nothing mediates what agents may do. The audit itself records the harness as "DESIGN EXISTS; NOT ENFORCED."
Council harness: 0% voluntary compliance. Migration/council STATE.md frozen since 2026-05-10 at current_phase: bootstrap / pat_revoked: false / next_agent: claude-opus-4.7 (a retired model) / active_round: 000; tracker.json holds exactly ONE event. The v3 protocol (527-line AGENTS.md, handoff blocks, turn-taking next_agent) never ran a single round. Council frozen 29 days on one unread flag.
PR #87 (LIVE_HANDOFF.md + MASTER_WORKFLOW.md, "every future session starts by reading this file first") went stale within ~24h and was closed unmerged — the prose-handoff-with-please-read-me pattern died in production twice (PR #1's 17-file knowledgebase was killed by Sam in 1.5h).
Checkbox-vs-state divergence (issue #195): all 7 checklist items are checked [x], yet the issue is still open/needs-triage 5+ days later, with no per-item evidence and no owner. Checkboxes proved unenforceable as a status signal.
Optimistic status drift requiring forced corrections: issue #88/PR #89 — agent reported runtime "updated"; Sam challenged; Correction comment revealed Hermes was only pulled, not applied (gateway in an invisible s6 container). Sam rejected Hermes's first migration report for summarizing instead of reading sources ("This is stupidity and not acceptable", transcript P7).
Status pages reported "P0 complete" from artifacts on an unpushed local branch (6 commits ahead, never pushed) — Surgical Origin Audit v2's headline finding; "reports look like completions" is one of its 4 recurring failure loops.
Incident with a price tag (issue #133): May-31 OTP rollout put auth in front of public MLH routes; runtime nginx config was not Git-tracked, so "GitHub could not fully answer operator/time/history"; emergency direct-VPS hotfix outside the harness.
Tool/provider drift: tools allowlist contained entries (apply_patch) unavailable in the runtime; Codex OAuth invalidation killed old OpenClaw; gpt-5 unsupported on ChatGPT accounts; 259 configured commands vs Telegram's 100-command cap.

D. State stale, false, or unreachable

The evidence/closure joint never fired: GitHub push 403 (T-002, commit fbd75bc stranded); PAT never wired into agent exec env; gh CLI missing on the box; GitHub MCP get_file_contents Not Found for the live branch — live Cloudflare pages not traceable to a committed artifact ("GitHub is truth" claim itself FAILed in the audit).
Stale surfaces: status.json 4 days stale at render; HANDOFF.md 5 days stale; /odoo and /slack pages 5 days stale; weekly digest cadence vs daily failure cadence.
Count drift everywhere: 66 vs 72 vs 73 containers; 24 vs 25 vs 26 seats; 50 vs 47 vs 1 crons; 139/33/58 vs 127/36/52 ledger; 93 vs 92 vs 45 hard-coded tasks in the page generator (T-103); registry claimed 26 OpenClaw seats while live runtime had 1 agent (runtime-live-truth-audit.md).
Dangling evidence pointers: AUDIT-FIND issues #182/#183/#184/#186/#187 cite evidence/full-system-audit/sections/*.json paths that 404 on the default branch — evidence lives only in the VPS workspace.
Unmerged-branch black holes: PR #75's 9 evidence increments + RuntimeContract live only in PR comments on a closed, unmerged branch; sessions bootstrapping from the default branch never see them. The validator itself exists in two different versions on two feature branches with no main.
Sandboxed subagents can't write canonical state: 3 of 5 audit runs on 2026-06-08 produced empty/1-byte files ("Operation not permitted" in Docker sandboxes); only host-process Hermes completed. (Mirrors the local MEMORY.md finding that Codex image-gen fails in subagents.)
Specific stalled tasks: setup4 and accept (Telegram intake wiring + acceptance tests) pending since 2026-06-05 behind the gateway-restart approval (viewport-os#2 still open, 0 comments); Hermes mount apply (P0-1) parked as draft packet; secret rotation (P0-2) zero rotations, no owners; council Round 001 never started; the 5 automation prerequisites of the Origin Audit — durable task intake, shared context, authority grant, verification loop, memory writeback — "zero of five exist."

Section 4

Gap analysis: why markdown + issues + prompts wasn't enough

The corpus itself proves each structural gap — these are observations, not theory.

No enforced read.

Every bootstrap contract (HANDOFF.md, agent_handoff_pack, LIVE_HANDOFF.md, council AGENTS.md, agent-entry-protocol) is a write-side convention with no read-side gate. Measured compliance of voluntary reads: STATE.md never advanced past round 000; tracker.json 1 event; LIVE_HANDOFF stale in 24h. Forensics lesson, verbatim: "state files no one is FORCED to read are dead state (voluntary convention = measured 0% compliance)." Worse, when a read was enforced (OpenClaw injection), the file exceeded the injection limit and was silently truncated instead of failing loudly.
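A read-side gate of the kind this gap calls for could look roughly like the sketch below. It is an illustration only, not code from the corpus: `load_state_or_fail` and `BootstrapReadError` are hypothetical names, and the 12,000-character default is taken from the injection limit recorded in the evidence ledger. The point is the behavior: a missing, empty, or oversized state file stops the session instead of being skipped or silently truncated.

```python
from pathlib import Path

# Assumption: the limit mirrors the 12,000-char injection limit in the evidence ledger.
INJECTION_LIMIT_CHARS = 12_000


class BootstrapReadError(RuntimeError):
    """Raised when canonical state cannot be loaded whole."""


def load_state_or_fail(path: Path, limit: int = INJECTION_LIMIT_CHARS) -> str:
    """Return the full file text, or raise. Never return a truncated read."""
    if not path.is_file():
        raise BootstrapReadError(f"state file missing: {path}")
    text = path.read_text(encoding="utf-8")
    if not text.strip():
        raise BootstrapReadError(f"state file empty: {path}")
    if len(text) > limit:
        # Fail loudly instead of cutting the file to fit the injection window.
        raise BootstrapReadError(
            f"state file is {len(text)} chars, over the {limit}-char limit; "
            "refusing to truncate"
        )
    return text
```

A wrapper that calls this before handing control to the agent turns the read from a convention into a precondition; an 18,870-character AGENTS.md would fail here rather than arrive cut off.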

No lease / no owner.

Issues #2, #3, #187, #191, #192, #195 all rotted unowned (no assignee, no heartbeat, no expiry). Two-owner coordination issue #2: one agent went silent for 3 weeks and nothing reclaimed the work. The kill-cron hard-killed owners and their tasks never returned to a claimable state. The lease schema exists (task-lease.schema.yaml, task-ledger-and-fallback-policy.yaml with lease_owner/lease_expires_at) but T-029 records flatly: "schema exists but board does NOT enforce it."
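What "returning to a claimable state" means can be shown with a minimal sketch. The field names `lease_owner` and `lease_expires_at` come from the policy YAML cited above; the `Lease` class, `is_claimable`, `claim`, and the TTL value are hypothetical, and the in-memory form below is not atomic, so a real board would still need a compare-and-swap or a git-conflict check behind it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Lease:
    task_id: str
    lease_owner: Optional[str]            # field name from the policy YAML
    lease_expires_at: Optional[datetime]  # field name from the policy YAML


def is_claimable(lease: Lease, now: datetime) -> bool:
    """A task is claimable when it is unowned or its lease has expired."""
    if lease.lease_owner is None or lease.lease_expires_at is None:
        return True
    return now >= lease.lease_expires_at


def claim(lease: Lease, owner: str, now: datetime, ttl: timedelta) -> Lease:
    """Return a fresh lease for `owner`, or raise if another owner still holds it."""
    if not is_claimable(lease, now):
        raise RuntimeError(f"{lease.task_id} is held by {lease.lease_owner}")
    return Lease(lease.task_id, owner, now + ttl)
```

Under this rule an owner killed mid-task stops blocking the work once `lease_expires_at` passes, which is the behavior the board never enforced.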

No single-task constraint.

Task state was a 139-row board + 4 flat HANDOFF bullets + 3 status.json active_tasks + 24 open issues + "today_focus" — at least five competing "what should I do" surfaces, none authoritative. Result: duplicates (#193/#194, #179–#181, PR #83 vs #56), mega-issues (#195's 7-item checklist), and wrong-task pickup. The one proven counterexample — tasks/current-active-task.yaml + active_issue singular in the CI-validated queue (PRs #197–#205) — produced the only week where work actually chained.

No scope fence.

Forbidden lists existed as prose (do_not_touch, protected boundaries, stop lists) but with no mechanical check; the May-31 OTP incident (#133) and the "fresh rebuild" that deleted 47 crons both happened inside sessions that had the prose available. The only scope mechanisms that held were structural ones: staging containers with zero published ports + traefik.enable=false labels (no-hostname parity doc), iptables DROP on port 3000, and Dokploy's org-context refusal — enforcement in the substrate, not in the prompt.
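A mechanical version of a prose forbidden list can be as small as a path check run in CI or a pre-push hook. The sketch below is an assumption about shape, not an existing artifact: `scope_violations` and the glob patterns in the usage note are hypothetical, and it covers only file paths, not runtime mutations such as the nginx change behind #133.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def scope_violations(
    changed_paths: Iterable[str],
    allowed: Iterable[str],
    forbidden: Iterable[str],
) -> List[str]:
    """Return every changed path that is forbidden or outside the allowed globs."""
    allowed = list(allowed)
    forbidden = list(forbidden)
    violations = []
    for path in changed_paths:
        if any(fnmatchcase(path, pat) for pat in forbidden):
            violations.append(path)  # explicitly fenced off
        elif not any(fnmatchcase(path, pat) for pat in allowed):
            violations.append(path)  # not covered by the task's allowed scope
    return violations
```

For example, with `allowed=["tasks/*"]` and `forbidden=["nginx/*"]`, a change set touching `nginx/site.conf` is reported, and a non-empty result would fail the check regardless of what the session had read.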

State scattered across N places, on the wrong branches.

Three generations of agent OS in one repo; company-os/ vs companyos/ vs viewport-company-os/ vs Migration/ vs migration/; default branch council/bootstrap-20260510 with no main; canonical files only on feature branches; secrets register copies on ~19 branches; the validator forked into two versions; repo-name inversion (viewport-os stub vs viewport-ops real). Any session resolving "the plan" had multiple stale candidates and frequently 404'd the canonical one.

No write-back obligation.

Done-claims required no committed proof: checkboxes ticked without evidence (#195); "P0 complete" from an unpushed branch; "updated" meaning "pulled" (#88); simulated acceptance tests claimed as pass while the live gateway never loaded the plugin (viewport-os#1/#2). tracker.json was never appended after bootstrap; nightly brain writeback never ran; handoffs were written only at completion, so killed sessions wrote nothing. The Origin Audit's loop verdict: issue→branch→artifact→evidence→PR→merge→close completed zero times outside the #196 stretch.
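A write-back obligation implies that a done-claim is rejected unless its proof is committed and reachable. A minimal sketch of such a check follows. `DoneClaim`, `reject_reasons`, and the two-byte minimum are hypothetical; `pushed_to_remote` borrows the field name from the council handoff block, and the size floor reflects the empty and 1-byte audit files noted earlier.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List

# Assumption: treat 0- and 1-byte files as empty, as in the failed sandbox audit runs.
MIN_EVIDENCE_BYTES = 2


@dataclass
class DoneClaim:
    task_id: str
    evidence_paths: List[str] = field(default_factory=list)
    pushed_to_remote: bool = False  # name borrowed from the council handoff block


def reject_reasons(claim: DoneClaim, repo_root: Path) -> List[str]:
    """Return the reasons a done-claim must be rejected; an empty list means accept."""
    reasons = []
    if not claim.evidence_paths:
        reasons.append("no evidence paths listed")
    for rel in claim.evidence_paths:
        candidate = repo_root / rel
        if not candidate.is_file() or candidate.stat().st_size < MIN_EVIDENCE_BYTES:
            reasons.append(f"evidence missing or empty: {rel}")
    if not claim.pushed_to_remote:
        reasons.append("branch not pushed to remote")
    return reasons
```

Run against the default branch checkout, a check like this would have flagged both the ticked-but-unevidenced checklist in #195 and the "P0 complete" claim that rested on an unpushed branch.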

Out-of-band killers and broken plumbing under the contract layer.

No contract can survive a root cron pkill-ing every agent every 6h, a 403'd push token, a PAT absent from the exec env, or sandboxes that can't write files. The transcript's root-cause confession (2026-05-28) generalizes it: "Telegram as control plane with no durable queue, no fallback providers, no health watchers, no automatic recovery."

Reports mistaken for the work.

The corpus codified this itself: not_done_claim "Website/status reporting is not the migration"; Origin Audit failure loop #3 "report mistaken for completion → Sam believes progress made"; architecture 9/10, execution 1/10.

Proven counterexample · build on this

The one pattern that worked

Issue #196 + PRs #197–#205 (2026-06-05) — the only stretch where the closure loop demonstrably closed, and the only week work actually chained. Why it worked:

A single committed active-task pointer — tasks/current-active-task.yaml with active_issue singular: one task, one issue, one branch, no competing surfaces.
A hard-fail CI validator — validate_company_os.py failed loudly when state artifacts were missing or empty, instead of trusting agents to volunteer compliance.
A fixed reporting contract — Phase / Task / Done / Proof / Blocker / Next on every update, used verbatim in all 7 issue-#196 comments, each with a "Safety boundary" negative list of mutations NOT performed.
Next-task-from-ledger — the next unit of work came from the committed ledger, not from chat memory or whichever surface a session happened to read.
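The first two ingredients above, a singular active-task pointer and a validator that fails loudly, can be sketched together. This is not the actual `validate_company_os.py`; it is a stand-in under stated assumptions: the pointer is read as JSON to stay within the standard library (the original is YAML), `validate_active_task` and `main` are hypothetical names, and only three fields from the pointer's documented schema are checked.

```python
import json
from pathlib import Path
from typing import List


def validate_active_task(path: Path) -> List[str]:
    """Return validation errors for the active-task pointer; empty list means pass."""
    if not path.is_file():
        return [f"missing: {path}"]
    raw = path.read_text(encoding="utf-8")
    if not raw.strip():
        return [f"empty: {path}"]
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"unparseable: {exc}"]  # no stub fallback on parse failure
    if not isinstance(data, dict):
        return ["top level must be a mapping"]
    errors = []
    active = data.get("active_issue")
    # Singular check: one issue number, not a list and not a boolean.
    if isinstance(active, bool) or not isinstance(active, int):
        errors.append("active_issue must be a single issue number")
    for key in ("branch", "proof_required"):
        if not data.get(key):
            errors.append(f"required field missing or empty: {key}")
    return errors


def main(path: str) -> int:
    """CI entry point: print every failure and return a non-zero exit code."""
    errors = validate_active_task(Path(path))
    for err in errors:
        print(f"FAIL {err}")
    return 1 if errors else 0
```

Wired into CI as a required check, the non-zero return is what makes the rule a gate: a pointer naming two issues, or none, blocks the merge instead of producing a warning.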

Flow diagram: issue #196 (one active) → branch (pushed) → artifact: current-active-task.yaml (CI hard-fail) → evidence: validate_company_os.py (proof contract) → PR #197–#205 (merged) → merge → closed

The loop closed precisely because the validator failed loudly: enforcement in the pipeline, not a request in prose. This is the direct ancestor of the ACTIVE_TASK.json bootstrap.

issue #196 · PRs #197–#205 · validate_company_os.py (hard-fail) · current-active-task.yaml (singular) · Phase/Task/Done/Proof/Blocker/Next

Section 5

What exists already that the ACTIVE_TASK.json / lease bootstrap can build on

Concrete artifacts, committed and live — plus the actual gaps still to close.

5.1 Concrete artifacts, committed and live

Artifact	Where	What it gives the bootstrap
`tasks/current-active-task.yaml` (PR #201, merged into the ops/* stack)	viewport-ops, branch `ops/openclaw-github-flow-44` lineage	Direct ACTIVE_TASK.json ancestor: id, title, status, phase, issue, branch, owner, approval_required, mutation_class, why_first, acceptance[], proof_required[] (exact commands + curl markers), blocked_by[], next_after_done. Missing only lease TTL/heartbeat.
`companyos/runtime/task-ledger-and-fallback-policy.yaml`	viewport-ops default branch	lease_owner, lease_expires_at, policy_version_git_sha, last_checkpoint_ref, resume_instruction; split-brain rules; 8-step fallback takeover protocol. Field names ready to lift.
`viewport-company-os/tasks/task-lease.schema.yaml` + `task-packet.schema.yaml`	branch `fix/migration-public-pages-and-audit-routes` (+ siblings)	Partial lease/packet schemas — T-115 already lists their gaps (reviewer/verifier/tests/rollback/heartbeat/takeover/backup seat).
`validate_company_os.py`	two feature branches (two versions)	Working pattern of hard-fail CI on missing/empty state artifacts incl. `active_issue` singular check, readiness state machine, evidence-path existence-on-disk checks, red/green activation proof.
`plans/migration-execution-ledger.yaml` + `migration-phases.yaml`	branch `ops/migration-execution-ledger`	Exactly-one current_task bound to issue+branch; phase order; approval split (5 Sam-gates vs 4 standing-safe); plain-English update contract; not_done_until[].
Plain-English status contract	issue #196 comments, status.json	Phase / Task / Done / Proof / Blocker / Next / Status-URL — the proof/blocker/next-step trio Sam wants in ACTIVE_TASK.json, already in production use.
`/migration/status.json`	live	Machine bootstrap surface any fresh session can curl; ui_contract; agent_handoff_pack; structured blockers {id, status, fact, unsafe_without_approval}.
HANDOFF.md	viewport-os main	The write-side handoff convention (needs an enforced read + freshness field).
`/migration/task` board + task.json plan (T-081)	live	The backlog layer beneath the single active task; column taxonomy (NOW/NEXT/BLOCKED/WATCH) and Use rules.
Issue label state machine + entry protocol	root AGENTS.md, docs/agent-entry-protocol.md	`state:active` as work selector; "must not write files" rule; PR-must-name-one-issue contract.
Issue templates 01–05 incl. 02-task-packet.yml	.github/ISSUE_TEMPLATE	Intake/packet/incident/runtime-change forms already defined.
Council triple-state	`Migration/council/{STATE.md, TASK.md, tracker.json, rounds/, handoff/template.md}`	Append-only event log + mutable state + handoff block with verdict enum and git{branch, sha, pushed_to_remote}; TASK.md = existing single-task + allowed/forbidden Markdown contract.
GSD/Ralph contract + active queue + activation proof	merged via PR #197	The 8-step loop, source-of-truth order, max-3-attempts rule, stop_for_sam vs standing_approval split — CI-enforced once.
RuntimeContracts (72) + authority matrix + 48-seat registry + enforcement-gate spec	PR #204/#205 artifacts	The scope-fence raw material: per-container owner/tenant/repo/route/backup/rollback/approval class; per-seat allowed/forbidden.
File-based request queue	`migration-control-plane/openclaw-requests/pending/` → `completed/` (commits `40444bf`…`facb874`)	Proven "file move + commit = state transition" primitive (transcript P8).
Evidence-run contract + redaction pipeline	`evidence/agent-runs/`…, redaction-report.json (8-type classifier)	Evidence bundle shape + the scrubber the write-back path must pass through.
Adoption-packet pattern	PRs #75/#81/#82/#83/#84	Phase-gated plans with stop conditions, before-AND-after verification, named backup families, terminal dispositions (superseded/not-now distinct from done/blocked).
`.claude/state/current-task.json`, changes-log.jsonl, HANDOFF.md, QUEUE.md	mlh-clients-portal repo	Sam has built current-task.json before — audit's own note: consolidate, don't add a third variant.
Modern/CLAUDE.md SESSION BOOTSTRAP	local Mac	Proven in-house enforced-read + end-of-session protocol with documented rationale.
Hermes Kanban subsystem	fork-hermes-agent (upstream)	Production-hardened reference implementation of the whole design: 9-status state machine, atomic CAS claim with claim_lock/claim_expires (15-min TTL), heartbeats, 3-tier stale-lease recovery (each tier fixed a numbered production bug), TASK-vs-RUN separation, protocol-violation = crashed, circuit breaker, sticky vs auto-recovering blocks, force-injected 6-step bootstrap in the worker system prompt, single-task pinning via env, structured complete/block contracts, anti-hallucination completion gate.
Closed Operating Loop v0.1 design	viewport-kb reports	The fullest paper spec: task packet with out_of_scope, lease file with git-conflict race detection, mandatory ordered bootstrap, authority policy.yaml, evidence bundle schema, memory tiering, 2 watchers, 7-day plan TP-01..TP-10, hard-stop list.
Issue #213 "Build Viewport closed operating loop v0.1"	viewport-ops, open	The designated fix vehicle; acceptance already includes "Active lease file/table exists" and one safe test task closed end-to-end with memory writeback.
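The "file move + commit = state transition" primitive from the file-based request queue row is simple enough to sketch. The `pending/` and `completed/` directory names follow that row; `complete_request` is a hypothetical helper, and the git commit that makes the transition durable is deliberately left out of the sketch.

```python
from pathlib import Path


def complete_request(queue_root: Path, name: str) -> Path:
    """Move one request file from pending/ to completed/ and return its new path."""
    src = queue_root / "pending" / name
    if not src.is_file():
        raise FileNotFoundError(f"no pending request: {name}")
    dst_dir = queue_root / "completed"
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / name
    if dst.exists():
        # A request may only complete once; refuse to overwrite history.
        raise FileExistsError(f"already completed: {name}")
    src.rename(dst)
    return dst
```

The state of every request is then readable from the directory listing alone, and committing the move gives the transition an author and a timestamp.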

5.2 What's missing (the actual gaps to close)

No ACTIVE_TASK file or lease exists on any live page or the default branch. current-active-task.yaml lives on a non-default branch; the lease file (active-leases.json) is designed, unbuilt; status.json carries 3 active_tasks (plural) with no lease/TTL.
No enforced read anywhere — no hook/wrapper/CI step that blocks a session that hasn't loaded the canonical state. (The agent-entry-protocol's "must not write files" rule has no mechanical teeth.)
No heartbeat/TTL/takeover in any committed schema actually in use; no dedup/CAS on task creation or claim.
No single canonical path/branch: the bootstrap must pin repo + branch + path and hard-fail (never stub-fallback — see the PyYAML silent-degradation bug Codex flagged on PR #199, and never truncate — see the 18,870-char AGENTS.md).
No write-back gate: nothing rejects a session end without an updated state file (loop invariant exists only on paper); handoff written at completion only, not incrementally.
Out-of-band blockers still live: kill-cron (per forensics, still present at audit), PAT unrevoked/un-scoped and absent from exec env, push 403, gateway restart unapproved, sandbox write restrictions for subagents.
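The missing write-back gate could take the following shape: fingerprint the state file when the session starts and refuse a clean session end while the fingerprint is unchanged. This is a sketch under assumptions, not a design decision: `SessionGate` and `fingerprint` are hypothetical names, and a content hash is only one possible definition of "updated".

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Content hash of the state file; raises if the file does not exist."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class SessionGate:
    """Records the state file at session start and checks it again at session end."""

    def __init__(self, state_path: Path) -> None:
        self.state_path = state_path
        self.start_digest = fingerprint(state_path)  # doubles as an enforced read

    def may_end(self) -> bool:
        """True only if the state file changed since the session started."""
        return fingerprint(self.state_path) != self.start_digest
```

Calling `may_end()` at each checkpoint rather than only at exit would also address the incremental-handoff gap, since a session killed mid-task would already have written state at least once.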

Section 6 · decisions awaiting Sam

16 open questions for Sam

Ambiguities the corpus cannot resolve — each needs a decision before/while building the bootstrap.

01 · Which artifact is THE canonical active-task state?

At least four candidates exist or are designed: tasks/current-active-task.yaml (PR #201), the single state:active GitHub issue (entry protocol; currently #15), the v0.1 active-leases.json + state:active issue combo, and the new ACTIVE_TASK.json. The audit explicitly warns against adding "a third variant" next to mlh-clients-portal's current-task.json. Pick one; declare the rest derived or retired.

Awaiting Sam

02 · Which repo + branch is canonical?

viewport-ops default is council/bootstrap-20260510; the live pages build from ops/openclaw-github-flow-44; the execution ledger names ops/finish-migration-p0-foundation; GSD names ops/gsd-ralphloop-githubops-runtime — three-plus "current" branches referenced simultaneously, and viewport-os vs viewport-ops naming is inverted. Does Sam want a main created/promoted, a monorepo consolidation (restart plan says viewport-os monorepo), or the bootstrap hard-pinned to the existing default?

Awaiting Sam

03 · The PAT decision (29-day blocker).

Revoke the leaked admin:enterprise PAT and mint scoped credentials (audit S01/S12 and restart both say GitHub App auth) — but rotation of live Telegram/CF/Odoo tokens is on the do_not_rotate_without_approval list. What's the rotation order and window, and does pat_revoked:true get flipped in STATE.md or does STATE.md retire?

Awaiting Sam

04 · Kill-cron and gateway restart.

Forensics says remove /etc/cron.d/claude-cleanup and one gateway restart unblocks setup4/accept/intake (viewport-os#2) — but both are behind Sam's own approval gates and the mount packet recommends "do NOT restart." Approve a restart window (Option A) or accept Option C (GitHub as canonical evidence path, no restart) permanently?

Awaiting Sam

05 · Container-count truth.

66 vs 72 vs 73 in the system's own surfaces. Which probe is canonical, and should the bootstrap refuse to run when its state file disagrees with the live probe?

Awaiting Sam

06 · Where does the lease live?

v0.1 design = committed JSON file with git-merge-conflict as the race detector; task-ledger policy = YAML ledger fields; Hermes Kanban = SQLite CAS with heartbeats. Git-file leases are auditable but slow and conflict-prone for sub-minute claims; SQLite is fast but off-GitHub. Which trade-off — and is GitHub-as-lease acceptable given the push path was 403'd for weeks?

Awaiting Sam
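For the SQLite option in question 06, an atomic compare-and-swap claim can be expressed as a single conditional UPDATE. The sketch below is not the Hermes Kanban implementation: the column names `claim_lock` and `claim_expires` and the 15-minute TTL follow the description in the artifact table, while the table layout, `open_board`, and `try_claim` are assumptions made for illustration.

```python
import sqlite3

CLAIM_TTL_SECONDS = 15 * 60  # 15-minute TTL, per the Kanban description


def open_board(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a task table. The schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id TEXT PRIMARY KEY, claim_lock TEXT, claim_expires REAL)"
    )
    return conn


def try_claim(conn: sqlite3.Connection, task_id: str, worker: str, now: float) -> bool:
    """Claim the task only if it is unclaimed or its claim has expired."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE tasks SET claim_lock = ?, claim_expires = ? "
            "WHERE id = ? AND (claim_lock IS NULL OR claim_expires <= ?)",
            (worker, now + CLAIM_TTL_SECONDS, task_id, now),
        )
    # Exactly one row changes when the claim wins; zero rows means someone holds it.
    return cur.rowcount == 1
```

Because the check and the write happen in one statement, two workers racing for the same task cannot both see it as free. The trade-off named in the question remains: this state lives off GitHub unless something mirrors it back.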

07 · Lease TTL/heartbeat semantics under Sam's runtimes.

Upstream Hermes learned (bugs #23025/#29747) that long tool-free LLM calls can't heartbeat — naive TTL reclaim creates spawn-then-reclaim loops. What TTL, and does a wrapper-level heartbeat run out-of-band?

Awaiting Sam

08 · Who may write ACTIVE_TASK.json?

v0.1 says verifier ≠ executor and specialists may not close issues; sandboxed subagents demonstrably cannot write files at all. Is the rule "canonical state writes only from the main/host session" formalized?

Awaiting Sam

09 · Scope fence granularity.

Sam wants allowed repo/path/tools + explicit forbidden scope in the task file. The corpus has three fence vocabularies: RuntimeContract approval classes, risk tiers 0–4, and per-seat authority matrix flags. Which one does ACTIVE_TASK.json reference so fences stay consistent with the 72 contracts and 48-seat registry already committed?

Awaiting Sam

10 · Fate of the 24 open issues / 139-task board / 11 open AUDIT-FIND issues.

One-task-only bootstrap implies everything else is backlog. Bulk-triage with the state:* labels (and who owns triage — Hermes, a cron, Sam)? AUDIT-FINDs are findings records, not work tickets — do they feed a triage queue or get a dedicated consumer?

Awaiting Sam

11 · Hermes mount vs GitHub-only evidence.

Approval packet options A/B/C are still open ("current_decision: No apply"). If C (GitHub canonical) is permanent, the /srv/viewport/migration evidence tree and 90-day private-evidence retention policy need an owner and sync story.

Awaiting Sam

12 · Model/runtime references in state.

STATE.md froze on next_agent: claude-opus-4.7 (retired) and old config carried dead model fallbacks. Should ACTIVE_TASK.json ban model IDs entirely (seat names only) or validate against a live registry?

Awaiting Sam

13 · Auth economics for the agent fleet.

The restart plan mandates API keys for all agent runtimes (citing the Anthropic OAuth policy change, June 15 2026 credit pool) and Claude Max interactive-only — this interacts with Sam's locked subscription-NEVER-API rule and the 2026-06-22 Fable 5 deadline. Which workloads, if any, move to paid API keys?

Awaiting Sam

14 · The wrapper's failure mode.

When ACTIVE_TASK.json is missing/stale/unparseable: hard-stop the session (forensics lesson: fail LOUDLY), or auto-create a diagnose-the-loop task (v0.1: "the session's first job = diagnose why the loop broke")? And what is "stale" — a TTL field, last-heartbeat, or git mtime?

Awaiting Sam
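For question 14, the three failure cases can at least be told apart mechanically, whichever response Sam chooses. The sketch below assumes the heartbeat definition of "stale" purely for illustration: `last_heartbeat` is a hypothetical field holding an ISO-8601 timestamp with an explicit UTC offset, `state_health` and `require_healthy` are hypothetical names, and only the hard-stop variant is shown.

```python
import json
from datetime import datetime, timedelta
from pathlib import Path


def state_health(path: Path, now: datetime, max_age: timedelta) -> str:
    """Classify the state file as 'missing', 'unparseable', 'stale', or 'ok'."""
    if not path.is_file():
        return "missing"
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
        # Hypothetical field: ISO-8601 timestamp with an explicit offset.
        age = now - datetime.fromisoformat(data["last_heartbeat"])
    except (ValueError, KeyError, TypeError):
        return "unparseable"
    return "stale" if age > max_age else "ok"


def require_healthy(path: Path, now: datetime, max_age: timedelta) -> None:
    """Hard-stop variant: refuse to start the session unless the state is 'ok'."""
    health = state_health(path, now, max_age)
    if health != "ok":
        raise SystemExit(f"ACTIVE_TASK state is {health}: {path}")
```

The auto-create alternative from the v0.1 design would replace the `SystemExit` with opening a diagnose-the-loop task; the classification step stays the same either way.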

15 · Singular vs per-tenant active tasks.

"One task only" globally, or one per tenant/runtime (Hermes viewport, hermes-bccl, MLG/MLH lanes)? The v0.1 lease design forbids two active leases sharing tenant+domain — which implies N concurrent leases, not 1.

Awaiting Sam

16 · Postiz and other phantom references.

"MLH Postiz Automation Handoff" repo not found anywhere (T-106) — does it exist privately, or should the board entry be retired? Same question for every phantom path catalogued in §3.2.

Awaiting Sam

Appendix

Evidence ledger — key numbers, one line each

Every headline number on this page, in one place, sourced from the corpus.

Tasks: 139 total / NOW 58 / NEXT 73 / BLOCKED 4 / DONE 1; closure loop completed 0×; ledger drift 139/33/58 vs 127/36/52; NOW evidence ratio ~70.7% (text only).
Audit 2026-06-05: 13 sections, PASS 2 FAIL 10 UNKNOWN 1; 36 evidence files; 1,187 redactions; remediation issues #182–#191 all open/unowned.
Sessions: 488–522 in 30 days; 29,009–34,330 messages; all cold starts; tracker.json = 1 event; council at round 000 for 29 days on pat_revoked:false.
Runtime: 66-vs-72/65/3 container conflict; 3 unhealthy 4–5 weeks; 56 no-repo; 50 no-domain; 49 ghosts; 21/61 ghost CF zones; 118 Dokploy projects; dual orchestrators.
Agents/crons: 24 vs 25 vs 26 seats; 50 → 47 → 1 cron; 48-seat registry all seed_only; kill-cron every 6h as root.
Secrets: session DB sk- ×970/978, TELEGRAM ×276, ghp_ ×138, CF ×61, AIza ×32, xoxb ×14; register categories openai 179 / generic 99 / passwords 29 / IPs 854; 0 rotations, 0 owners; admin:enterprise PAT unrevoked since 2026-05-10.
Bootstrap pathology: AGENTS.md 18,870 chars vs 12,000-char injection limit (silent truncation); branch ahead 6 commits unpushed while pages claimed "P0 complete"; push 403 at commit fbd75bc; gh CLI missing; ≥6 phantom canonical paths.
The one proven pattern: issue #196 + PRs #197–#205 — single committed active-task pointer + hard-fail validator + fixed Plain-English/Proof/Blocker/Next contract + next-task-from-ledger = the only week the loop closed.
