engine/diagnostics

PURPOSE — Boundary-hardening and live-observability for two known production failure modes: the Canvas 2D gradient API rejecting non-finite arguments mid-frame, and the PWA “stuck zoomed in” multi-layer drift bug. Wraps the Canvas API to substitute safe values and report the first occurrences with payload; samples every zoom-relevant layer once per second plus on suspect DOM events, draws a live on-screen overlay, and exposes a nuclear recovery routine that resets every layer.

OWNS

  • The canvas-guard install latch, per-session hit counter, per-session report counter, and dedupe set of stack signatures.
  • The wrapped createRadialGradient / createLinearGradient prototype methods on the 2D context and the offscreen 2D context, with a guarded-marker flag attached to each wrapper so re-installation is a no-op.
  • The safe-value substitution policy for gradient arguments: per-arg coordinate coercion, per-arg non-negative radius coercion with a minimum, and the inner-vs-outer-radius ordering invariant.
  • The Sentry-quota cap for full-payload reports and the rule that further hits still increment the counter but do not report.
  • The zoom-watcher baseline snapshot, the last-snapshot cache, the one-shot warned latch, the sampling-timer handle, and the overlay-visible flag.
  • The zoom-snapshot record shape, covering visual viewport, window, document element, root element, canvas buffer / CSS / transform, engine camera, page scroll, and screen / orientation layers.
  • The drift-detection tolerances and the set of conditions that count as drift (scale delta, body-scrolled, root or canvas inline transform applied, canvas-buffer-to-CSS ratio change, devicePixelRatio change, canvas CSS wider than the inner viewport, engine camera zoom above the watcher’s ceiling, document element wider than the inner viewport).
  • The on-screen overlay DOM node, its style, header colour and border, and its multi-line readout.
  • The auto-show rule that surfaces the overlay the moment drift is first detected.
  • The recovery routine’s ordered steps: viewport-meta unlock-then-relock across a frame, force-scroll-to-origin, inline-transform wipe on root and canvas, engine camera zoom-and-target-zoom reset, synthetic resize dispatch, baseline refresh, and the human-readable summary string.
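
The gradient-argument substitution policy listed above can be sketched as a pure function. This is a minimal illustration, not the module's actual code: the names safeCoord, safeRadius, and sanitizeRadialArgs, the fallback coordinate of 0, the minimum radius of 0.001, and the exact form of the ordering invariant (outer radius raised to at least the inner radius) are all assumptions made for the example.

```typescript
// Hypothetical constants: the real fallback and minimum values are not stated in this document.
const FALLBACK_COORD = 0;
const MIN_RADIUS = 0.001;

type RadialArgs = [number, number, number, number, number, number];

/** Per-arg coordinate coercion: any non-finite value becomes the fallback. */
function safeCoord(value: number): number {
  return Number.isFinite(value) ? value : FALLBACK_COORD;
}

/** Per-arg radius coercion: non-finite, negative, or below-minimum values become the minimum. */
function safeRadius(value: number): number {
  return Number.isFinite(value) && value >= MIN_RADIUS ? value : MIN_RADIUS;
}

/** Returns substituted arguments plus a flag saying whether anything was replaced. */
function sanitizeRadialArgs(args: RadialArgs): { safe: RadialArgs; changed: boolean } {
  const [x0, y0, r0, x1, y1, r1] = args;
  const inner = safeRadius(r0);
  // Assumed ordering invariant: the outer radius is never smaller than the inner one.
  const outer = Math.max(safeRadius(r1), inner);
  const safe: RadialArgs = [safeCoord(x0), safeCoord(y0), inner, safeCoord(x1), safeCoord(y1), outer];
  const changed = safe.some((v, i) => v !== args[i]);
  return { safe, changed };
}
```

Returning the changed flag alongside the substituted tuple lets a wrapper forward clean calls untouched and treat only substituted calls as guard hits.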

READS FROM

  • engine/telemetry logDiag for the canvas-guard error event.
  • engine/core camera and game for engine camera zoom / target zoom and the current game phase included in zoom snapshots and used by the recovery reset.
  • @sentry/browser (dynamically imported) for the zoom drift event, the zoom-watcher breadcrumbs, and the on-demand manual dump.
  • window.visualViewport (when present) for scale, dimensions, offset, and the resize / scroll events used as sample triggers.
  • window and document standard surfaces: innerWidth, innerHeight, devicePixelRatio, scrollX, scrollY, screen orientation / dimensions, matchMedia for the standalone-PWA tag, document.documentElement client dimensions, the canvas element, the #root element, and their computed styles.
  • The viewport meta tag for the unlock-then-relock recovery step.
  • The dev-bridge object on window (when populated) for the per-frame game-state snapshot included in canvas-guard payloads.

PUSHES TO

  • engine/telemetry via logDiag (canvas-guard hit with method, raw args, substituted args, stack, viewport snapshot, and counters).
  • Sentry via the dynamic import: zoom-drift warning with baseline and current snapshots, zoom-watcher breadcrumbs tagged per trigger event, and the on-demand zoom manual-dump warning.
  • The DOM: a fixed-position overlay element appended to the body, inline-style updates to the canvas / root elements during recovery, and a synthetic resize event dispatched on the window during recovery.
  • The engine camera: zoom and target zoom set to the neutral value during recovery.
  • The viewport meta tag content attribute during the unlock-then-relock recovery step.
  • window as a debug surface: a read-only handle to the canvas-guard counter is published so it can be inspected during a play test.

DOES NOT

  • Decide which gradient call was wrong, fix the upstream producer of non-finite values, or unwrap itself after the first hit — every subsequent gradient call still flows through the wrapper.
  • Throw on bad gradient arguments. The boundary is hardened by substitution, not by propagating the error.
  • Report every gradient hit. Reports are capped per session and deduped by stack signature; further hits silently increment the counter only.
  • Read or import live engine state directly inside the canvas guard. State is sniffed off the dev-bridge object when present and the snapshot is “unknown” otherwise.
  • Drive the game loop, the render schedule, or the bridge frame-error catch path. The guards run synchronously inside the wrapped Canvas methods; the zoom watcher runs on its own interval and event listeners.
  • Diagnose any zoom layer beyond the enumerated set, or attempt partial-layer recovery. The recovery is all-or-nothing.
  • Capture a baseline before the first layout pass — the baseline is taken after a deliberate delay so the initial layout has settled.
  • Report zoom drift more than once per baseline. The first drift carries enough payload; later samples only update the overlay until recovery clears the warned latch.
  • Resume continuous reporting after the recovery routine runs. Recovery refreshes the baseline and clears the warned latch, so post-recovery drift can re-report once and no more.
  • Own any UI for the recovery button, any keybinding for the overlay toggle, or any policy on when the watcher should start. The bridge / harness wires those in.
  • Tear itself down. Once started, the watcher’s timer and event listeners run for the lifetime of the page.
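
The reporting limits described in this list (a per-session cap, dedupe by stack signature, and a counter that keeps incrementing) can be sketched as follows. This is an illustrative sketch under assumptions: createHitRecorder, the field names totalHits and reportsSent, and the default cap of 5 are hypothetical and do not come from the module.

```typescript
// Hypothetical cap: the real per-session limit is not stated in this document.
const MAX_REPORTS = 5;

interface GuardStats {
  totalHits: number;   // every guard hit, reported or not
  reportsSent: number; // full-payload reports sent so far
}

function createHitRecorder(report: (signature: string) => void, maxReports: number = MAX_REPORTS) {
  const stats: GuardStats = { totalHits: 0, reportsSent: 0 };
  const seenSignatures = new Set<string>();

  /** Records one hit; returns true only when a full report was actually sent. */
  function recordHit(stackSignature: string): boolean {
    stats.totalHits += 1; // the counter always moves, even when the report is suppressed
    if (seenSignatures.has(stackSignature)) return false; // same call site already reported
    if (stats.reportsSent >= maxReports) return false;    // quota exhausted for this session
    seenSignatures.add(stackSignature);
    stats.reportsSent += 1;
    report(stackSignature);
    return true;
  }

  return { stats, recordHit };
}
```

Checking the dedupe set before the cap means a per-frame loop hitting one call site consumes a single report slot, however many frames it runs.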

Signals fired / Signals watched — none. The module talks to the rest of the engine by direct function calls (telemetry, Sentry, camera writes, DOM mutations); it does not emit or subscribe to engine signals.

Entry points

  • installCanvasGradientGuards — one-shot install of the gradient wrappers on the 2D and offscreen 2D prototypes, idempotent, no-op in non-browser environments. Also publishes the read-only counter handle on window.
  • CanvasGuardStats — exported counter object with total hits and reports-so-far fields; surfaced for devtools inspection.
  • startZoomWatcher — starts the sampling interval, captures the baseline after a delay, builds the overlay, and registers the trigger-event listeners on window, document, and visual viewport.
  • toggleZoomOverlay — shows or hides the live overlay and immediately refreshes its contents from the last cached snapshot; returns the new visibility.
  • dumpZoomStateToSentry — on-demand drift-agnostic snapshot report with a caller-supplied reason tag, used by the recovery button.
  • resetAllZoomLayers — the nuclear recovery routine; resets every tracked layer, refreshes the baseline, clears the warned latch, and returns a one-line before-and-after summary.
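
The idempotent install behind installCanvasGradientGuards can be illustrated with a small prototype-wrapping helper. This is a sketch under assumptions, not the module's implementation: wrapMethodOnce and the __guarded marker property are hypothetical names, and the sanitize callback stands in for whatever substitution policy the guard applies.

```typescript
type NumericMethod = (this: unknown, ...args: number[]) => unknown;
type Guardable = NumericMethod & { __guarded?: boolean };

/**
 * Replaces proto[name] with a sanitizing wrapper exactly once.
 * Returns true when a wrapper was installed, false when the method is
 * missing or already carries the marker (so a re-install is a no-op).
 */
function wrapMethodOnce(
  proto: object,
  name: string,
  sanitize: (args: number[]) => number[],
): boolean {
  const bag = proto as Record<string, unknown>;
  const original = bag[name];
  if (typeof original !== "function") return false;
  if ((original as Guardable).__guarded) return false; // already wrapped

  const wrapper = Object.assign(
    function (this: unknown, ...args: number[]): unknown {
      // Forward to the original method with the substituted arguments.
      return (original as NumericMethod).apply(this, sanitize(args));
    },
    { __guarded: true }, // marker checked on the next install attempt
  );
  bag[name] = wrapper;
  return true;
}
```

In a browser, a caller would presumably pass CanvasRenderingContext2D.prototype and, where it exists, OffscreenCanvasRenderingContext2D.prototype, after a typeof check so that non-browser environments skip the install.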

Pattern notes

  • The two surfaces here are deliberately scoped at external boundaries (the Canvas 2D API and the host page’s zoom / viewport stack) — the rest of the engine still crashes on bad data; the guards exist because those specific producers have been observed to feed invalid values from outside the engine’s invariants.
  • The canvas guard installs by overwriting prototype methods rather than wrapping per-call sites, so every present and future caller is covered without changes elsewhere. The guard marks each wrapper with a flag and refuses to wrap a method that already carries the marker, making the install idempotent across hot reloads.
  • Bad-input reporting uses a stack-signature dedupe set so a per-frame loop that hits the guard does not multiply-report, and a hard cap on full-payload reports so the Sentry quota survives a runaway producer. The total-hits counter keeps incrementing regardless so devtools inspection still reflects reality.
  • The canvas-guard payload includes the original raw arguments alongside the substituted values plus a best-effort engine-state snapshot read off the dev-bridge object — the guard avoids importing engine state directly to dodge circular dependencies.
  • The zoom watcher samples every zoom-relevant layer the bug could be hiding behind in a single snapshot record. The drift detector compares the latest snapshot against the baseline using per-layer tolerances calibrated to ignore mobile-URL-bar show / hide noise and sub-pixel rounding.
  • Drift detection auto-promotes the overlay from hidden to visible the first time drift is detected, so a later screenshot captures the live numbers without requiring the user to toggle anything.
  • Sampling runs on both a fixed interval and a set of trigger DOM events; the trigger path also adds a Sentry breadcrumb, so a later report can reconstruct what happened in the last few seconds before the page settled into its steady state.
  • Sentry is imported dynamically and every Sentry call is wrapped so the watcher is still useful when Sentry is unavailable — the overlay path keeps working.
  • Recovery touches every layer the watcher tracks, in a fixed order, because partial fixes have been observed to leave one layer stuck. The viewport-meta step deliberately transitions across a frame because a plain wipe-and-restore is known to be a no-op on stuck mobile visual viewports; the synthetic resize dispatch is what re-runs the engine’s own canvas-resize handler.
  • Recovery refreshes the baseline and clears the one-shot report latch so the next drift after recovery is treated as a fresh first occurrence.
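
The baseline-versus-latest comparison described in these notes can be sketched as a pure function that returns the list of drift reasons. Everything specific here is an assumption made for illustration: the ZoomSnapshot field names, the detectDrift name, the reason strings, and every tolerance value (including the camera-zoom ceiling of 4) are hypothetical, and the real snapshot record covers more layers than this reduced shape.

```typescript
// Reduced, hypothetical snapshot shape; the real record tracks more layers.
interface ZoomSnapshot {
  visualScale: number;
  devicePixelRatio: number;
  innerWidth: number;
  docElementWidth: number;
  canvasCssWidth: number;
  canvasBufferWidth: number;
  scrollX: number;
  scrollY: number;
  rootTransform: string;   // inline transform on #root ("" or "none" when absent)
  canvasTransform: string; // inline transform on the canvas
  cameraZoom: number;
}

// Hypothetical tolerances; the calibrated values are not stated in this document.
const TOL = { scale: 0.01, ratio: 0.02, dpr: 0.01, px: 1, cameraZoomCeiling: 4 };

function detectDrift(base: ZoomSnapshot, now: ZoomSnapshot): string[] {
  const reasons: string[] = [];
  const hasTransform = (t: string): boolean => t !== "" && t !== "none";
  const bufferRatio = (s: ZoomSnapshot): number => s.canvasBufferWidth / s.canvasCssWidth;

  if (Math.abs(now.visualScale - base.visualScale) > TOL.scale) reasons.push("visual-scale");
  if (Math.abs(now.scrollX) > TOL.px || Math.abs(now.scrollY) > TOL.px) reasons.push("body-scrolled");
  if (hasTransform(now.rootTransform)) reasons.push("root-transform");
  if (hasTransform(now.canvasTransform)) reasons.push("canvas-transform");
  if (Math.abs(bufferRatio(now) - bufferRatio(base)) > TOL.ratio) reasons.push("buffer-css-ratio");
  if (Math.abs(now.devicePixelRatio - base.devicePixelRatio) > TOL.dpr) reasons.push("device-pixel-ratio");
  if (now.canvasCssWidth > now.innerWidth + TOL.px) reasons.push("canvas-wider-than-viewport");
  if (now.cameraZoom > TOL.cameraZoomCeiling) reasons.push("camera-zoom-ceiling");
  if (now.docElementWidth > now.innerWidth + TOL.px) reasons.push("document-wider-than-viewport");
  return reasons;
}
```

An empty result means no drift. A non-empty result is what would trip the one-shot report latch and auto-show the overlay, and the reason list gives the report a readable summary of which layers moved.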