engine/diagnostics
PURPOSE — Boundary-hardening and live-observability for two known production failure modes: the Canvas 2D gradient API rejecting non-finite arguments mid-frame, and the PWA “stuck zoomed in” multi-layer drift bug. Wraps the Canvas API to substitute safe values and report the first occurrences with payload; samples every zoom-relevant layer once per second plus on suspect DOM events, draws a live on-screen overlay, and exposes a nuclear recovery routine that resets every layer.
OWNS
- The canvas-guard install latch, per-session hit counter, per-session report counter, and dedupe set of stack signatures.
- The wrapped createRadialGradient / createLinearGradient prototype methods on the 2D context and the offscreen 2D context, with a guarded-marker flag attached to each wrapper so re-installation is a no-op.
- The safe-value substitution policy for gradient arguments: per-arg coordinate coercion, per-arg non-negative radius coercion with a minimum, and the inner-vs-outer-radius ordering invariant.
- The Sentry-quota cap for full-payload reports and the rule that further hits still increment the counter but do not report.
- The zoom-watcher baseline snapshot, the last-snapshot cache, the one-shot warned latch, the sampling-timer handle, and the overlay-visible flag.
- The zoom-snapshot record shape, covering visual viewport, window, document element, root element, canvas buffer / CSS / transform, engine camera, page scroll, and screen / orientation layers.
- The drift-detection tolerances and the set of conditions that count as drift (scale delta, body-scrolled, root or canvas inline transform applied, canvas-buffer-to-CSS ratio change, devicePixelRatio change, canvas CSS wider than the inner viewport, engine camera zoom above the watcher’s ceiling, document element wider than the inner viewport).
- The on-screen overlay DOM node, its style, header colour and border, and its multi-line readout.
- The auto-show rule that surfaces the overlay the moment drift is first detected.
- The recovery routine’s ordered steps: viewport-meta unlock-then-relock across a frame, force-scroll-to-origin, inline-transform wipe on root and canvas, engine camera zoom-and-target-zoom reset, synthetic resize dispatch, baseline refresh, and the human-readable summary string.
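The substitution policy for radial-gradient arguments can be sketched as below. This is a minimal illustration, not the module's actual code: the fallback coordinate, the minimum radius, and the helper names (safeCoord, safeRadius, sanitizeRadialArgs) are assumptions, and the ordering invariant is assumed to mean "the outer radius is never smaller than the inner radius".

```typescript
// Hypothetical constants: the source does not state the real fallback values.
const FALLBACK_COORD = 0;
const MIN_RADIUS = 0.01;

type RadialArgs = [number, number, number, number, number, number];

/** Coerce one coordinate: anything that is not a finite number becomes the fallback. */
function safeCoord(value: unknown): number {
  return typeof value === "number" && Number.isFinite(value) ? value : FALLBACK_COORD;
}

/** Coerce one radius: non-finite, negative, or too-small values are raised to the minimum. */
function safeRadius(value: unknown): number {
  if (typeof value !== "number" || !Number.isFinite(value)) return MIN_RADIUS;
  return Math.max(value, MIN_RADIUS);
}

/**
 * Sanitize arguments in createRadialGradient order: (x0, y0, r0, x1, y1, r1).
 * Returns the substituted tuple plus a flag saying whether anything changed,
 * which is what a guard would use to decide whether to count a hit.
 */
function sanitizeRadialArgs(args: readonly unknown[]): { safe: RadialArgs; changed: boolean } {
  const inner = safeRadius(args[2]);
  // Assumed form of the ordering invariant: outer radius >= inner radius.
  const outer = Math.max(safeRadius(args[5]), inner);
  const safe: RadialArgs = [
    safeCoord(args[0]),
    safeCoord(args[1]),
    inner,
    safeCoord(args[3]),
    safeCoord(args[4]),
    outer,
  ];
  const changed = safe.some((value, index) => !Object.is(value, args[index]));
  return { safe, changed };
}
```

Keeping the policy a pure function means the raw and substituted arguments can both be attached to a report without touching the canvas.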
READS FROM
- engine/telemetry: logDiag, for the canvas-guard error event.
- engine/core: camera and game, for engine camera zoom / target zoom and the current game phase included in zoom snapshots and used by the recovery reset.
- @sentry/browser (dynamically imported) for the zoom drift event, the zoom-watcher breadcrumbs, and the on-demand manual dump.
- window.visualViewport (when present) for scale, dimensions, offset, and the resize / scroll events used as sample triggers.
- window and document standard surfaces: innerWidth, innerHeight, devicePixelRatio, scrollX, scrollY, screen orientation / dimensions, matchMedia for the standalone-PWA tag, document.documentElement client dimensions, the canvas element, the #root element, and their computed styles.
- The viewport meta tag for the unlock-then-relock recovery step.
- The dev-bridge object on window (when populated) for the per-frame game-state snapshot included in canvas-guard payloads.
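The "when present" and "when populated" reads above can be sketched as a sampler that tolerates missing surfaces. This is an illustration under assumptions: HostLike is a structural stand-in for window so the sketch runs outside a browser, devBridge is a hypothetical property name for the dev-bridge object, and the real snapshot record covers more layers than shown here.

```typescript
// Structural stand-ins for browser objects; field names mirror the real
// window / visualViewport properties, but the interfaces are illustrative.
interface VisualViewportLike {
  scale: number;
  width: number;
  height: number;
  offsetLeft: number;
  offsetTop: number;
}

interface HostLike {
  innerWidth: number;
  innerHeight: number;
  devicePixelRatio: number;
  scrollX: number;
  scrollY: number;
  visualViewport?: VisualViewportLike | null;
  devBridge?: { phase?: string } | null; // hypothetical dev-bridge shape
}

interface HostSample {
  vvScale: number | null;
  vvWidth: number | null;
  vvHeight: number | null;
  innerWidth: number;
  innerHeight: number;
  dpr: number;
  scrollX: number;
  scrollY: number;
  phase: string;
}

/** Read every layer defensively: absent surfaces become null or "unknown". */
function sampleHost(host: HostLike): HostSample {
  const vv = host.visualViewport ?? null;
  return {
    vvScale: vv ? vv.scale : null,
    vvWidth: vv ? vv.width : null,
    vvHeight: vv ? vv.height : null,
    innerWidth: host.innerWidth,
    innerHeight: host.innerHeight,
    dpr: host.devicePixelRatio,
    scrollX: host.scrollX,
    scrollY: host.scrollY,
    // The game phase is sniffed off the bridge object rather than imported.
    phase: host.devBridge?.phase ?? "unknown",
  };
}
```

In a browser the call would look like `sampleHost(window as unknown as HostLike)`; the cast is only needed because the stand-in interface is not the DOM type.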
PUSHES TO
- engine/telemetry via logDiag (canvas-guard hit with method, raw args, substituted args, stack, viewport snapshot, and counters).
- Sentry via the dynamic import: zoom-drift warning with baseline and current snapshots, zoom-watcher breadcrumbs tagged per trigger event, and the on-demand zoom manual-dump warning.
- The DOM: a fixed-position overlay element appended to the body, inline-style updates to the canvas / root elements during recovery, and a synthetic resize event dispatched on the window during recovery.
- The engine camera: zoom and target zoom set to the neutral value during recovery.
- The viewport meta tag content attribute during the unlock-then-relock recovery step.
- window as a debug surface: a read-only handle to the canvas-guard counter is published so it can be inspected during a play test.
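The rule that decides whether a canvas-guard hit is pushed as a full report can be sketched as a small gate. This is a sketch under assumptions: createReportGate and stackSignature are hypothetical names, the cap value is chosen by the caller, and the signature is assumed to be the first few stack frames after the message line.

```typescript
interface GuardStats {
  hits: number;    // every guarded call that needed substitution
  reports: number; // full-payload reports actually sent
}

/** Assumed signature: the first three frames below the error message line. */
function stackSignature(stack: string): string {
  return stack
    .split("\n")
    .slice(1, 4)
    .map((line) => line.trim())
    .join("|");
}

/** Build a gate that counts every hit but lets only capped, deduped reports through. */
function createReportGate(maxReports: number) {
  const stats: GuardStats = { hits: 0, reports: 0 };
  const seen = new Set<string>();

  function shouldReport(stack: string): boolean {
    stats.hits += 1; // counted even when nothing is reported
    if (stats.reports >= maxReports) return false; // quota cap reached
    const signature = stackSignature(stack);
    if (seen.has(signature)) return false; // same call site already reported
    seen.add(signature);
    stats.reports += 1;
    return true;
  }

  return { stats, shouldReport };
}
```

A guard would call shouldReport with `new Error().stack ?? ""` and only build the full payload when it returns true.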
DOES NOT
- Decide which gradient call was wrong, fix the upstream producer of non-finite values, or unwrap itself after the first hit — every subsequent gradient call still flows through the wrapper.
- Throw on bad gradient arguments. The boundary is hardened by substitution, not by propagating the error.
- Report every gradient hit. Reports are capped per session and deduped by stack signature; further hits silently increment the counter only.
- Read or import live engine state directly inside the canvas guard. State is sniffed off the dev-bridge object when present and the snapshot is “unknown” otherwise.
- Drive the game loop, the render schedule, or the bridge frame-error catch path. The guards run synchronously inside the wrapped Canvas methods; the zoom watcher runs on its own interval and event listeners.
- Diagnose any zoom layer beyond the enumerated set, or attempt partial-layer recovery. The recovery is all-or-nothing.
- Capture a baseline before the first layout pass — the baseline is taken after a deliberate delay so the initial layout has settled.
- Report zoom drift more than once per baseline. The first drift carries enough payload; later samples only update the overlay.
- Resume unbounded reporting after the recovery routine runs. Recovery refreshes the baseline and clears the warned latch, so drift that appears after recovery can be reported once more and no further.
- Own any UI for the recovery button, any keybinding for the overlay toggle, or any policy on when the watcher should start. The bridge / harness wires those in.
- Tear itself down. Once started, the watcher’s timer and event listeners run for the lifetime of the page.
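The once-per-baseline reporting rule and its interaction with recovery can be sketched as a latch around the report call. This is an illustration only: createDriftLatch and its method names are hypothetical, and the send callback stands in for whatever actually delivers the report.

```typescript
/**
 * One-shot drift latch: the first drifted sample after a baseline is reported,
 * later drifted samples are swallowed until recovery installs a fresh baseline.
 */
function createDriftLatch<S>(initialBaseline: S, send: (baseline: S, current: S) => void) {
  let baseline = initialBaseline;
  let warned = false;

  return {
    /** Returns true only when this sample triggered a report. */
    onSample(current: S, drifted: boolean): boolean {
      if (!drifted || warned) return false;
      warned = true;
      send(baseline, current);
      return true;
    },
    /** Called at the end of recovery: new baseline, latch cleared. */
    onRecovered(freshBaseline: S): void {
      baseline = freshBaseline;
      warned = false;
    },
  };
}
```

Samples that return false would still feed the overlay; the latch only gates the report.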
Signals fired / Signals watched — none. The module talks to the rest of the engine by direct function calls (telemetry, Sentry, camera writes, DOM mutations); it does not emit or subscribe to engine signals.
Entry points
- installCanvasGradientGuards: one-shot install of the gradient wrappers on the 2D and offscreen 2D prototypes, idempotent, no-op in non-browser environments. Also publishes the read-only counter handle on window.
- CanvasGuardStats: exported counter object with total hits and reports-so-far fields; surfaced for devtools inspection.
- startZoomWatcher: starts the sampling interval, captures the baseline after a delay, builds the overlay, and registers the trigger-event listeners on window, document, and visual viewport.
- toggleZoomOverlay: shows or hides the live overlay and immediately refreshes its contents from the last cached snapshot; returns the new visibility.
- dumpZoomStateToSentry: on-demand drift-agnostic snapshot report with a caller-supplied reason tag, used by the recovery button.
- resetAllZoomLayers: the nuclear recovery routine; resets every tracked layer, refreshes the baseline, clears the warned latch, and returns a one-line before-and-after summary.
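The idempotent install behind installCanvasGradientGuards can be sketched as a wrap-once helper. This is a sketch under assumptions: wrapOnce, the __gradientGuarded marker name, and the fix callback are hypothetical, and the prototype is passed in as a plain record so the example runs outside a browser.

```typescript
// A callable that may carry the guard marker. The marker name is an assumption;
// the source only says a guarded-marker flag is attached to each wrapper.
type Guardable = ((...args: unknown[]) => unknown) & { __gradientGuarded?: boolean };

/**
 * Replace proto[name] with a wrapper that repairs its arguments before
 * delegating. Returns false when the method is missing or already wrapped.
 */
function wrapOnce(
  proto: Record<string, unknown>,
  name: string,
  fix: (args: unknown[]) => unknown[],
): boolean {
  const original = proto[name] as Guardable | undefined;
  if (typeof original !== "function") return false; // e.g. non-browser environment
  if (original.__gradientGuarded) return false;     // already installed: no-op

  const wrapper: Guardable = function (this: unknown, ...args: unknown[]): unknown {
    // Preserve the receiver so the real context method still sees its own `this`.
    return original.apply(this, fix(args));
  };
  wrapper.__gradientGuarded = true;
  proto[name] = wrapper;
  return true;
}
```

In a browser the first argument would be something like `CanvasRenderingContext2D.prototype as unknown as Record<string, unknown>`; because the marker lives on the wrapper itself, a second install after a hot reload finds it and leaves the method alone.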
Pattern notes
- The two surfaces here are deliberately scoped at external boundaries (the Canvas 2D API and the host page’s zoom / viewport stack) — the rest of the engine still crashes on bad data; the guards exist because those specific producers have been observed to feed invalid values from outside the engine’s invariants.
- The canvas guard installs by overwriting prototype methods rather than wrapping per-call sites, so every present and future caller is covered without changes elsewhere. The guard marks each wrapper with a flag and refuses to wrap a method that already carries the marker, making the install idempotent across hot reloads.
- Bad-input reporting uses a stack-signature dedupe set so a per-frame loop that hits the guard does not multiply-report, and a hard cap on full-payload reports so the Sentry quota survives a runaway producer. The total-hits counter keeps incrementing regardless so devtools inspection still reflects reality.
- The canvas-guard payload includes the original raw arguments alongside the substituted values plus a best-effort engine-state snapshot read off the dev-bridge object — the guard avoids importing engine state directly to dodge circular dependencies.
- The zoom watcher samples every zoom-relevant layer the bug could be hiding behind in a single snapshot record. The drift detector compares the latest snapshot against the baseline using per-layer tolerances calibrated to ignore mobile-URL-bar show / hide noise and sub-pixel rounding.
- Drift detection auto-promotes the overlay from hidden to visible the first time drift is detected, so a later screenshot captures the live numbers without requiring the user to toggle anything.
- Sampling runs on both a fixed interval and a set of trigger DOM events. The trigger path also adds a Sentry breadcrumb, so a later report can reconstruct what happened in the last few seconds before the drift reached its steady state.
- Sentry is imported dynamically and every Sentry call is wrapped so the watcher is still useful when Sentry is unavailable — the overlay path keeps working.
- Recovery touches every layer the watcher tracks, in a fixed order, because partial fixes have been observed to leave one layer stuck. The viewport-meta step deliberately transitions across a frame because a plain wipe-and-restore is known to be a no-op on stuck mobile visual viewports; the synthetic resize dispatch is what re-runs the engine’s own canvas-resize handler.
- Recovery refreshes the baseline and clears the one-shot report latch so the next drift after recovery is treated as a fresh first occurrence.
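The baseline-versus-current comparison described in these notes can be sketched as a function that returns the list of drift reasons. This is an illustration under assumptions: the snapshot is reduced to the fields the listed conditions need, every tolerance and the camera ceiling are placeholder numbers, and the names (ZoomSnapshotLite, detectDrift) are hypothetical.

```typescript
interface ZoomSnapshotLite {
  vvScale: number;
  dpr: number;
  innerWidth: number;
  docClientWidth: number;
  canvasCssWidth: number;
  canvasBufferWidth: number;
  scrollX: number;
  scrollY: number;
  rootTransform: string;   // inline transform on the root element, "" if none
  canvasTransform: string; // inline transform on the canvas, "" if none
  cameraZoom: number;
}

// Placeholder tolerances; the real calibrated values are not given in the source.
const TOLERANCE = { scale: 0.01, ratio: 0.02, widthPx: 1, cameraZoomCeiling: 4 };

const hasTransform = (t: string): boolean => t !== "" && t !== "none";
const bufferRatio = (s: ZoomSnapshotLite): number =>
  s.canvasCssWidth > 0 ? s.canvasBufferWidth / s.canvasCssWidth : 0;

/** Return every drift condition that holds; an empty array means no drift. */
function detectDrift(base: ZoomSnapshotLite, cur: ZoomSnapshotLite): string[] {
  const reasons: string[] = [];
  if (Math.abs(cur.vvScale - base.vvScale) > TOLERANCE.scale) reasons.push("scale delta");
  if (cur.scrollX !== 0 || cur.scrollY !== 0) reasons.push("body scrolled");
  if (hasTransform(cur.rootTransform)) reasons.push("root inline transform");
  if (hasTransform(cur.canvasTransform)) reasons.push("canvas inline transform");
  if (Math.abs(bufferRatio(cur) - bufferRatio(base)) > TOLERANCE.ratio) {
    reasons.push("canvas buffer-to-CSS ratio changed");
  }
  if (Math.abs(cur.dpr - base.dpr) > TOLERANCE.scale) reasons.push("devicePixelRatio changed");
  if (cur.canvasCssWidth > cur.innerWidth + TOLERANCE.widthPx) {
    reasons.push("canvas CSS wider than inner viewport");
  }
  if (cur.cameraZoom > TOLERANCE.cameraZoomCeiling) reasons.push("camera zoom above ceiling");
  if (cur.docClientWidth > cur.innerWidth + TOLERANCE.widthPx) {
    reasons.push("document element wider than inner viewport");
  }
  return reasons;
}
```

Returning the reasons rather than a boolean lets the same result drive the report payload and the overlay readout.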