Status — Mynd Labs

Platform — Status/status

Six services, probed from outside, measured over 90 days. When something breaks, the postmortem is published here.

ALL SYSTEMS OPERATIONAL90-day window — updated daily

[ 01 ]Services — uptime, 90 days

operational planned dip incident

Y0 Runtime

run execution, scheduling, traces

99.96%

2026-03-132026-06-10

Context Graph

memory, retrieval, indexing

99.96%

2026-03-132026-06-10

Trust Kernel

permissions, scopes, audit log

99.98%

2026-03-132026-06-10

Gateway

API edge, auth, rate limiting

99.96%

2026-03-132026-06-10

Analytics

usage, cost and quality reporting

99.98%

2026-03-132026-06-10

Billing

metering, invoicing, receipts

99.98%

2026-03-132026-06-10

[ 02 ]Incident history

Three incidents in the window. Each one is written up honestly — detection time included, even when it embarrasses us.

[INC-003]2026-05-28Gateway38 min✓ resolved

Elevated 5xx errors at the API edge

A deploy shipped with a connection-pool ceiling sized for staging, not production. Under normal evening traffic the pool exhausted within minutes and the edge returned 502s for roughly a third of requests. We rolled back in 9 minutes; the remaining time was queue drain. The fix was boring and real: pool limits are now part of config review, and a synthetic load check runs against every release candidate before it can promote. No data was lost and no runs were double-executed.

[INC-002]2026-04-22Context Graph1h 12m✓ resolved

Slow retrieval during index rebuild

A scheduled index rebuild took a write lock that our read path was not supposed to wait on — but a regression three weeks earlier had quietly re-coupled them. Retrieval latency climbed from ~80ms to multi-second for just over an hour. We aborted the rebuild, restored read throughput, and re-ran the rebuild against a shadow index overnight. The regression had shipped without a test; that test now exists, and rebuilds run shadow-first as standard practice.

[INC-001]2026-03-15Y0 Runtime26 min✓ resolved

Run queue stall from a bad scheduler canary

A canary build of the run scheduler deadlocked on a rare retry path and the canary slice of the queue stopped draining. Detection took longer than it should have — 11 minutes — because our alert measured throughput, not queue age. We killed the canary, the queue drained, and every stalled run completed without manual intervention. Two changes followed: queue-age alerting with a 90-second threshold, and canaries now auto-revert on stall instead of waiting for a human.

[ 03 ]Stay informed

Subscribe to updates and we'll tell you when something breaks — and what we did about it.

Subscribe to updates

[ note ]incidents only — no marketing

SYSTEMSTATUS

Y0 Runtime

Context Graph

Trust Kernel

Gateway

Analytics

Billing

What broke,and what changed.

Elevated 5xx errors at the API edge

Slow retrieval during index rebuild

Run queue stall from a bad scheduler canary