Elevated 5xx errors at the API edge
A deploy shipped with a connection-pool ceiling sized for staging, not production. Under normal evening traffic the pool exhausted within minutes and the edge returned 502s for roughly a third of requests. We rolled back in 9 minutes; the remaining time was queue drain. The fix was boring and real: pool limits are now part of config review, and a synthetic load check runs against every release candidate before it can promote. No data was lost and no runs were double-executed.