This incident has been mitigated. The incident report follows:
During this timeframe:
- All calls to create / delete sandboxes, routegroups, jobs and JRGs were failing.
- Updates to sandboxes were succeeding but did not sync back to the cluster.
- However, existing sandboxes continued to work as expected within customer clusters.
The postmortem identified issues that delayed detection of the incident. These have now been addressed with a more robust monitoring configuration and with improvements to the on-call playbooks for periodically verifying alarm settings.