Cluster connections are degraded

Incident Report for Signadot

Resolved

This incident has been mitigated. The incident report is as below:

During this timeframe:
- All calls to create / delete sandboxes, routegroups, jobs and JRGs were failing.
- Updates to sandboxes were succeeding but did not sync back to the cluster.
- However, existing sandboxes continued to work as expected within customer clusters.

Some issues were identified in postmortem which caused delay in detection of the issue. These have now been addressed with more robust monitoring configuration and improvements to on-call playbooks for verifying alarm settings periodically.

The root cause was identified as a deadlock in cluster sync causing connections to hang. This was mitigated soon after in https://www.signadot.com/docs/release-notes#2024-08-22.
Posted Aug 16, 2024 - 14:44 UTC

Investigating

We are currently investigating this issue.
Posted Aug 16, 2024 - 08:00 UTC
This incident affected: API & Control Plane (api.signadot.com, Cluster Control Plane).