Investigating Elevated Error Rates
Incident Report for Gainsight
Postmortem

Issue: Elevated Error Rates

Cause: A query retry mechanism failed to close properly after results were returned, which increased database traffic

Incident Timeline on the 14th of June:
15:40 UTC - Our Technical Operations Team started investigating alerts and an increase in database traffic. The increased traffic was caused by a retry mechanism that failed to close properly after query results were returned.
16:07 UTC - By this time, the source queries had been identified and closed. Once we recognized the impact, we alerted customers to the issue.
16:29 UTC - With stability confirmed, we moved to the monitoring phase.
17:00 UTC - Incident closed after further confirmation.
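
For readers unfamiliar with this failure mode, the following is a minimal illustrative sketch of a retry loop that stops as soon as a query returns results and gives up after a fixed number of attempts. It is not Gainsight's code or the fix that was applied; `run_with_retry`, `RetryExhausted`, and all parameter names are hypothetical and chosen only for this example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryExhausted(Exception):
    """Raised when every attempt has failed (hypothetical name)."""


def run_with_retry(
    query: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `query` until it succeeds or `max_attempts` is reached."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            # Returning here ends the loop as soon as results arrive,
            # so a successful query is never re-issued.
            return query()
        except Exception as exc:
            last_error = exc
            # Back off exponentially between attempts, but do not
            # sleep after the final failed attempt.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # The attempt cap guarantees the loop terminates even if the
    # query never succeeds.
    raise RetryExhausted(
        f"query failed after {max_attempts} attempts"
    ) from last_error
```

The two properties that matter here are the immediate return on success and the hard cap on attempts: together they keep a retry path from generating unbounded database traffic.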

Resolution:
Our approach to resilient architecture ensured services remained online, but this incident helped us identify opportunities for improvement. We have applied a fix that will help services better handle scenarios like this going forward.

Posted Jun 28, 2022 - 03:29 UTC

Resolved
This incident has been resolved. We will share RCA details as they become available.
Posted Jun 14, 2022 - 17:00 UTC
Monitoring
The source of this issue has been identified and we have taken corrective action to resolve it. Systems are stable and we are monitoring closely.
Posted Jun 14, 2022 - 16:29 UTC
Investigating
We are investigating an increase in error rates and latency. We will update when we have more information.
Posted Jun 14, 2022 - 16:07 UTC
This incident affected: Gainsight CS - US1 Region (Gainsight CS US1 Application).