Investigating - Application Availability
Incident Report for Gainsight
Postmortem

Issue: Application Availability Impact

Cause:
An infrastructure bug caused system instability, leaving only a small number of API nodes accepting traffic at the time.

Timeline on November 12, 2021:
09:24 UTC - Synthetic monitoring alerts and customer reports signaled API issues.
09:31 UTC - Engineers confirmed the issue and began bringing supporting systems online.
09:43 UTC - The incident was closed after engineers observed a period of stability.

Resolution:
Engineers started the recovery process by introducing new nodes to balance API load and retiring the affected systems.
We have also introduced additional checks to help prevent similar failures in the future.
Please email support@gainsight.com if there are further questions.

Posted Nov 19, 2021 - 22:35 UTC

Resolved
This incident has been resolved.
We will follow up with RCA details as we receive them.
Posted Nov 12, 2021 - 09:43 UTC

Update
We have identified the issue and triggered recovery action. We are monitoring the systems closely.
Posted Nov 12, 2021 - 09:31 UTC

Investigating
We are investigating elevated error rates which may result in availability issues for some customers. More details to follow.
Posted Nov 12, 2021 - 09:24 UTC

This incident affected: Gainsight CS - US1 Region (Gainsight CS US1 Application).