API performance issues

Incident Report for Gainsight

Postmortem

An anomaly caused a service failure on a portion of the underlying API systems: the affected nodes continued to report themselves as available while failing to accept traffic. This rare situation placed additional load on the nodes still accepting traffic, and auto-scaling did not trigger as designed to keep up with that load.

Engineers quickly identified and corrected the issue and ensured that enough resources were available to meet demand.

We have adjusted our service monitors and auto-scaling configurations to prevent this scenario from recurring. We take availability and performance seriously and will continue to make improvements to provide the best possible experience.
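As an illustration of the failure mode described above, the following is a minimal sketch, not the actual monitoring or auto-scaling configuration used in production. It assumes a health check that opens a real TCP connection to each node instead of trusting the node's self-reported status, and a scaling decision based only on nodes that pass that check. All names here (`accepts_traffic`, `serving_nodes`, `nodes_to_add`, and the per-node capacity figure) are hypothetical.

```python
import math
import socket
from typing import Callable, Iterable, List, Tuple

Node = Tuple[str, int]  # (host, port); hypothetical node representation


def accepts_traffic(node: Node, timeout: float = 1.0) -> bool:
    """Return True only if a TCP connection to the node can be opened.

    This checks that the node accepts connections, which is stricter than
    checking that its process is alive or that it reports itself available.
    """
    try:
        with socket.create_connection(node, timeout=timeout):
            return True
    except OSError:
        return False


def serving_nodes(
    nodes: Iterable[Node],
    probe: Callable[[Node], bool] = accepts_traffic,
) -> List[Node]:
    """Keep only the nodes that pass the traffic-acceptance probe."""
    return [node for node in nodes if probe(node)]


def nodes_to_add(serving: int, total_rps: float, rps_per_node: float) -> int:
    """Number of extra nodes needed to carry the load, given how many
    nodes are actually serving (assumed fixed capacity per node)."""
    needed = math.ceil(total_rps / rps_per_node)
    return max(0, needed - serving)
```

In this sketch, a fleet of four nodes that all report availability would appear to need no extra capacity at 900 requests per second (assuming 300 per node). If only two of them pass the probe, `nodes_to_add(2, 900, 300)` returns 1, so the shortfall becomes visible to the scaling logic.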

Posted Mar 29, 2021 - 03:20 UTC

Resolved

Some customers may have experienced application slowness or instability between 2:45 PM and 3:20 PM UTC due to elevated API response times.
We will provide details in an incident postmortem once we determine the root cause.

Posted Mar 26, 2021 - 16:11 UTC

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Mar 26, 2021 - 15:38 UTC

Investigating

We are investigating reports of degraded API performance.

Posted Mar 26, 2021 - 15:26 UTC

This incident affected: Gainsight CS - US1 Region (Gainsight CS US1 Application).