Rules Engine Processing Delays for a Few Customers

Incident Report for Gainsight

Resolved

Root Cause:
A transient network fault caused a few running processes to enter a hung state. While the resiliency mechanisms ensured that subsequent jobs succeeded, a few hung jobs remained in the queue for an extended time.

Recovery Actions:
a) Restarted the services
b) Manually retriggered the hung jobs
c) Populated the Timeline entries

Preventive action:
While the applications already have a high level of resiliency in place to handle network issues, we are working on further enhancements to catch even rare edge cases such as this one.
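
For illustration only, one common way to catch hung jobs is a heartbeat watchdog that requeues any running job that has stopped reporting progress. The sketch below is not Gainsight's implementation; the `Job` type, the `requeue_hung_jobs` function, and the timeout and retry values are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record; field names are assumptions for this sketch.
    job_id: str
    last_heartbeat: float  # timestamp (seconds) of the last progress report
    attempts: int = 1


def find_hung_jobs(running, now, heartbeat_timeout):
    """Return running jobs whose last heartbeat is older than the timeout."""
    return [job for job in running if now - job.last_heartbeat > heartbeat_timeout]


def requeue_hung_jobs(running, queue, now, heartbeat_timeout, max_attempts=3):
    """Move hung jobs from `running` back onto `queue`.

    Jobs that have already used `max_attempts` are dropped from `running`
    but not requeued, so they can be escalated for manual review.
    Returns the IDs of the jobs that were requeued.
    """
    requeued = []
    for job in find_hung_jobs(running, now, heartbeat_timeout):
        running.remove(job)
        if job.attempts < max_attempts:
            queue.append(Job(job.job_id, last_heartbeat=now, attempts=job.attempts + 1))
            requeued.append(job.job_id)
    return requeued
```

A scheduler could call `requeue_hung_jobs` periodically with the current time, so that a job stalled by a network fault is retried automatically instead of waiting for manual intervention.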

We apologize for the inconvenience caused. Please reach out to support@gainsight.com if you have questions.

----
Status:
All queues are back to expected levels. Emails sent between 3:30 AM and 3:30 PM UTC may not currently appear in Timeline for some customers. No data was lost, but we will be deploying a fix to re-synchronize Timeline entries.

RCA:
We are still investigating the root cause and will update this incident once the Timeline entries are populated and the analysis is complete.

Posted Sep 01, 2020 - 18:35 UTC

Update

We have increased capacity to process the backlog of rules and are continuing to investigate the cause of the delay.

Posted Sep 01, 2020 - 11:37 UTC

Investigating

We are investigating reports of processing delays in the Rules Engine queue, which could be affecting a subset of customers.

We will provide more information soon.

Posted Sep 01, 2020 - 10:39 UTC

This incident affected: Gainsight CS - US1 Region (US1 Rules Engine Queue).