Delays in Journey Orchestrator
Incident Report for Gainsight US
Postmortem

Gainsight application users experienced delays in Journey Orchestrator from Sep 08, 2022 - 22:16 UTC to Sep 09, 2022 - 10:24 UTC. During this window, Journey Orchestrator was either delayed or unavailable for end users.

Root Cause:

Investigations so far indicate that the incident resulted from a combination of the following factors:

  • A production change was made on September 7 to fix a bug in which email participants were dropped from programs whose email step contains inline reports.
  • This fix, combined with an edge-case scenario in which a customer published a sudden burst of 800K participant emails, consumed a large amount of storage on one of the backend datastores within a very short span of time.
  • Although the autoscaling mechanism increased storage capacity immediately, the added capacity was also consumed, and further storage additions were blocked by the optimization step for the previously added storage.

Recovery Action:

  • Journey Orchestrator functionality was placed on hold to prevent further impact or data loss.
  • Recovery steps were initiated to restore the datastore without data loss.
  • Functionality was restored as soon as the datastore was recovered.

Preventive Measures:

  • Review and fix the edge-case scenario.
  • Review change management procedures.
  • Review storage autoscaling limits to prevent a recurrence.
Posted Sep 14, 2022 - 16:53 UTC

Resolved
A fix has been applied to resolve the issue, and we are monitoring it closely. We anticipate a delay in processing, which should catch up within a few hours.
Posted Sep 09, 2022 - 10:24 UTC
Update
We are continuing to work on a fix for this issue.
Posted Sep 09, 2022 - 10:24 UTC
Update
We are continuing to work on a fix for this issue.
Posted Sep 09, 2022 - 10:21 UTC
Update
Thank you for your patience as we continue to work on a fix for this issue. Please anticipate sending delays of up to 13 hours for Journey Orchestrator.
Posted Sep 09, 2022 - 08:19 UTC
Update
Please anticipate sending delays of up to 10 hours for Journey Orchestrator while we continue to resolve this issue. All systems outside of Journey Orchestrator are functioning as expected.
Posted Sep 09, 2022 - 03:00 UTC
Update
Thank you for your patience as we continue to work on a fix for this issue.
Posted Sep 09, 2022 - 02:18 UTC
Update
Thank you for your patience as we continue to work on a fix for this issue.
Posted Sep 09, 2022 - 01:18 UTC
Update
Thank you for your patience as we continue to work on a fix.
Posted Sep 09, 2022 - 00:18 UTC
Identified
The issue has been identified and we are working on a fix.
Posted Sep 08, 2022 - 23:09 UTC
Investigating
We are investigating an issue with the Journey Orchestrator service.

We have just placed Journey Orchestrator in a hold state as we work on identifying the issue.
Posted Sep 08, 2022 - 22:16 UTC
This incident affected: Gainsight Application.