Connectivity issues with US cluster, seemingly due to AWS incident
Incident Report for Braze, Inc.
Resolved
Device data collection was re-enabled shortly after the last update; however, we noticed that one job queue shard was not functioning at 100%. We fixed that issue and all services normalized shortly thereafter. We continued to monitor the US-EAST-1 cluster and are now resolving this incident.
Posted Aug 15, 2017 - 18:10 EDT
Monitoring
All database failovers have finished. We are keeping device data collection paused until we work through more of the existing backlog of messages and data processing.
Posted Aug 15, 2017 - 15:09 EDT
Update
Most database failovers are complete. We are keeping data collection from devices paused while we verify full service with other components.
Posted Aug 15, 2017 - 15:00 EDT
Update
REST API service is returning following the removal of the affected zones. We are continuing to work on database failovers.
Posted Aug 15, 2017 - 14:26 EDT
Identified
The issue is related to a loss of connectivity to almost a dozen databases in US-EAST-1. We are dropping the affected availability zone to restore service to the APIs. Data processing and message sending scheduled for the affected availability zone will be delayed until we complete a database failover in that zone. Data processing and message sending in other availability zones are running at a degraded pace.
Posted Aug 15, 2017 - 14:10 EDT
Investigating
We are investigating issues connecting to our US dashboard in our US-EAST-1 region. We believe this is caused by an AWS issue affecting an availability zone. We are continuing to investigate.
Posted Aug 15, 2017 - 14:01 EDT