Device data collection was re-enabled shortly after the last update; however, we noticed that one job queue shard was not functioning at full capacity. We fixed that issue, and all services normalized shortly thereafter. We continued to monitor the US-EAST-1 cluster and are now resolving this incident.
Aug 15, 18:10 EDT
All database failovers have completed. We are keeping device data collection paused while we work through the existing backlog of messages and data processing.
Aug 15, 15:09 EDT
Most database failovers are complete. We are keeping data collection from devices paused while we verify full service across the remaining components.
Aug 15, 15:00 EDT
REST API service is recovering now that the affected availability zone has been removed. We are continuing to work on database failovers.
Aug 15, 14:26 EDT
The issue is related to a loss of connectivity to almost a dozen databases in US-EAST-1. We are dropping the affected availability zone to restore service to the APIs. Data processing and message sending scheduled for the affected availability zone will be delayed until we complete failovers for those databases. Data processing and message sending in other availability zones are proceeding at a degraded pace.
Aug 15, 14:10 EDT
We are investigating issues connecting to our US dashboard in our US-EAST-1 region. We believe this is caused by an AWS issue affecting an availability zone. We are continuing to investigate.
Aug 15, 14:01 EDT