We deeply apologize for the impact this service interruption caused. This analysis outlines the complete root cause and the sequence of cascading failures that compromised the facility's redundant power systems in the us-east4 regional extension. The findings confirm that the initial external utility event triggered a failure chain within the facility, resulting in the loss of critical services on the IBM Power for Google Cloud (IP4G) platform. It also details the preventative actions we are taking to reinforce resilience.
Impact Duration: 3 hours 30 minutes
On Wednesday, October 15, 2025, a major utility power event caused the Google Cloud regional extension data center in us-east4 to transition to generator power. During the automatic transfer from utility power to generator power, the transfer failed for several power blocks. These block failures caused a complete loss of redundant power for multiple customers in the data center facility, including IBM Power for Google Cloud.
Background and Incident Details
IBM Power for Google Cloud operates in Google Cloud regional extension data centers. These data centers are Tier 3 or greater facilities with, at minimum, N+1 cooling and power. IBM Power for Google Cloud relies on the data center facility to manage utility, generator, and UPS power. When we deploy each region, our engineers collaborate with the facility provider to ensure all systems are supplied by the redundant power paths designated by the facility. All systems in IBM Power for Google Cloud are connected to redundant power sources with no single point of failure.

In us-east4, each IBM Power for Google Cloud compute, storage, and network system is connected to two power sources supplied by the facility. Each power source is attached to an independent circuit that is served by an array of uninterruptible power supplies (UPS) in the event of a utility power failure. The facility also maintains several generators that supply long-term power during an extended utility outage. Generators and UPS units are grouped into blocks that can also provide additional redundancy to adjacent blocks in the event of failures within a block. The overall power system is robust and engineered for critical workloads.
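For illustration only, the following minimal sketch models the dual-feed arrangement described above; the names (PowerBlock, System, has_single_point_of_failure) and the example values are hypothetical and are not IP4G tooling or the facility's actual configuration.

```python
# Illustrative sketch: each IP4G system draws power from two feeds, each on an
# independent facility power block (its own UPS array and generator group).
from dataclasses import dataclass

@dataclass(frozen=True)
class PowerBlock:
    name: str  # a facility power block (UPS array + generator group)

@dataclass
class System:
    name: str
    feeds: tuple  # exactly two feeds, each expected to be on a different block

def has_single_point_of_failure(system: System) -> bool:
    # A single point of failure exists only if both feeds land on the same block.
    return len({feed.name for feed in system.feeds}) < 2

# Example: a compute node fed from two independent blocks has no single point of failure.
block_a, block_b = PowerBlock("Block A"), PowerBlock("Block B")
compute_node = System("compute-01", feeds=(block_a, block_b))
assert not has_single_point_of_failure(compute_node)
```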
On October 15, 2025, utility power was lost and systems successfully transferred to multiple UPS blocks as expected. All facility generators received the start command and began running. However, two generators experienced faults that led to a cascading sequence of failures and a loss of power for the data center facility:
Initial Failure: Generator E tripped due to an improper breaker trip setting (a configuration error).
Secondary Failure: Simultaneously, Generator B tripped due to an internal breaker fault.
Total Discharge: With two generators unavailable, four independent UPS blocks exhausted their battery capacity and shut down.
Cascading Failure: The UPS load was automatically transferred to the remaining UPS block, which became overloaded and tripped its output breaker, causing a total loss of critical facility power.
This series of failures, which exhausted the designed redundancy of the facility, ultimately resulted in a total loss of power to all IBM Power for Google Cloud systems and to other customers within the affected data center blocks.
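To make the sequence above concrete, here is a purely illustrative sketch of the cascade; the block names, loads, and capacity figures are hypothetical and do not reflect the facility's actual values.

```python
# Illustrative cascade: two generators fail, their UPS blocks drain and shut down,
# and the stranded load then overloads the one block still on generator power.
ups_blocks = {
    "UPS-1": {"load": 100, "generator_ok": False},  # backed by the faulted generators
    "UPS-2": {"load": 100, "generator_ok": False},
    "UPS-3": {"load": 100, "generator_ok": False},
    "UPS-4": {"load": 100, "generator_ok": False},
    "UPS-5": {"load": 100, "generator_ok": True},   # still supported by a running generator
}
UPS_CAPACITY = 150  # a block trips its output breaker above this load (arbitrary units)

# Step 1: blocks without generator support run on battery to end of discharge, then shut down.
stranded_load = sum(b["load"] for b in ups_blocks.values() if not b["generator_ok"])

# Step 2: the stranded load transfers automatically to the surviving block.
surviving = ups_blocks["UPS-5"]
surviving["load"] += stranded_load

# Step 3: the surviving block is now far over capacity, trips, and all critical power is lost.
print(f"Surviving block load: {surviving['load']} vs capacity {UPS_CAPACITY}")
print("Output breaker trips -> total loss of critical power:", surviving["load"] > UPS_CAPACITY)
```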
Timeline
07:27 AM EST - The data center facility provider’s UPS block supporting IBM Power for Google Cloud reaches end of battery discharge and shuts down
07:30 AM EST - IP4G engineering detects loss of access to critical systems and begins troubleshooting
08:00 AM EST - IP4G incident posted on the Statuspage to notify customers of the incident
08:22 AM EST - IP4G engineering receives notification of critical power and generator failures in the provider facility
08:28 AM EST - Utility power and load restored to UPS block supporting IP4G
08:39 AM EST - IP4G engineering detects systems coming online and begins the service restoration process
09:08 AM EST - IP4G Block Storage systems online and healthy
09:51 AM EST - IP4G Network systems online and healthy
10:48 AM EST - IP4G Compute systems online and healthy
10:57 AM EST - IP4G notifies customers that compute workloads can be powered on
11:28 AM EST - IP4G notifies customers that the incident is resolved and transitions the status to monitoring
01:03 PM EST - Second utility power loss; customer loads transfer to generator power successfully.
02:11 PM EST - Utility power restored and stable for the final recovery
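For reference, the stated impact duration of 3 hours 30 minutes corresponds to the window from the 07:27 UPS shutdown to the 10:57 notification that compute workloads could be powered on; the short check below assumes those are the boundary events used for the impact window.

```python
# Assumption: the impact window runs from the 07:27 UPS shutdown to the 10:57
# notification that compute workloads could be powered on.
from datetime import datetime

start = datetime(2025, 10, 15, 7, 27)
end = datetime(2025, 10, 15, 10, 57)
print(end - start)  # 3:30:00 -> 3 hours 30 minutes
```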
Root Cause
The confirmed root cause has been identified by our facility provider and is summarized below:
“Utility power failed for the entire building, and the load was transferred to generators as expected. However, two generators experienced unexpected failures. The breaker for Generator E tripped due to improper breaker trip settings. The breaker for Generator B failed to remain closed and cycled multiple times due to a breaker fault, which prevented Generator B from supporting any load. All UPS units supported by Generators B and E ran on battery until the end of their discharge time and shut down. The load was then automatically transferred to an alternate UPS in another power block; this simultaneous transfer caused the target UPS to become overloaded and trip the output breaker, ultimately causing a loss of all power.”
Prevention
The failures outlined above demand immediate and decisive action from both the facility provider and our internal IP4G engineering team. Below are the committed steps we are taking to address the root causes and reinforce the resilience of the platform:
Provider Facility Actions
IBM Power for Google Cloud Actions
This is the final version of the report.