Outage #904
Information
Begins at: | 2022-06-02 10:45:00 CEST |
Duration: | 360 minutes |
Type: | outage |
State: | resolved |
Impact: | system_reset |
Affected systems: | Location Praha, Location Playground, Location Praha Storage, Location Staging |
Summary: | |
English: Power outage in MasterDC Praha | |
Česky: Výpadek napájení v MasterDC Praha | |
Description: | |
English: Both power lines went down in MasterDC Praha. We have been working on recovery since power and network connectivity were restored, which meant restarting all of our hardware. All nodes should now be up and running. We apologize for not picking up the phone or writing sooner; all of our efforts went into recovering our systems. |
|
Česky: V DC v Praze vypadly obě větve napájení. Od obnovy napájení a konektivity jsme pracovali na zprovoznění všech VPS. Aktuálně by měly běžet všechny nody a VPS. Omlouváme se, pokud jsme nezvedli telefon nebo nenapsali dříve, naším cílem bylo co nejdříve zprovoznit všechny systémy. |
|
Handled by: | Pavel Šnajdr, Tomáš Srnka, Jakub Skokan, Martin Myška |
Updates
Date | Summary | Reported by |
---|---|---|
2022-06-02 15:20:07 CEST | | Jakub Skokan |
State: announced | ||
2022-06-05 09:35:07 CEST | English: A few words regarding the outage. Česky: Pár slov k výpadku | Jakub Skokan |
State: resolved
English: We have published a blog post with more information about Thursday's power outage. It also contains a statement we received from MasterDC: https://blog.vpsfree.cz/post-mortem-par-slov-k-vypadku/ Since the blog post is in Czech, here is a short English translation. The outage was connected to a power loss across the city of Prague, not just in the datacenter. Although the DC has backup power, it failed and the first power line went down. At that point, we continued to run on the second power line. While the DC was reconnecting selected devices to the second power line, a short circuit occurred and the second power line went down as well. Everything we have in Prague was thus shut down, including our email support and the tools we otherwise use to communicate with our members. We immediately set out for the DC to sort out the issues on the spot. The recovery was further complicated by slowly booting switches. Most of our nodes boot from PXE, and since the nodes were up faster than the network, they failed to boot. This led to an additional delay, as we had to reset them again. We also lost one 10G switch during the outage. We're working on making the PXE server available faster to avoid these boot issues in the future. We appreciate MasterDC's response, as they were open about the situation.
Česky: Na našem blogu jsme uvedli více informací ke čtvrtečnímu výpadku napájení v Praze. Blog obsahuje také vyjádření, které jsme obdrželi od MasterDC: https://blog.vpsfree.cz/post-mortem-par-slov-k-vypadku/ |