Outage #904

Information

Begins at: 2022-06-02 10:45:00 CEST
Duration: 360 minutes
Type: outage
State: resolved
Impact: system_reset
Affected systems: Location Praha, Location Playground, Location Praha Storage, Location Staging
Summary:
English: Power outage in MasterDC Praha
Česky: Výpadek napájení v MasterDC Praha
Description:
English: Both power lines went down in MasterDC Praha. As soon as power and network connectivity were restored, we began working on recovery, which meant restarting all of our hardware. All nodes should now be up and running.

We apologize for not answering the phone or writing sooner; all of our efforts went into recovering our systems.
Česky: V DC v Praze vypadly obě větve napájení. Od obnovy napájení a konektivity jsme pracovali na zprovoznění všech VPS. Aktuálně by měly běžet všechny nody a VPS.

Omlouváme se, pokud jsme nezvedli telefon nebo nenapsali dříve; naším cílem bylo co nejdříve zprovoznit všechny systémy.
Handled by: Pavel Šnajdr, Tomáš Srnka, Jakub Skokan, Martin Myška

Updates

2022-06-02 15:20:07 CEST
Reported by: Jakub Skokan
State: announced

2022-06-05 09:35:07 CEST
Summary:
English: A few words regarding the outage
Česky: Pár slov k výpadku
Reported by: Jakub Skokan
State: resolved

English: We have published a blog post with more information about Thursday's
power outage. It also includes a statement we received from MasterDC.

https://blog.vpsfree.cz/post-mortem-par-slov-k-vypadku/

Since the blog post is in Czech, here is a short summary in English.

The outage was related to a power loss across the city of Prague, not just in the
datacenter. While the DC has backup power, the backups failed and the first power
line went down. We continued running on the second power line. While the DC was
reconnecting selected devices to the second power line, a short circuit occurred
and the second power line went down as well.

As a result, everything we have in Prague was shut down, including our email
support and the tools we otherwise use to communicate with our members. We
immediately went to the DC to sort out the issues on site. Recovery was further
complicated by slowly booting switches. Most of our nodes boot over PXE, and
since the nodes came up faster than the network, they failed to boot. This led
to an additional delay, as we had to reset them again. We also lost one 10G
switch during the outage.

We're working on making the PXE server available faster to avoid these boot
issues in the future. We appreciate MasterDC's response and their openness about
the situation.


Česky: Na našem blogu jsme uvedli více informací ke čtvrtečnímu výpadku napájení
v Praze. Blog obsahuje také vyjádření, které jsme obdrželi od MasterDC:

https://blog.vpsfree.cz/post-mortem-par-slov-k-vypadku/

Help

Where to report bugs and suggestions?

Support vpsFree.cz

Support mail: podpora@vpsfree.cz

Links

Status
https://status.vpsf.cz

IRC
irc.libera.chat #vpsfree

Matrix
#vpsfree:matrix.org

Discourse
https://discourse.vpsfree.cz

Knowledge base
Česky: https://kb.vpsfree.cz/
English: https://kb.vpsfree.org/

Sysadmin contacts

Jakub Skokan
IRC: aither at #vpsfree
Phone: +420 775 386 453

Pavel Šnajdr (main admin)
IRC: snajpa at #vpsfree
Phone: +420 720 107 791