TL;DR
- What happened: A power outage during a week-long vacation rebooted the Debian server that fronts my Talos cluster. The encrypted root halted at the LUKS prompt and stayed there, effectively cutting off my remote access to my homelab’s private network.
- Impact: Most services kept running normally, but I couldn’t reach any of them.
- Root cause: The Debian server is the single Tailscale path into the LAN-isolated cluster, its root is LUKS-encrypted with no auto-unlock, and there was no UPS to ride out the flicker.
- Lesson: “Worst case I drive home and type a password” stops being a tolerable answer the moment I am not within driving distance.
Impact
Most services on the Talos cluster kept running normally for the whole Charleston trip. The cluster recovered from the outage on its own when power came back, and the workloads inside it kept doing their jobs. The problem was that the only external path in runs through the Debian server, which was sitting at a LUKS prompt with nobody home. None of those still-running services were reachable from a thousand miles away. No customer impact, just my own access. The cost was a vacation’s worth of hobby-project hours I had specifically planned to spend on the cluster.
Root cause
The Talos cluster lives on an isolated 192.168.2.0/24 wired LAN with no direct internet egress. The Debian server doubles as the NAT gateway out to WiFi and as the Tailscale subnet router advertising 192.168.2.0/24 onto the tailnet. That is the only external path to the cluster.
Tailscale used to run as a system extension on the Talos nodes themselves, but it was causing the ext-tailscale service to stall on every boot, so I removed it in a rolling image upgrade and pushed the Tailscale responsibility onto Debian. The Tailscale Kubernetes Operator still runs in-cluster for egress proxies, but its API server proxy mode is deliberately disabled to avoid a known lockout risk if that proxy fails. The cumulative effect is that every external path into the cluster funnels through one machine.
That machine has a LUKS-encrypted root (see the inaugural LUKS postmortem), so every reboot needs a human in the room. No UPS sat behind any of the four machines, so a brief flicker was enough to drop everything to the LUKS prompt.
Fixes
- Immediate: Got home, typed the LUKS passphrase, confirmed the Debian server, Tailscale subnet router, and cluster were back up.
- Structural: Ordered an APC Back-UPS 600 once I got home. It arrived a few days later and then sat in my office unopened. A second outage was what finally got me to plug it in. Five battery-and-surge outlets. All three Talos nodes and the Debian server are on battery+surge. The fifth outlet currently holds an external HDD array; when a fourth cluster node lands (likely a GPU box to move local LLM inference off the MacBook), the array drops to surge-only and the new node takes the battery slot.
Open questions
- The UPS USB cable is plugged into the Debian server but graceful shutdown is not configured. Plan is apcupsd pending confirmation of what the Back-UPS 600 actually speaks. Goal: clean shutdown before the battery dies, not a crash.
- Open whether to fan that shutdown signal out to the Talos nodes via talosctl shutdown.
- Research remote LUKS-unlock options like dropbear-in-initramfs, which would let me SSH in over Tailscale and type the passphrase from anywhere instead of needing to be physically at the keyboard.
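A minimal sketch of what the shutdown fan-out could look like, assuming apcupsd turns out to work with the Back-UPS 600 and its apcaccess tool is available on the Debian server. The node IPs and both thresholds are hypothetical placeholders, and the Debian server's own shutdown is assumed to stay with apcupsd itself. The sketch only decides when to tell the Talos nodes to power off and builds the talosctl shutdown commands for them.

```python
import subprocess

# Hypothetical values: the node addresses and thresholds are placeholders,
# not taken from the actual cluster.
TALOS_NODES = ["192.168.2.11", "192.168.2.12", "192.168.2.13"]
MIN_CHARGE_PERCENT = 30.0
MIN_RUNTIME_MINUTES = 5.0


def parse_apcaccess(text):
    """Turn 'KEY : VALUE' lines from apcaccess output into a dict of strings."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def leading_number(value):
    """Extract the leading number from values such as '100.0 Percent'."""
    return float(value.split()[0])


def should_shut_down(status):
    """True when the UPS is on battery and charge or runtime is below threshold."""
    if "ONBATT" not in status.get("STATUS", ""):
        return False
    charge = leading_number(status.get("BCHARGE", "100"))
    minutes = leading_number(status.get("TIMELEFT", "999"))
    return charge < MIN_CHARGE_PERCENT or minutes < MIN_RUNTIME_MINUTES


def shutdown_commands(nodes):
    """Build one 'talosctl shutdown' invocation per node."""
    return [["talosctl", "shutdown", "--nodes", node] for node in nodes]


def main():
    """Read the UPS state once and fan the shutdown out if needed."""
    raw = subprocess.run(
        ["apcaccess", "status"], capture_output=True, text=True, check=True
    ).stdout
    if should_shut_down(parse_apcaccess(raw)):
        for cmd in shutdown_commands(TALOS_NODES):
            # Keep going even if one node is already down or unreachable.
            subprocess.run(cmd, check=False)
```

Calling main() from a timer or from an apcupsd event script would be one way to wire this in. The thresholds should sit above apcupsd's own shutdown thresholds so the nodes go down before the Debian server does, since the server is their only gateway.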
Lessons
I’m almost always at home, and power outages are pretty rare. I had to pull out a keyboard and monitor maybe twice in the six months prior, and this issue kept getting pushed off for higher-priority work or, quite frankly, more interesting work. I even thought about this before I went on vacation, but decided to roll the dice. As soon as I was unable to access my homelab, I knew exactly what had happened and knew that I had brought this upon myself. The good news is this isn’t critical infrastructure to anyone but me, so no customer support tickets were logged.
Links
- Inaugural LUKS postmortem: /homelab/postmortem-debian-luks-reboot
- Homelab overview: /homelab