What I Run, and Why

~/homelab > why-a-homelab

I run a homelab to do systems work that cannot be learned from cloud consoles: provisioning nodes from fresh OS images, recovering from a broken Tailscale upgrade, deciding when a service belongs in Kubernetes and when it does not. It is also where I run the agent-and-inference stack I do not want to hand to a frontier lab.

Last updated: 2026-06-21

~/homelab > talos-vs-the-alternatives

Talos for the cluster, not k3s or vanilla Kubernetes. The deciding factor was the lack of a general-purpose Linux underneath. Talos exposes one API and one set of config files; there is no shell, no package manager, and no per-node drift that I could introduce by SSHing in to “just fix this one thing.” For a single-person homelab where the operator is the same person being lazy at 11pm, that constraint is a feature. The cost is a steeper learning curve and a smaller community than k3s. I think both are fine choices; this is the one that matches my temperament.

~/homelab > debian-as-the-gateway

The Debian server in front of the cluster handles NAT, Tailscale subnet routing, the NVR stack, and a few services that do not belong in Kubernetes. Originally Tailscale ran as a system extension on the Talos nodes themselves, but the ext-tailscale service stalled on every boot. I moved Tailscale onto Debian during a rolling image upgrade, and the boot reliability problem went away.

The cost of that move is now visible: Debian is a single point of failure for the cluster’s reachability. Two postmortems on this page document exactly what that means in practice. I still think Debian as the gateway is the right answer at this scale, paired with a UPS, runbooks for the LUKS unlock, and eventually a network-bound unlock mechanism. A redundant second gateway would defeat the “small enough that I understand every layer” principle.

~/homelab > local-inference

Local inference runs on a MacBook Pro M1 with 32GB of unified memory, served through llama.cpp and MLX. Two reasons: I do not want to send anything novel or sensitive to a frontier lab, and I do not want the agent stack to be priced per token.

LiteLLM sits between the cluster’s agents and the M1. Agents request a model by name; the router decides whether the request stays local or falls out to a cloud provider. The local cohort is gemma3-12b for general work, qwen3-14b for code, and parakeet-mlx for ASR. Frontier escapes go to NVIDIA NIM and OpenCode Go, both gated by a dollar budget at the router so a runaway loop cannot drain them.
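The routing decision described above can be sketched in a few lines of plain Python. This is not LiteLLM's API or configuration format, only an illustration of the idea: the three local model names come from the text, while the class name, the `est_cost_usd` parameter, and the budget figure are hypothetical.

```python
from dataclasses import dataclass

# Local model names are taken from the text; everything else is an assumption.
LOCAL_MODELS = {"gemma3-12b", "qwen3-14b", "parakeet-mlx"}


@dataclass
class BudgetedRouter:
    """Toy router: local models always pass, cloud calls draw down a dollar budget."""

    budget_usd: float        # hypothetical spending cap for cloud providers
    spent_usd: float = 0.0   # running total of estimated cloud spend

    def route(self, model: str, est_cost_usd: float = 0.0) -> str:
        # Requests for a local model never touch the budget.
        if model in LOCAL_MODELS:
            return "local"
        # Cloud path: refuse once the estimated cost would exceed the cap,
        # so a runaway loop stops at the budget instead of draining it.
        if self.spent_usd + est_cost_usd > self.budget_usd:
            return "rejected"
        self.spent_usd += est_cost_usd
        return "cloud"
```

With a hypothetical cap of one dollar, `BudgetedRouter(budget_usd=1.00)` routes `"qwen3-14b"` locally, accepts a first cloud call estimated at 0.60, and rejects a second one of the same size. The real router tracks actual spend rather than estimates; the sketch only shows where the gate sits.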

I benchmark this routinely. The gap between the best local model and the bottom of the working frontier tier on structured-generation tasks is real but smaller than common framing suggests. The bigger gap is reliability in the tail.

~/homelab > isolated-lan

The cluster lives on an isolated 192.168.2.0/24 wired LAN with no direct internet egress. The Debian server is the only path in or out. The reason is blast-radius containment: anything I run in the cluster cannot accidentally reach the house network or the camera VLAN, because they are not addressable from there. The cost is that any workload needing internet egress has to go through Debian’s NAT, which is deliberate friction.
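The containment rule can be expressed as a small check using Python's standard `ipaddress` module. Only the 192.168.2.0/24 subnet comes from the text; the function names and the sample addresses in the usage note are hypothetical.

```python
import ipaddress

# The cluster subnet as stated in the text above.
CLUSTER_LAN = ipaddress.ip_network("192.168.2.0/24")


def on_cluster_lan(addr: str) -> bool:
    """True when the address belongs to the isolated cluster subnet."""
    return ipaddress.ip_address(addr) in CLUSTER_LAN


def needs_gateway(src: str, dst: str) -> bool:
    """True when traffic crosses the LAN boundary and must traverse the gateway."""
    return on_cluster_lan(src) != on_cluster_lan(dst)
```

Under these assumptions, `needs_gateway("192.168.2.10", "192.168.2.20")` is false because both ends sit on the cluster LAN, while `needs_gateway("192.168.2.10", "1.1.1.1")` is true because the destination lies outside it.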

~/homelab > storage

Longhorn for storage. The alternatives were Ceph (more capability, more operational weight, more memory per node than I have on 16GB workers) and hostpath / local-path (no replication; fine for sandboxing, not for the services I want to keep around). Longhorn replicates volumes across nodes, exposes a usable web UI, and recovers from a node loss without manual surgery. For the scale I run at, it has been the right tradeoff.
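The price of replication is capacity, and the arithmetic is worth having in front of you when sizing disks. The sketch below is a rough upper bound only: it ignores Longhorn's reserved-space and over-provisioning settings, and the disk sizes and replica count in the usage note are hypothetical, not the values this cluster runs with.

```python
def usable_capacity_gib(disk_gib_per_node: list[float], replicas: int) -> float:
    """Rough upper bound on usable space when each volume is stored `replicas` times."""
    # Each replica of a volume lives on a different node, so the replica
    # count cannot exceed the number of nodes.
    if replicas < 1 or replicas > len(disk_gib_per_node):
        raise ValueError("replicas must be between 1 and the node count")
    return sum(disk_gib_per_node) / replicas


def tolerated_node_losses(replicas: int) -> int:
    """Nodes that can fail while at least one replica of a volume survives."""
    return replicas - 1
```

For example, three nodes with 500 GiB each and three replicas per volume give at most 500 GiB of usable space and survive the loss of two nodes; dropping to two replicas raises the bound to 750 GiB and tolerates one loss.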

~/homelab > vclusters

vclusters are not a fixed part of the stack. I spin them up when I want a fresh control plane with its own API server and CRDs: a benchmark rig, an upgrade rehearsal, an experiment that deserves its own blast radius. When the work is done I delete the vcluster and the host cluster is left clean. The two vclusters in the topology diagram are whatever happened to be running when this page was last updated.

~/homelab > what-i-would-change

If I were starting this homelab today, I would still run Talos and still front it with a Debian gateway. I would put the UPS on the rack before the first outage instead of after the second one. I would also think harder about the LUKS unlock path before committing to full-disk encryption on the gateway, instead of discovering its cost the way I did. Both of those are documented in the field notes.