~/homelab > whoami
I run a small production-grade homelab: a three-node Talos Kubernetes cluster on bare-metal Lenovo M700s, fronted by a Debian server that handles NAT, Tailscale subnet routing, and a few services that do not belong in Kubernetes. That server also runs the NVR stack (go2rtc, Frigate, and a Coral USB TPU for on-device object detection), fed by PoE cameras on a separate wired network isolated from the house LAN.
The cluster hosts Git, task tracking, a self-hosted social scheduler, a workflow engine, an object store, and a personal agent stack. Small enough that I understand every layer. Large enough that real failures happen, which is the point.
~/homelab > topology
[Diagram: homelab topology]
External: Internet; CF (…niard.cloud); Tailscale Tailnet; PoE Cameras.
Debian Server (Lenovo M700): NAT • Subnet Router; NVR Stack (go2rtc RTSP proxy -> Frigate, with Frigate offloading object detection to the Coral USB TPU).
Talos Cluster (3x Lenovo M700): Cilium Gateway API with an L2 LoadBalancer Pool; Platform Layer (Longhorn Storage, cert-manager, ExternalDNS, Prometheus + Grafana); Public Services (Forgejo, Vikunja, Postiz Stack, Windmill, MinIO); vclusters (Agent Stack: Hermes Agent, Synapse Matrix; Bench Sandbox: Eval Rigs, Spare Rigs).
Flows: Internet -> CF -> Gateway -> Public Services and Agent Stack. Internet -> (WireGuard) Tailnet -> Debian NAT • Subnet Router -> Cluster. Cameras -> go2rtc. Public Services, Agent Stack, and Bench Sandbox all store data on Longhorn.
~/homelab > vclusters
The two vclusters in the topology above are what happens to be running today, not a fixed part of the stack. vcluster lets me spin up a fresh control plane on the host cluster in a few minutes whenever I need one: a benchmark rig, an upgrade rehearsal, a sketchy experiment that deserves its own blast radius. Each one gets its own API server and CRDs without sharing state with the rest of the cluster. When the work is done I delete the vcluster and the host is left clean.
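The create-then-delete lifecycle described above can be sketched as a small Python wrapper around the vcluster CLI. This is a minimal sketch, not my actual tooling: the vcluster name `bench` and the namespace `vcluster-bench` are hypothetical, and the helper defaults to a dry run that only returns the command line it would execute.

```python
import shlex
import subprocess


def vcluster_commands(name: str, namespace: str) -> dict:
    """Build argv lists for the lifecycle of one throwaway vcluster."""
    return {
        "create": ["vcluster", "create", name, "--namespace", namespace],
        "connect": ["vcluster", "connect", name, "--namespace", namespace],
        "delete": ["vcluster", "delete", name, "--namespace", namespace],
    }


def run(step: str, name: str, namespace: str, dry_run: bool = True) -> str:
    """Return the command line for a step; execute it only when dry_run is False."""
    argv = vcluster_commands(name, namespace)[step]
    if not dry_run:
        # Requires the vcluster CLI on PATH and a kubeconfig for the host cluster.
        subprocess.run(argv, check=True)
    return shlex.join(argv)


if __name__ == "__main__":
    # Hypothetical names, printed rather than executed.
    for step in ("create", "connect", "delete"):
        print(run(step, "bench", "vcluster-bench"))
```

Keeping each vcluster in its own namespace means the delete step removes everything the experiment created on the host.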
~/homelab > inference
I don’t have gobs of money to throw at frontier labs, and I also don’t trust any of them. Anything sensitive or novel that I don’t want to share with the world stays local. Inference runs on my MacBook Pro M1 with 32GB of unified memory, fronted by llama.cpp and MLX. Local sweet spots: gemma3-12b for general reasoning, qwen3-14b for code-shaped tasks, parakeet-mlx for ASR.
LiteLLM sits in front of every backend as a single OpenAI-compatible gateway. Agents in the cluster reach it over Tailscale and ask for a model by name; LiteLLM decides whether the request stays on the M1 or fans out to a cloud provider. That decoupling means I can swap a backend, retune routing, or pin a per-model budget without touching any agent code. When a request needs a frontier model the M1 cannot hold (large context, kimi-k2.5-class capability, heavy tool-use loops), LiteLLM falls back to NVIDIA NIM or OpenCode Go. Both are gated by a dollar budget at the router, so a runaway loop cannot drain them.
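To make the routing idea concrete, here is a toy model of a budget-gated router in Python. It is an illustration of the concept only, not LiteLLM's implementation or configuration: the `Backend` and `Router` classes, the backend names, the `frontier-large` model name, and the dollar figures are all assumptions made up for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


class BudgetExceeded(RuntimeError):
    """Raised when every backend serving a model is out of budget."""


@dataclass
class Backend:
    name: str
    models: Set[str]
    budget_usd: Optional[float] = None  # None means local: no dollar cap
    spent_usd: float = 0.0

    def can_afford(self, cost_usd: float) -> bool:
        if self.budget_usd is None:
            return True
        return self.spent_usd + cost_usd <= self.budget_usd


@dataclass
class Router:
    backends: List[Backend]  # order is the preference order

    def route(self, model: str, est_cost_usd: float = 0.0) -> str:
        """Pick the first backend that serves the model and can pay for it."""
        candidates = [b for b in self.backends if model in b.models]
        if not candidates:
            raise KeyError(f"no backend serves model {model!r}")
        for backend in candidates:
            if backend.can_afford(est_cost_usd):
                if backend.budget_usd is not None:
                    backend.spent_usd += est_cost_usd  # charge capped backends only
                return backend.name
        raise BudgetExceeded(f"all backends for {model!r} are over budget")


if __name__ == "__main__":
    router = Router([
        Backend("m1-local", {"gemma3-12b", "qwen3-14b"}),
        Backend("cloud-a", {"frontier-large"}, budget_usd=5.0),
        Backend("cloud-b", {"frontier-large"}, budget_usd=5.0),
    ])
    print(router.route("qwen3-14b"))            # stays local
    print(router.route("frontier-large", 3.0))  # first cloud backend
    print(router.route("frontier-large", 3.0))  # first is nearly spent, so the second
```

The caller only ever names a model. Which machine answers, and whether there is money left to answer at all, is decided in one place, which is why a runaway loop ends in a `BudgetExceeded` error rather than a bill.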
[Diagram: inference routing]
Agents (Talos + vclusters) reach the LiteLLM Router over Tailscale.
MacBook Pro M1 / 32GB: LiteLLM Router -> llama.cpp · MLX -> gemma3-12b · qwen3-14b · parakeet-mlx.
Cloud Fallback: LiteLLM Router -> (frontier models) NVIDIA NIM, OpenCode Go.
~/homelab > status
Probed Jun 22 16:10 UTC by an in-cluster CronJob. This is a static build, so the grid is point-in-time: it reflects whatever snapshot was current the last time the site was rebuilt.
Uptime is averaged over a 30-day rolling window. Probes started Jun 22, 2026, so current figures reflect only the data collected since then.
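As a rough illustration of how a rolling-window figure behaves when the probe history is shorter than the window, here is a minimal Python sketch. It assumes uptime is simply the fraction of successful probes inside the window; the 5-minute probe interval and the sample data are made up for the example and are not the CronJob's real schedule or results.

```python
from datetime import datetime, timedelta, timezone


def rolling_uptime(probes, now, window=timedelta(days=30)):
    """Fraction of successful probes within the window, or None if there are none.

    probes is an iterable of (timestamp, ok) pairs.
    """
    cutoff = now - window
    recent = [ok for ts, ok in probes if cutoff <= ts <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


if __name__ == "__main__":
    start = datetime(2026, 6, 22, 16, 10, tzinfo=timezone.utc)
    # Ten hypothetical probes five minutes apart; the fourth one fails.
    probes = [(start + timedelta(minutes=5 * i), i != 3) for i in range(10)]
    print(rolling_uptime(probes, now=start + timedelta(hours=1)))  # 0.9
```

With under an hour of history, a single failed probe moves the figure by ten percentage points. The same failure spread over a full 30 days of probes would barely register, which is why early numbers should be read loosely.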
~/homelab > metrics
Snapshot Jun 22 16:10 UTC from in-cluster Prometheus. Static build, so this is point-in-time: it reflects whatever snapshot was current the last time the site was rebuilt.
~/homelab > field-notes
What I Run, and Why
The reasoning behind the choices in my homelab: why Talos, why Debian as the gateway, why local inference, and where convenience and control trade off.
A Power Outage Cut My Access To My Homelab During Vacation
A power outage during a week-long Charleston vacation took down the Debian server that fronts my Talos cluster. With LUKS at boot and no UPS to ride out the flicker, my remote access to the homelab stayed cut for the whole trip. I came home, ordered an APC Back-UPS 600, and then put off plugging it in. A second outage a few days later was what finally got it installed.
Debian LUKS Locked Me Out of My Cluster for 14 Hours
An overnight reboot on the LUKS-encrypted Debian server that fronts my Talos cluster halted at the passphrase prompt. The cluster and every hosted service were unreachable until I drove to the box and typed it in.