
~/homelab > whoami

I run a small production-grade homelab: a three-node Talos Kubernetes cluster on bare-metal Lenovo M700s, fronted by a Debian server that handles NAT, Tailscale subnet routing, and a few services that do not belong in Kubernetes. That server also runs the NVR stack (go2rtc, Frigate, and a Coral USB TPU for on-device object detection), fed by PoE cameras on a separate wired network isolated from the house LAN.

The cluster hosts Git, task tracking, a self-hosted social scheduler, a workflow engine, an object store, and a personal agent stack. Small enough that I understand every layer. Large enough that real failures happen, which is the point.

What I run, and why →

~/homelab > topology

flowchart TB
  Internet([Internet]):::external
  CF[Cloudflare DNS<br/>niard.cloud]:::external
  Tailnet([Tailscale Tailnet]):::external
  Cameras[("PoE Cameras")]:::external

  subgraph DebianHost["Debian Server (Lenovo M700)"]
    direction TB
    DebianEdge["NAT • Subnet Router"]:::edge
    subgraph NVR["NVR Stack"]
      direction LR
      Go2rtc["go2rtc<br/>RTSP proxy"]:::app
      Frigate["Frigate"]:::app
      Coral[("Coral USB TPU<br/>object detection")]:::platform
    end
    Go2rtc --> Frigate
    Frigate -.-> Coral
  end

  subgraph Cluster["Talos Cluster (3x Lenovo M700)"]
    direction TB
    GW["Cilium Gateway API<br/>L2 LoadBalancer Pool"]:::platform
    subgraph Platform["Platform Layer"]
      direction LR
      Longhorn[(Longhorn<br/>Storage)]:::platform
      Cert[cert-manager]:::platform
      ExtDNS[ExternalDNS]:::platform
      Prom[Prometheus<br/>+ Grafana]:::platform
    end
    subgraph Public["Public Services"]
      direction LR
      Forgejo[Forgejo]:::app
      Vikunja[Vikunja]:::app
      Postiz[Postiz Stack]:::app
      Windmill[Windmill]:::app
      MinIO[MinIO]:::app
    end
    subgraph VClusters["vclusters"]
      direction LR
      subgraph Agent["Agent Stack"]
        Hermes[Hermes Agent]:::agent
        Synapse[Synapse Matrix]:::agent
      end
      subgraph Bench["Bench Sandbox"]
        Evals[Eval Rigs]:::sandbox
        Spare[Spare Rigs]:::sandbox
      end
    end
    GW --> Public
    GW --> Agent
    Public -.-> Longhorn
    Agent -.-> Longhorn
    Bench -.-> Longhorn
  end

  Internet --> CF
  CF --> GW
  Internet -.->|Wireguard| Tailnet
  Tailnet --> DebianEdge
  DebianEdge --> Cluster
  Cameras --> Go2rtc

  classDef external stroke:#64748b
  classDef edge stroke:#fbbf24
  classDef platform stroke:#4ade80
  classDef app stroke:#60a5fa
  classDef agent stroke:#a78bfa
  classDef sandbox stroke:#94a3b8
  classDef vclusterGroup stroke:#a78bfa,stroke-dasharray:4 4
  class VClusters vclusterGroup

~/homelab > vclusters

The two vclusters in the topology above are what happens to be running today, not a fixed part of the stack. vcluster lets me spin up a fresh control plane on the host cluster in a few minutes whenever I need one: a benchmark rig, an upgrade rehearsal, a sketchy experiment that deserves its own blast radius. Each one gets its own API server and CRDs without sharing state with the rest of the cluster. When the work is done I delete the vcluster and the host is left clean.
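The create-and-delete lifecycle described above can be sketched as a small Python context manager around the vcluster CLI. This is only an illustration, not how the lab is actually scripted: the names `bench-rig` and `vcluster-bench-rig` are hypothetical, and the injectable `run` parameter is there so the commands can be previewed without touching a real cluster.

```python
from contextlib import contextmanager
import subprocess
from typing import Callable, Iterator, Sequence

Runner = Callable[[Sequence[str]], None]


def run_cli(cmd: Sequence[str]) -> None:
    """Default runner: execute the command and raise if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


@contextmanager
def ephemeral_vcluster(name: str, namespace: str, run: Runner = run_cli) -> Iterator[str]:
    """Create a vcluster, hand control to the caller, then always delete it."""
    # --connect=false leaves the local kubeconfig context untouched.
    run(["vcluster", "create", name, "--namespace", namespace, "--connect=false"])
    try:
        yield name
    finally:
        # Runs even if the experiment raises, so the host is not left with a stray vcluster.
        run(["vcluster", "delete", name, "--namespace", namespace])


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    # "bench-rig" and "vcluster-bench-rig" are hypothetical names.
    def preview(cmd: Sequence[str]) -> None:
        print(" ".join(cmd))

    with ephemeral_vcluster("bench-rig", "vcluster-bench-rig", run=preview):
        print("# run the experiment here")
```

Putting the delete in a `finally` block is the design point: a failed experiment still tears down its own control plane.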

~/homelab > inference

I don’t have gobs of money to throw at frontier labs, and I also don’t trust any of them, so anything sensitive or novel that I don’t want to share with the world stays local. Inference runs on my MacBook Pro M1 with 32GB of unified memory, served by llama.cpp and MLX. Local sweet spots: gemma3-12b for general reasoning, qwen3-14b for code-shaped tasks, parakeet-mlx for ASR.

LiteLLM sits in front of every backend as a single OpenAI-compatible gateway. Agents in the cluster reach it over Tailscale and ask for a model by name; LiteLLM decides whether the request stays on the M1 or fans out to a cloud provider. That decoupling means I can swap a backend, retune routing, or pin a per-model budget without touching any agent code. When a request needs a frontier model the M1 cannot hold (large context, kimi-k2.5-class capability, heavy tool-use loops), LiteLLM falls back to NVIDIA NIM or OpenCode Go. Both are gated by a dollar budget at the router, so a runaway loop cannot drain them.
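To make the routing idea concrete, here is a toy Python model of "local first, budget-gated cloud fallback". It is not LiteLLM's API or its real algorithm, just a sketch of the behaviour described above. The context limits, dollar budgets, and cost estimates are made-up values, and the backend names are labels only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class NoBackendAvailable(RuntimeError):
    """No backend can take the request (too large, or budgets exhausted)."""


@dataclass
class Backend:
    name: str
    max_context_tokens: int
    budget_usd: Optional[float] = None  # None marks a local backend with no dollar cost
    spent_usd: float = 0.0

    def can_afford(self, cost_usd: float) -> bool:
        return self.budget_usd is None or self.spent_usd + cost_usd <= self.budget_usd

    def record(self, cost_usd: float) -> None:
        if self.budget_usd is not None:
            self.spent_usd += cost_usd


@dataclass
class Router:
    # Model name -> backends in preference order (local first, cloud fallbacks after).
    routes: Dict[str, List[Backend]] = field(default_factory=dict)

    def pick(self, model: str, prompt_tokens: int, est_cost_usd: float = 0.0) -> Backend:
        for backend in self.routes.get(model, []):
            if prompt_tokens > backend.max_context_tokens:
                continue  # request does not fit on this backend
            if not backend.can_afford(est_cost_usd):
                continue  # budget gate: a runaway loop stops here
            backend.record(est_cost_usd)
            return backend
        raise NoBackendAvailable(f"no backend available for {model!r}")


def example_router() -> Router:
    """Hypothetical limits and budgets, chosen only for illustration."""
    local = Backend("m1-local", max_context_tokens=8_192)
    nim = Backend("nvidia-nim", max_context_tokens=128_000, budget_usd=1.00)
    opencode = Backend("opencode-go", max_context_tokens=128_000, budget_usd=1.00)
    return Router(routes={"qwen3-14b": [local, nim, opencode]})
```

With this shape, agents only ever name a model. Which backend answers, and when a budget cuts a loop off, is decided in one place.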

flowchart LR
  Agents["Homelab Agents<br/>Talos + vclusters"]:::agent

  subgraph M1["MacBook Pro M1 / 32GB"]
    direction TB
    LiteLLM["LiteLLM Router"]:::edge
    Backends["llama.cpp · MLX"]:::platform
    Models["gemma3-12b · qwen3-14b · parakeet-mlx"]:::app
    LiteLLM --> Backends
    Backends --> Models
  end

  subgraph Cloud["Cloud Fallback"]
    direction TB
    NIM["NVIDIA NIM"]:::external
    OC["OpenCode Go"]:::external
  end

  Agents -.->|Tailscale| LiteLLM
  LiteLLM -->|frontier models| Cloud

  classDef external stroke:#64748b
  classDef edge stroke:#fbbf24
  classDef platform stroke:#4ade80
  classDef app stroke:#60a5fa
  classDef agent stroke:#a78bfa

~/homelab > status

Forgejo (Git server): 100.00%
Vikunja (Task tracker): 100.00%
Postiz (Social scheduler): 100.00%
Windmill (Workflow engine): 100.00%
MinIO (Object store): 100.00%
Grafana (Metrics frontend): 100.00%
Talos API (Cluster control plane): 100.00%
Debian Gateway (NAT + subnet router): 100.00%

Probed Jun 22 16:10 UTC by an in-cluster CronJob. This is a static build, so the grid is point-in-time: it reflects whatever snapshot was current the last time the site was rebuilt.

Uptime is averaged over a 30-day rolling window. Probes started Jun 22, 2026, so current figures reflect only the data collected since then.
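As a rough sketch of how such a figure can be derived, the Python function below computes uptime as the share of successful probes inside the rolling window. It assumes every probe carries equal weight, and the function and type names are hypothetical. The real CronJob may aggregate differently.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

Probe = Tuple[datetime, bool]  # (timestamp, probe succeeded)


def rolling_uptime(
    probes: Iterable[Probe],
    now: datetime,
    window: timedelta = timedelta(days=30),
) -> Optional[float]:
    """Percentage of successful probes in the window ending at `now`.

    Returns None when no probe falls inside the window. If probing began
    less than `window` ago, the figure covers only the data that exists.
    """
    cutoff = now - window
    in_window = [ok for ts, ok in probes if cutoff < ts <= now]
    if not in_window:
        return None
    return 100.0 * sum(in_window) / len(in_window)
```

A window that is only partly filled is why a freshly started probe history can read 100.00% across the board: a handful of successful probes is all the data there is.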

~/homelab > metrics

Cluster CPU (24h): 25.5% (min 25.1 · max 25.8)
Node memory used:
  talos-cp-01: 14%
  talos-worker-02: 46%
  talos-worker-01: 45%
Pods running: 89

Snapshot Jun 22 16:10 UTC from in-cluster Prometheus. Static build, so this is point-in-time: it reflects whatever snapshot was current the last time the site was rebuilt.
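A build-time snapshot like this could be taken with Prometheus's HTTP instant-query endpoint (`/api/v1/query`). The Python sketch below shows one way to do it. The base URL, the PromQL expression in the comment, and the `instance` label are assumptions about a typical node_exporter setup, not details from this site's build.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> dict:
    """Run one instant query against Prometheus and return the decoded JSON."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def vector_by_label(payload: dict, label: str) -> Dict[str, float]:
    """Flatten an instant-vector response into {label value: sample value}."""
    if payload.get("status") != "success" or payload["data"]["resultType"] != "vector":
        raise ValueError("expected a successful instant-vector response")
    # Each sample is {"metric": {...labels}, "value": [unix_ts, "string value"]}.
    return {
        sample["metric"].get(label, ""): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }


# Hypothetical usage at site build time (URL and query are assumptions):
#   payload = instant_query(
#       "http://prometheus.example.internal:9090",
#       "100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)",
#   )
#   memory_used_pct = vector_by_label(payload, "instance")
```

Writing the flattened result to a JSON file that the static site reads at build time would give the point-in-time behaviour described above.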

~/homelab > field-notes