RC: (upload) astro initial structure

2026-04-01 00:19:49 +01:00
commit 8c11192e7b
29 changed files with 8561 additions and 0 deletions
--- a/src/content/blog/day-001-why-i-built-a-homelab.md
+++ b/src/content/blog/day-001-why-i-built-a-homelab.md
@@ -0,0 +1,63 @@
+---
+title: "Day 1 — Why I Built a Homelab (and Why You Probably Should Too)"
+description: "Where it all starts. The hardware decisions, the philosophy, and the dangerous moment I decided a Proxmox cluster was a \"good idea\"."
+pubDate: 2024-11-01
+day: 1
+tags: ["homelab", "proxmox", "hardware", "why"]
+---
+
+## The Itch
+
+Every sysadmin eventually gets the itch. You spend your days wrangling production HPC clusters — ARCHER2 nodes, Cirrus compute, InfiniBand fabric — and you come home and think: *I want something like this, but mine*.
+
+Not a VM on a cloud provider. Not a Raspberry Pi running Pi-hole. A proper stack — compute, storage, networking — that I control end to end, where breaking things at 2am is a learning experience rather than a P1 incident.
+
+This is Day 1 of that journey.
+
+## The Philosophy First
+
+Before buying a single piece of hardware, I wrote down what I actually wanted from a homelab:
+
+- **Failure should be cheap, not catastrophic.** Real redundancy, snapshots, backups.
+- **Everything as code.** If I can't reproduce the setup from a Git repo, it doesn't exist.
+- **Dogfood the tools I use at work.** Ansible, Terraform, monitoring stacks — use them at home so they're muscle memory at EPCC.
+- **No cloud vendor lock-in.** Self-hosted or bust.
+
+## Hardware — What I Picked and Why
+
+### Compute
+
+I went with a used server chassis from eBay rather than consumer NUCs. The reasoning was simple: ECC RAM, IPMI, and PCIe slots worth having. The power draw penalty is real, but I'm not running this 24/7 at full load.
+
+> The most expensive homelab decision is the one you make twice. Buy the thing that gives you headroom.
+
+### Storage
+
+ZFS from day one. I'd seen enough production storage corruption at work to know that data integrity checksums aren't optional. I spec'd the pool conservatively (RAIDZ2 across four drives), knowing I could expand later once RAIDZ expansion, slated for OpenZFS 2.3, landed properly.
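
To put a rough number on that layout: RAIDZ2 spends two drives' worth of space on parity, so a four-drive vdev keeps about half of its raw capacity. The sketch below is back-of-the-envelope arithmetic only. The 8 TB drive size and the function name are assumptions for illustration; the post does not state the real drive size.

```typescript
// Rough usable-capacity estimate for a single RAIDZ vdev.
// Ignores ZFS metadata, padding, and free-space headroom,
// so real-world figures come out somewhat lower.
function raidzUsableTb(drives: number, parity: 1 | 2 | 3, driveTb: number): number {
  if (drives <= parity) {
    throw new Error("a RAIDZ vdev needs more drives than parity devices");
  }
  return (drives - parity) * driveTb;
}

// Hypothetical 8 TB drives (assumption, not from the post).
const fourWideTb = raidzUsableTb(4, 2, 8); // 16
const fiveWideTb = raidzUsableTb(5, 2, 8); // 24, i.e. after expanding by one drive
```

Under these assumptions, widening the vdev by one drive adds a full drive of usable space, which is why expansion matters for a pool that starts small.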
+
+Pool name: `zfs-oporto`. Obviously.
+
+### Networking
+
+OPNsense on a dedicated box. No consumer router running some vendor's locked-down firmware between me and the internet. VLANs from the start, even when it felt like overkill — because it always feels like overkill until it isn't.
+
+## The Software Stack (Day 1 Vision)
+
+```
+[ OPNsense ]  ← router/firewall/VLANs
+    |
+[ Proxmox VE ] ← hypervisor
+    ├── Kubernetes cluster (k3s initially, then full k8s)
+    ├── NAS VM → ZFS pool
+    └── Utility VMs (monitoring, DNS, etc.)
+```
+
+The monitoring story was left deliberately vague on Day 1. Spoiler: it took many iterations and a lot of wrong turns before landing on Netdata → VictoriaMetrics → Grafana.
+
+## What Actually Happened
+
+I had everything racked and powered by midnight. The first Proxmox install took 20 minutes. The first networking misconfiguration locked me out of the management interface for two hours.
+
+Standard.
+
+See you on [Day 2 →](/blog/day-002-proxmox-cluster-and-first-vms), where we get Proxmox clustered and the first VMs running.
--- a/src/content/blog/day-002-proxmox-cluster-and-first-vms.md
+++ b/src/content/blog/day-002-proxmox-cluster-and-first-vms.md
@@ -0,0 +1,39 @@
+---
+title: "Day 2 — Proxmox Cluster, VLANs, and the First Real Mistake"
+description: "Getting Proxmox VE into a proper cluster, carving VLANs in OPNsense, and the routing loop that ate an hour of my evening."
+pubDate: 2024-11-08
+day: 2
+tags: ["proxmox", "networking", "opnsense", "vlans"]
+---
+
+## Proxmox Cluster
+
+With two nodes up, it was time to cluster them. Proxmox's cluster setup is deceptively straightforward in the happy path. The catch is Corosync — it's very opinionated about network latency and quorum, and if you misconfigure which interface carries cluster traffic, you will have a bad time.
+
+```bash
+# On node 1:
+pvecm create homelab-cluster
+
+# On node 2:
+pvecm add <node1-ip>
+```
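
The quorum point is worth spelling out for a two-node cluster. The following is a minimal sketch of majority-vote arithmetic, assuming one vote per node and no external tie-breaker such as a QDevice. The function names are illustrative and not part of any Proxmox or Corosync API.

```typescript
// Votes needed for quorum with one vote per node: a strict majority.
function votesForQuorum(nodes: number): number {
  return Math.floor(nodes / 2) + 1;
}

// How many nodes can fail while the survivors still hold quorum.
function tolerableFailures(nodes: number): number {
  return nodes - votesForQuorum(nodes);
}

const twoNode = tolerableFailures(2);   // 0: losing either node loses quorum
const threeNode = tolerableFailures(3); // 1
```

Under these assumptions a two-node cluster tolerates no failures at all, which is one reason a third vote is usually recommended.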
+
+## VLAN Design
+
+I settled on a simple scheme early and stuck to it:
+
+| VLAN | Purpose                  | Subnet          |
+|------|--------------------------|-----------------|
+| 10   | Management (IPMI, etc.)  | 10.0.10.0/24    |
+| 20   | Proxmox hosts            | 10.0.20.0/24    |
+| 30   | Kubernetes nodes         | 10.0.30.0/24    |
+| 40   | Media / untrusted VMs    | 10.0.40.0/24    |
+| 50   | IoT                      | 10.0.50.0/24    |
+
+The management VLAN is firewalled hard — nothing from VLAN 40 or 50 touches it.
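
When reading firewall logs it helps to map an address back to its VLAN quickly. Here is a small sketch of that lookup, with the subnets copied from the table above. The helper names are hypothetical and this is not how OPNsense itself does the matching.

```typescript
// Subnets copied from the VLAN table; helper names are illustrative.
const vlanSubnets: Record<number, string> = {
  10: "10.0.10.0/24",
  20: "10.0.20.0/24",
  30: "10.0.30.0/24",
  40: "10.0.40.0/24",
  50: "10.0.50.0/24",
};

// Convert dotted-quad IPv4 text to an unsigned 32-bit number.
function ipToNumber(ip: string): number {
  const parts = ip.split(".");
  const valid =
    parts.length === 4 && parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) {
    throw new Error(`invalid IPv4 address: ${ip}`);
  }
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

// True when `ip` falls inside the network described by `cidr`.
function inSubnet(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  const base = cidr.slice(0, slash);
  const prefixBits = Number(cidr.slice(slash + 1));
  const blockSize = 2 ** (32 - prefixBits);
  return Math.floor(ipToNumber(ip) / blockSize) === Math.floor(ipToNumber(base) / blockSize);
}

// Which VLAN (if any) an address belongs to.
function vlanOf(ip: string): number | undefined {
  for (const id of Object.keys(vlanSubnets)) {
    if (inSubnet(ip, vlanSubnets[Number(id)])) return Number(id);
  }
  return undefined;
}
```

For example, `vlanOf("10.0.40.17")` resolves to VLAN 40, so a log line showing that source reaching `10.0.10.x` would be exactly the kind of traffic the management rule is meant to stop.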
+
+## The Mistake
+
+I fat-fingered a firewall rule in OPNsense that accidentally allowed VLAN 40 to reach the Proxmox management interface. I only noticed because I was testing my GeoIP block rules and a route showed up that shouldn't have existed.
+
+Lesson: always add an explicit deny-all rule at the bottom of every VLAN interface's rule list, even when it feels redundant next to OPNsense's built-in default deny. Explicit beats implicit.
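
The lesson can be expressed as a small lint. This is a sketch that assumes first-match evaluation (how OPNsense treats "quick" rules) and a heavily simplified rule shape. The types, helper names, and the VLAN 40 rule list are hypothetical, not the actual firewall config.

```typescript
type Action = "pass" | "block";

interface Rule {
  action: Action;
  // Destination as a dotted prefix such as "10.0.10.", or "*" for anything.
  // A deliberate simplification; real rules match networks, ports, and protocols.
  dstPrefix: string;
  note: string;
}

// First matching rule wins; undefined means nothing decided the packet explicitly.
function firstMatch(rules: Rule[], dstIp: string): Rule | undefined {
  return rules.find((r) => r.dstPrefix === "*" || dstIp.startsWith(r.dstPrefix));
}

// The lesson as a check: the last rule must be a catch-all block.
function endsWithDenyAll(rules: Rule[]): boolean {
  const last = rules[rules.length - 1];
  return last !== undefined && last.action === "block" && last.dstPrefix === "*";
}

// Hypothetical rule list for VLAN 40 (not the real OPNsense rules).
const vlan40Rules: Rule[] = [
  { action: "pass", dstPrefix: "10.0.40.", note: "VLAN 40's own subnet" },
  { action: "block", dstPrefix: "*", note: "explicit deny-all" },
];

const toManagement = firstMatch(vlan40Rules, "10.0.10.2"); // matches the deny-all rule
```

A deny-all at the bottom only catches what nothing above it matched. Under first-match evaluation an over-broad pass rule higher up still wins, so the lint complements a review of the pass rules rather than replacing it.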
--- a/src/content/blog/day-003-kubernetes-flux-gitops.md
+++ b/src/content/blog/day-003-kubernetes-flux-gitops.md
@@ -0,0 +1,63 @@
+---
+title: "Day 3 — Kubernetes on Bare Metal, Flux GitOps, and Why I Stopped Using k3s"
+description: "Graduating from k3s to full kubeadm, setting up Flux CD for GitOps, and the first taste of what Longhorn storage actually means."
+pubDate: 2024-11-15
+day: 3
+tags: ["kubernetes", "flux", "gitops", "longhorn", "k8s"]
+---
+
+## Why Not k3s Forever?
+
+k3s is excellent. I used it for two months and it worked fine. I moved to full Kubernetes (kubeadm) for one reason: I wanted the experience to transfer directly to production environments. At work we don't run k3s. The extra complexity of kubeadm is the point.
+
+## The Install
+
+```bash
+# kubeadm init on control plane
+sudo kubeadm init \
+  --pod-network-cidr=10.244.0.0/16 \
+  --control-plane-endpoint="k8s-control.int.h0melab.uk"
+
+# CNI — went with Cilium over Flannel for eBPF goodness
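+# Register the Cilium chart repo first so cilium/cilium resolves
+helm repo add cilium https://helm.cilium.io/
+helm repo update
+helm install cilium cilium/cilium --namespace kube-system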
+```
+
+## Flux GitOps
+
+This was the decision that changed everything. Instead of `kubectl apply`-ing manifests, every change goes through Git:
+
+```
+homelab-k8s/
+├── clusters/homelab/
+│   ├── flux-system/      ← Flux's own manifests
+│   ├── infrastructure/   ← Traefik, Longhorn, cert-manager
+│   └── apps/             ← Actual workloads
+```
+
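+The golden rule I established here: **Chart.yaml version bumps are required for Flux to pick up Helm chart changes** (at least with the default `reconcileStrategy: ChartVersion`; `Revision` reconciles on every new source revision instead). I forgot this approximately 15 times before it became instinct.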
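
One way to stop forgetting is to script the bump. Below is a minimal sketch of a patch-version bump over the text of a `Chart.yaml`, assuming a plain `MAJOR.MINOR.PATCH` version with no pre-release suffix. The function name is hypothetical, and wiring it into a pre-commit hook or CI step is left out.

```typescript
// Bump the patch component of the top-level `version:` field in Chart.yaml text.
// Handles only plain MAJOR.MINOR.PATCH; surrounding quotes, if any, are dropped.
// `appVersion:` and `apiVersion:` are untouched because the match is anchored
// to the start of a line.
function bumpChartPatch(chartYaml: string): string {
  const pattern = /^version:[ \t]*["']?(\d+)\.(\d+)\.(\d+)["']?[ \t]*$/m;
  if (!pattern.test(chartYaml)) {
    throw new Error("no plain MAJOR.MINOR.PATCH version field found");
  }
  return chartYaml.replace(
    pattern,
    (_line: string, major: string, minor: string, patch: string) =>
      `version: ${major}.${minor}.${Number(patch) + 1}`,
  );
}

// Hypothetical chart metadata for illustration.
const exampleChart = 'apiVersion: v2\nname: demo\nversion: 0.1.9\nappVersion: "1.0.0"\n';
const bumpedChart = bumpChartPatch(exampleChart); // the version line becomes "version: 0.1.10"
```

The patch component is parsed as a number rather than incremented as text, so `0.1.9` becomes `0.1.10` and not `0.1.91`.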
+
+## Longhorn
+
+Distributed block storage across three worker nodes. The UI is surprisingly good. The first time I watched a volume replica heal itself after a node reboot, I understood why people write blog posts about storage.
+
+```yaml
+# The PVC pattern I use for everything
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: app-data
+  annotations:
+    helm.sh/resource-policy: keep   # ← never delete this on helm uninstall
+spec:
+  storageClassName: longhorn
+  accessModes: [ReadWriteOnce]
+  resources:
+    requests:
+      storage: 10Gi
+```
+
+The `helm.sh/resource-policy: keep` annotation saved my data at least twice when I was iterating on Helm releases.
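
Because the annotation is easy to leave off a new chart, a check over rendered manifests can catch the omission. This sketch assumes the manifests have already been parsed into plain objects (for instance from `helm template` output via a YAML parser, which is not shown). The `Manifest` shape and function name are illustrative.

```typescript
// Minimal slice of a Kubernetes manifest, enough for this check.
interface Manifest {
  kind: string;
  metadata: { name: string; annotations?: Record<string, string> };
}

// Names of PersistentVolumeClaims that lack `helm.sh/resource-policy: keep`.
function pvcsMissingKeep(manifests: Manifest[]): string[] {
  return manifests
    .filter((m) => m.kind === "PersistentVolumeClaim")
    .filter((m) => m.metadata.annotations?.["helm.sh/resource-policy"] !== "keep")
    .map((m) => m.metadata.name);
}

// `app-data` mirrors the PVC above; `scratch` is a hypothetical second claim.
const rendered: Manifest[] = [
  {
    kind: "PersistentVolumeClaim",
    metadata: { name: "app-data", annotations: { "helm.sh/resource-policy": "keep" } },
  },
  { kind: "PersistentVolumeClaim", metadata: { name: "scratch" } },
  { kind: "Service", metadata: { name: "app" } },
];

const unprotected = pvcsMissingKeep(rendered); // ["scratch"]
```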
+
+## What's Next
+
+Day 4 covers the monitoring stack — Netdata agents, VictoriaMetrics as the TSDB, and getting Grafana to look like something I'd actually want to stare at during an incident.
--- a/src/content/config.ts
+++ b/src/content/config.ts
@@ -0,0 +1,15 @@
+import { defineCollection, z } from 'astro:content';
+
+const blog = defineCollection({
+  type: 'content',
+  schema: z.object({
+    title:       z.string(),
+    description: z.string(),
+    pubDate:     z.coerce.date(),
+    tags:        z.array(z.string()).default([]),
+    day:         z.number().optional(),   // Day 1, Day 2, etc.
+    draft:       z.boolean().default(false),
+  }),
+});
+
+export const collections = { blog };
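
For illustration, here is one way the parsed front matter could drive a post listing: hide drafts and order by `day`. It is a self-contained sketch over plain objects. The `BlogData` interface mirrors the schema above by hand, and the function name is hypothetical. In an Astro page the input would presumably come from `getCollection('blog')`, mapping each entry to its `data`.

```typescript
// Shape of a post's front matter after the schema has parsed it.
interface BlogData {
  title: string;
  description: string;
  pubDate: Date;
  tags: string[];
  day?: number;
  draft: boolean;
}

// Hide drafts, then order by `day`; posts without a `day` go last.
// Ties (including two posts without a `day`) fall back to publication date.
function publishedInOrder(posts: BlogData[]): BlogData[] {
  const dayKey = (p: BlogData): number => p.day ?? Number.MAX_SAFE_INTEGER;
  return posts
    .filter((p) => !p.draft) // filter returns a new array, so the input is not mutated
    .sort((a, b) => {
      const byDay = dayKey(a) - dayKey(b);
      return byDay !== 0 ? byDay : a.pubDate.getTime() - b.pubDate.getTime();
    });
}
```

Sorting on `day` rather than `pubDate` keeps the series in reading order even if a post is published out of sequence.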