RC: (upload) astro initial structure

Raul Costa
2026-04-01 00:19:49 +01:00
commit 8c11192e7b
29 changed files with 8561 additions and 0 deletions


@@ -0,0 +1,63 @@
---
title: "Day 1 — Why I Built a Homelab (and Why You Probably Should Too)"
description: "Where it all starts. The hardware decisions, the philosophy, and the dangerous moment I decided a Proxmox cluster was a \"good idea\"."
pubDate: 2024-11-01
day: 1
tags: ["homelab", "proxmox", "hardware", "why"]
---
## The Itch
Every sysadmin eventually gets the itch. You spend your days wrangling production HPC clusters — ARCHER2 nodes, Cirrus compute, InfiniBand fabric — and you come home and think: *I want something like this, but mine*.
Not a VM on a cloud provider. Not a Raspberry Pi running Pi-hole. A proper stack — compute, storage, networking — that I control end to end, where breaking things at 2am is a learning experience rather than a P1 incident.
This is Day 1 of that journey.
## The Philosophy First
Before buying a single piece of hardware, I wrote down what I actually wanted from a homelab:
- **Failure should be cheap, not catastrophic.** Real redundancy, snapshots, backups.
- **Everything as code.** If I can't reproduce the setup from a Git repo, it doesn't exist.
- **Dogfood the tools I use at work.** Ansible, Terraform, monitoring stacks — use them at home so they're muscle memory at EPCC.
- **No cloud vendor lock-in.** Self-hosted or bust.
## Hardware — What I Picked and Why
### Compute
I went with a used server chassis from eBay rather than consumer NUCs. The reasoning was simple: ECC RAM, IPMI, and PCIe slots worth having. The power draw penalty is real, but I'm not running this 24/7 at full load.
> The most expensive homelab decision is the one you make twice. Buy the thing that gives you headroom.
### Storage
ZFS from day one. I'd seen enough production storage corruption at work to know that data integrity checksums aren't optional. I spec'd the pool conservatively, RAIDZ2 across four drives, knowing I could expand later once RAIDZ expansion landed properly in OpenZFS 2.3.
Pool name: `zfs-oporto`. Obviously.
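
For a sense of what that layout buys, here is a back-of-the-envelope sketch. A RAIDZ vdev yields roughly (drives minus parity) times drive size, before ZFS metadata, padding, and free-space headroom. The 4 TB drive size is an assumption for illustration only, since drive sizes aren't stated here, and `raidzUsableTB` is a hypothetical helper rather than any ZFS tool.

```typescript
// Rough usable capacity of a single RAIDZ vdev, in the same unit as driveSizeTB.
// Ignores metadata, padding, and the free space ZFS likes to keep in reserve.
function raidzUsableTB(driveCount: number, parity: 1 | 2 | 3, driveSizeTB: number): number {
  if (driveCount <= parity) {
    throw new Error("a RAIDZ vdev needs more drives than parity devices");
  }
  return (driveCount - parity) * driveSizeTB;
}

// ASSUMPTION: 4 TB drives, chosen only to make the arithmetic concrete.
const usableTB = raidzUsableTB(4, 2, 4);
console.log(`RAIDZ2 across 4 x 4 TB: about ${usableTB} TB usable, tolerates 2 failed drives`);
```

With four drives, RAIDZ2 has the same 50% space efficiency as two mirrored pairs, but it survives any two drive failures rather than only one per pair.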
### Networking
OPNsense on a dedicated box. No consumer router running some vendor's locked-down firmware between me and the internet. VLANs from the start, even when it felt like overkill — because it always feels like overkill until it isn't.
## The Software Stack (Day 1 Vision)
```
[ OPNsense ]        ← router/firewall/VLANs
      |
[ Proxmox VE ]      ← hypervisor
  ├── Kubernetes cluster (k3s initially, then full k8s)
  ├── NAS VM → ZFS pool
  └── Utility VMs (monitoring, DNS, etc.)
```
The monitoring story was left deliberately vague on Day 1. Spoiler: it took many iterations and a lot of wrong turns before landing on Netdata → VictoriaMetrics → Grafana.
## What Actually Happened
I had everything racked and powered by midnight. The first Proxmox install took 20 minutes. The first networking misconfiguration locked me out of the management interface for two hours.
Standard.
See you on [Day 2 →](/blog/day-002-proxmox-cluster-and-first-vms) where we get Proxmox clustered and the first VMs running.


@@ -0,0 +1,39 @@
---
title: "Day 2 — Proxmox Cluster, VLANs, and the First Real Mistake"
description: "Getting Proxmox VE into a proper cluster, carving VLANs in OPNsense, and the routing loop that ate an hour of my evening."
pubDate: 2024-11-08
day: 2
tags: ["proxmox", "networking", "opnsense", "vlans"]
---
## Proxmox Cluster
With two nodes up, it was time to cluster them. Proxmox's cluster setup is deceptively straightforward in the happy path. The catch is Corosync — it's very opinionated about network latency and quorum, and if you misconfigure which interface carries cluster traffic, you will have a bad time.
```bash
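# On node 1: create the cluster
pvecm create homelab-cluster
# On node 2: join it, pointing at node 1
pvecm add <node1-ip>
# On either node: confirm both members are listed and the cluster is quorate
pvecm status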
```
## VLAN Design
I settled on a simple scheme early and stuck to it:
| VLAN | Purpose | Subnet |
|------|--------------------------|-----------------|
| 10 | Management (IPMI, etc.) | 10.0.10.0/24 |
| 20 | Proxmox hosts | 10.0.20.0/24 |
| 30 | Kubernetes pods | 10.0.30.0/24 |
| 40 | Media / untrusted VMs | 10.0.40.0/24 |
| 50 | IoT | 10.0.50.0/24 |
The management VLAN is firewalled hard — nothing from VLAN 40 or 50 touches it.
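
The table above can also be treated as data, which makes it easy to answer "which VLAN does this address belong to?" when reading firewall logs. This is a minimal sketch with plain IPv4 arithmetic. `vlanFor`, `inCidr`, and `ipToInt` are hypothetical helper names, and nothing here talks to OPNsense.

```typescript
interface Vlan {
  id: number;
  purpose: string;
  cidr: string;
}

// Same rows as the VLAN table.
const vlans: Vlan[] = [
  { id: 10, purpose: "Management (IPMI, etc.)", cidr: "10.0.10.0/24" },
  { id: 20, purpose: "Proxmox hosts", cidr: "10.0.20.0/24" },
  { id: 30, purpose: "Kubernetes pods", cidr: "10.0.30.0/24" },
  { id: 40, purpose: "Media / untrusted VMs", cidr: "10.0.40.0/24" },
  { id: 50, purpose: "IoT", cidr: "10.0.50.0/24" },
];

// Converts dotted-quad IPv4 to an unsigned integer.
function ipToInt(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4 || parts.some((p) => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error(`invalid IPv4 address: ${ip}`);
  }
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

// True when ip falls inside the network described by cidr.
function inCidr(ip: string, cidr: string): boolean {
  const [base = "", bitsStr = ""] = cidr.split("/");
  const bits = Number(bitsStr);
  if (bitsStr === "" || !Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`invalid CIDR: ${cidr}`);
  }
  const size = 2 ** (32 - bits);
  const start = Math.floor(ipToInt(base) / size) * size;
  const addr = ipToInt(ip);
  return addr >= start && addr < start + size;
}

// Returns the VLAN whose subnet contains ip, or undefined if none does.
function vlanFor(ip: string): Vlan | undefined {
  return vlans.find((v) => inCidr(ip, v.cidr));
}

console.log(vlanFor("10.0.40.17")?.purpose); // prints "Media / untrusted VMs"
```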
## The Mistake
I fat-fingered a firewall rule in OPNsense that accidentally allowed VLAN 40 to reach the Proxmox management interface. I only noticed because I was testing my GeoIP block rules and a route showed up that shouldn't have existed.
Lesson: always add a deny-all rule at the bottom of every VLAN's outbound chain, even when it feels redundant. Explicit beats implicit.
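
To make the lesson concrete, here is a toy model of first-match rule evaluation with an explicit deny-all as the last rule, plus a small audit that lists which VLANs a source can reach. It is a sketch of the idea only. The `Rule` shape, the rule list, and the helper names are all hypothetical, and this is not how OPNsense stores or evaluates its rules.

```typescript
type Action = "pass" | "block";

interface Rule {
  action: Action;
  srcVlan: number | "any";
  dstVlan: number | "any";
  note: string;
}

// First matching rule wins. Returns undefined when nothing matches, which is
// exactly the implicit case an explicit deny-all removes.
function evaluate(rules: Rule[], srcVlan: number, dstVlan: number): Rule | undefined {
  return rules.find(
    (r) =>
      (r.srcVlan === "any" || r.srcVlan === srcVlan) &&
      (r.dstVlan === "any" || r.dstVlan === dstVlan),
  );
}

const denyAll: Rule = { action: "block", srcVlan: "any", dstVlan: "any", note: "explicit deny-all" };

// HYPOTHETICAL rule list for VLAN 40 (media / untrusted VMs).
const vlan40Rules: Rule[] = [
  { action: "block", srcVlan: 40, dstVlan: 10, note: "no access to management" },
  { action: "block", srcVlan: 40, dstVlan: 20, note: "no access to Proxmox hosts" },
  { action: "pass", srcVlan: 40, dstVlan: 40, note: "intra-VLAN traffic" },
  denyAll,
];

// Audit helper: destination VLANs that the rule list lets srcVlan reach.
function reachable(rules: Rule[], srcVlan: number, allVlans: number[]): number[] {
  return allVlans.filter((dst) => evaluate(rules, srcVlan, dst)?.action === "pass");
}

console.log(reachable(vlan40Rules, 40, [10, 20, 30, 40, 50])); // prints [ 40 ]
```

Running an audit like this after every rule change would flag a stray pass rule immediately, because a management VLAN would appear in the output.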


@@ -0,0 +1,63 @@
---
title: "Day 3 — Kubernetes on Bare Metal, Flux GitOps, and Why I Stopped Using k3s"
description: "Graduating from k3s to full kubeadm, setting up Flux CD for GitOps, and the first taste of what Longhorn storage actually means."
pubDate: 2024-11-15
day: 3
tags: ["kubernetes", "flux", "gitops", "longhorn", "k8s"]
---
## Why Not k3s Forever?
k3s is excellent. I used it for two months and it worked fine. I moved to full Kubernetes (kubeadm) for one reason: I wanted the experience to transfer directly to production environments. At work we don't run k3s. The extra complexity of kubeadm is the point.
## The Install
```bash
# kubeadm init on the control plane
sudo kubeadm init \
  --pod-network-cidr=10.244.0.0/16 \
  --control-plane-endpoint="k8s-control.int.h0melab.uk"
# CNI: went with Cilium over Flannel for eBPF goodness
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --namespace kube-system
```
## Flux GitOps
This was the decision that changed everything. Instead of `kubectl apply`-ing manifests, every change goes through Git:
```
homelab-k8s/
├── clusters/homelab/
│   ├── flux-system/      ← Flux's own manifests
│   ├── infrastructure/   ← Traefik, Longhorn, cert-manager
│   └── apps/             ← Actual workloads
```
The golden rule I established here: **Chart.yaml version bumps are required for Flux to pick up Helm chart changes.** Forgot this approximately 15 times before it became instinct.
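
Since the bump is so easy to forget, it is a natural candidate for a pre-commit script. A minimal sketch of the bump itself, assuming plain `MAJOR.MINOR.PATCH` versions and a top-level `version:` line in `Chart.yaml`. `bumpChartVersion` is a hypothetical helper, not part of Flux or Helm, and the sample chart contents are made up.

```typescript
// Increments the patch component of the top-level `version:` field in a
// Chart.yaml document. Indented `version:` keys (e.g. under dependencies)
// and `appVersion:` are left alone. Quotes around the version are dropped.
function bumpChartVersion(chartYaml: string): string {
  const pattern = /^version:[ \t]*["']?(\d+)\.(\d+)\.(\d+)["']?[ \t]*$/m;
  const match = pattern.exec(chartYaml);
  if (!match) {
    throw new Error("no top-level MAJOR.MINOR.PATCH version found");
  }
  const [, major, minor, patch] = match;
  return chartYaml.replace(pattern, `version: ${major}.${minor}.${Number(patch) + 1}`);
}

// HYPOTHETICAL chart contents, for illustration only.
const exampleChart = ["apiVersion: v2", "name: my-app", "version: 0.3.7", 'appVersion: "1.2.0"'].join("\n");
console.log(bumpChartVersion(exampleChart));
```

Pre-release suffixes such as `1.0.0-rc.1` are deliberately out of scope. The function throws on them rather than guessing.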
## Longhorn
Distributed block storage across three worker nodes. The UI is surprisingly good. The first time I watched a volume replica heal itself after a node reboot, I understood why people write blog posts about storage.
```yaml
# The PVC pattern I use for everything
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
  annotations:
    helm.sh/resource-policy: keep  # ← never delete this on helm uninstall
spec:
  storageClassName: longhorn
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
```
The `helm.sh/resource-policy: keep` annotation saved my data at least twice when I was iterating on Helm releases.
## What's Next
Day 4 covers the monitoring stack — Netdata agents, VictoriaMetrics as the TSDB, and getting Grafana to look like something I'd actually want to stare at during an incident.

src/content/config.ts

@@ -0,0 +1,15 @@
import { defineCollection, z } from 'astro:content';

const blog = defineCollection({
  type: 'content',
  schema: z.object({
    title: z.string(),
    description: z.string(),
    pubDate: z.coerce.date(),
    tags: z.array(z.string()).optional().default([]),
    day: z.number().optional(), // Day 1, Day 2, etc.
    draft: z.boolean().optional().default(false),
  }),
});

export const collections = { blog };
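
A sketch of how a listing page might consume entries validated by this schema: drop drafts, then order by `day`. To keep it runnable outside Astro it works on a minimal stand-in type. In a real page the array would presumably come from `getCollection('blog')`. `BlogEntry`, `publishedInOrder`, and the sample slugs are assumptions for illustration.

```typescript
// Minimal stand-in for a collection entry; only the fields used below.
interface BlogEntry {
  slug: string;
  data: { title: string; pubDate: Date; day?: number; draft: boolean };
}

// Drops drafts, then sorts by `day`. Entries without a `day` go last, and
// ties (or two entries without a `day`) fall back to pubDate, oldest first.
function publishedInOrder(entries: BlogEntry[]): BlogEntry[] {
  return entries
    .filter((e) => !e.data.draft)
    .sort((a, b) => {
      const dayA = a.data.day ?? Number.MAX_SAFE_INTEGER;
      const dayB = b.data.day ?? Number.MAX_SAFE_INTEGER;
      return dayA !== dayB
        ? dayA - dayB
        : a.data.pubDate.getTime() - b.data.pubDate.getTime();
    });
}

// HYPOTHETICAL sample entries; slugs are placeholders.
const sample: BlogEntry[] = [
  { slug: "day-003", data: { title: "Day 3", pubDate: new Date("2024-11-15"), day: 3, draft: false } },
  { slug: "day-001", data: { title: "Day 1", pubDate: new Date("2024-11-01"), day: 1, draft: false } },
  { slug: "day-002", data: { title: "Day 2", pubDate: new Date("2024-11-08"), day: 2, draft: true } },
];

console.log(publishedInOrder(sample).map((e) => e.slug)); // prints [ 'day-001', 'day-003' ]
```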