RC: (upload) astro initial structure
src/content/blog/day-001-why-i-built-a-homelab.md
@@ -0,0 +1,63 @@
---
title: "Day 1 — Why I Built a Homelab (and Why You Probably Should Too)"
description: "Where it all starts. The hardware decisions, the philosophy, and the dangerous moment I decided a Proxmox cluster was a \"good idea\"."
pubDate: 2024-11-01
day: 1
tags: ["homelab", "proxmox", "hardware", "why"]
---

## The Itch

Every sysadmin eventually gets the itch. You spend your days wrangling production HPC clusters (ARCHER2 nodes, Cirrus compute, InfiniBand fabric) and you come home and think: *I want something like this, but mine*.

Not a VM on a cloud provider. Not a Raspberry Pi running Pi-hole. A proper stack (compute, storage, networking) that I control end to end, where breaking things at 2am is a learning experience rather than a P1 incident.

This is Day 1 of that journey.

## The Philosophy First

Before buying a single piece of hardware, I wrote down what I actually wanted from a homelab:

- **Failure should be cheap, not catastrophic.** Real redundancy, snapshots, backups.
- **Everything as code.** If I can't reproduce the setup from a Git repo, it doesn't exist.
- **Dogfood the tools I use at work.** Ansible, Terraform, monitoring stacks: use them at home so they're muscle memory at EPCC.
- **No cloud vendor lock-in.** Self-hosted or bust.

## Hardware — What I Picked and Why

### Compute

I went with a used server chassis from eBay rather than consumer NUCs. The reasoning was simple: ECC RAM, IPMI, and PCIe slots worth having. The power draw penalty is real, but I'm not running this 24/7 at full load.

> The most expensive homelab decision is the one you make twice. Buy the thing that gives you headroom.

### Storage

ZFS from day one. I'd seen enough production storage corruption at work to know that data integrity checksums aren't optional. I spec'd the pool conservatively (RAIDZ2 across four drives), knowing I could expand later when OpenZFS 2.3's RAIDZ expansion landed properly.

Pool name: `zfs-oporto`. Obviously.
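
To put a number on what RAIDZ2 across four drives costs in capacity, here is a back-of-the-envelope sketch. The 8 TB drive size is a made-up placeholder, not the drives in this pool, and the formula ignores ZFS metadata, padding, and reserved slop space, so treat the result as an upper bound rather than what `zfs list` will report.

```typescript
// Rough usable-capacity estimate for a single RAIDZ vdev.
// Real pools report less: metadata, padding, and slop space are ignored here.
function raidzUsableTb(drives: number, parity: number, driveTb: number): number {
  if (drives <= parity) {
    throw new Error("need more drives than parity devices");
  }
  return (drives - parity) * driveTb;
}

// Hypothetical 8 TB drives, chosen only for illustration.
const drives = 4;
const parity = 2; // RAIDZ2 survives any two drive failures
const driveTb = 8;

const usableTb = raidzUsableTb(drives, parity, driveTb);
const efficiency = usableTb / (drives * driveTb);
```

With four drives, RAIDZ2 gives up half the raw capacity to parity. That ratio improves as drives are added, which is why being able to expand the vdev later matters.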

### Networking

OPNsense on a dedicated box. No consumer router running some vendor's locked-down firmware between me and the internet. VLANs from the start, even when it felt like overkill, because it always feels like overkill until it isn't.

## The Software Stack (Day 1 Vision)

```
[ OPNsense ]  ← router/firewall/VLANs
      |
[ Proxmox VE ]  ← hypervisor
  ├── Kubernetes cluster (k3s initially, then full k8s)
  ├── NAS VM → ZFS pool
  └── Utility VMs (monitoring, DNS, etc.)
```

The monitoring story was left deliberately vague on Day 1. Spoiler: it took many iterations and a lot of wrong turns before landing on Netdata → VictoriaMetrics → Grafana.

## What Actually Happened

I had everything racked and powered by midnight. The first Proxmox install took 20 minutes. The first networking misconfiguration locked me out of the management interface for two hours.

Standard.

See you on [Day 2 →](/blog/day-002-proxmox-cluster-and-first-vms), where we get Proxmox clustered and the first VMs running.
src/content/blog/day-002-proxmox-cluster-and-first-vms.md
@@ -0,0 +1,39 @@
---
title: "Day 2 — Proxmox Cluster, VLANs, and the First Real Mistake"
description: "Getting Proxmox VE into a proper cluster, carving VLANs in OPNsense, and the routing loop that ate an hour of my evening."
pubDate: 2024-11-08
day: 2
tags: ["proxmox", "networking", "opnsense", "vlans"]
---

## Proxmox Cluster

With two nodes up, it was time to cluster them. Proxmox's cluster setup is deceptively straightforward in the happy path. The catch is Corosync: it's very opinionated about network latency and quorum, and if you misconfigure which interface carries cluster traffic, you will have a bad time.

```bash
# On node 1:
pvecm create homelab-cluster

# On node 2:
pvecm add <node1-ip>

# On either node, confirm membership and quorum:
pvecm status
```
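
The quorum side of Corosync is easy to underestimate with only two nodes, so here is the arithmetic as a small sketch. It assumes one vote per node and no QDevice or other tie-breaker, which is my simplification rather than a description of how this cluster ended up configured.

```typescript
// Majority quorum: more than half of all votes must be present.
function quorumVotes(nodes: number): number {
  return Math.floor(nodes / 2) + 1;
}

// How many nodes can be lost while the rest of the cluster stays quorate.
function tolerableFailures(nodes: number): number {
  return nodes - quorumVotes(nodes);
}

const twoNode = { quorum: quorumVotes(2), canLose: tolerableFailures(2) };
const threeNode = { quorum: quorumVotes(3), canLose: tolerableFailures(3) };
```

Under those assumptions a two-node cluster needs both nodes for quorum, so rebooting one leaves the other non-quorate. A third vote is what buys the first tolerated failure.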

## VLAN Design

I settled on a simple scheme early and stuck to it:

| VLAN | Purpose                  | Subnet          |
|------|--------------------------|-----------------|
| 10   | Management (IPMI, etc.)  | 10.0.10.0/24    |
| 20   | Proxmox hosts            | 10.0.20.0/24    |
| 30   | Kubernetes pods          | 10.0.30.0/24    |
| 40   | Media / untrusted VMs    | 10.0.40.0/24    |
| 50   | IoT                      | 10.0.50.0/24    |

The management VLAN is firewalled hard: nothing from VLAN 40 or 50 touches it.
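
Keeping the table above as data makes it easy to answer "which VLAN is this address in?" when reading firewall logs. The sketch below is illustrative only: the `vlanFor` and `inCidr` helpers are names I made up, and it handles plain IPv4 dotted-quad addresses with no validation.

```typescript
type Vlan = { id: number; purpose: string; cidr: string };

// The VLAN scheme from the table above.
const vlans: Vlan[] = [
  { id: 10, purpose: "Management (IPMI, etc.)", cidr: "10.0.10.0/24" },
  { id: 20, purpose: "Proxmox hosts", cidr: "10.0.20.0/24" },
  { id: 30, purpose: "Kubernetes pods", cidr: "10.0.30.0/24" },
  { id: 40, purpose: "Media / untrusted VMs", cidr: "10.0.40.0/24" },
  { id: 50, purpose: "IoT", cidr: "10.0.50.0/24" },
];

// Convert a dotted-quad IPv4 address to a plain number (no bitwise ops,
// so there are no signed 32-bit surprises).
function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// True if the address falls inside the CIDR block.
function inCidr(ip: string, cidr: string): boolean {
  const [base, bits] = cidr.split("/");
  const size = 2 ** (32 - Number(bits));
  const start = Math.floor(ipToInt(base) / size) * size;
  const addr = ipToInt(ip);
  return addr >= start && addr < start + size;
}

// Look up the VLAN an address belongs to, if any.
function vlanFor(ip: string): Vlan | undefined {
  return vlans.find((v) => inCidr(ip, v.cidr));
}
```

For example, `vlanFor("10.0.40.17")` resolves to the VLAN 40 entry, while an address outside all five subnets returns `undefined`.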

## The Mistake

I fat-fingered a firewall rule in OPNsense and ended up allowing VLAN 40 to reach the Proxmox management interface. I only noticed because I was testing my GeoIP block rules and a route showed up that shouldn't have existed.

Lesson: always add an explicit deny-all rule at the bottom of every VLAN's outbound chain, even when it feels redundant. Explicit beats implicit.
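
Why the trailing deny-all helps is easier to see in a toy model. This is a simplified first-match evaluator, not how OPNsense actually processes rules, and the rule set is hypothetical: it reuses the VLAN IDs from the scheme above but is not my real config.

```typescript
type Action = "allow" | "deny";
type Rule = { src: number | "any"; dst: number | "any"; action: Action };

// First matching rule wins; "no-match" means the decision falls through
// to whatever implicit default the firewall has.
function evaluate(rules: Rule[], src: number, dst: number): Action | "no-match" {
  for (const rule of rules) {
    const srcOk = rule.src === "any" || rule.src === src;
    const dstOk = rule.dst === "any" || rule.dst === dst;
    if (srcOk && dstOk) return rule.action;
  }
  return "no-match";
}

// Hypothetical rules for the untrusted VLAN (40).
const withoutDenyAll: Rule[] = [
  { src: 40, dst: 10, action: "deny" },  // never reach management
  { src: 40, dst: 40, action: "allow" }, // intra-VLAN traffic
];

// Same rules, plus the explicit catch-all at the bottom.
const withDenyAll: Rule[] = [
  ...withoutDenyAll,
  { src: "any", dst: "any", action: "deny" },
];

const implicit = evaluate(withoutDenyAll, 40, 20); // "no-match"
const explicit = evaluate(withDenyAll, 40, 20);    // "deny"
```

With the catch-all in place, every path that is not deliberately allowed has a visible rule to point at. The catch-all does not save you from a bad allow rule above it, though, since the first match still wins.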
src/content/blog/day-003-kubernetes-flux-gitops.md
@@ -0,0 +1,63 @@
---
title: "Day 3 — Kubernetes on Bare Metal, Flux GitOps, and Why I Stopped Using k3s"
description: "Graduating from k3s to full kubeadm, setting up Flux CD for GitOps, and the first taste of what Longhorn storage actually means."
pubDate: 2024-11-15
day: 3
tags: ["kubernetes", "flux", "gitops", "longhorn", "k8s"]
---

## Why Not k3s Forever?

k3s is excellent. I used it for two months and it worked fine. I moved to full Kubernetes (kubeadm) for one reason: I wanted the experience to transfer directly to production environments. At work we don't run k3s. The extra complexity of kubeadm is the point.

## The Install

```bash
# kubeadm init on control plane
sudo kubeadm init \
  --pod-network-cidr=10.244.0.0/16 \
  --control-plane-endpoint="k8s-control.int.h0melab.uk"

# CNI: went with Cilium over Flannel for eBPF goodness
# (the cilium/cilium chart reference needs the Cilium Helm repo added first)
helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium --namespace kube-system
```

## Flux GitOps

This was the decision that changed everything. Instead of `kubectl apply`-ing manifests, every change goes through Git:

```
homelab-k8s/
├── clusters/homelab/
│   ├── flux-system/      ← Flux's own manifests
│   ├── infrastructure/   ← Traefik, Longhorn, cert-manager
│   └── apps/             ← Actual workloads
```

The golden rule I established here: **Chart.yaml version bumps are required for Flux to pick up Helm chart changes.** That applies when the chart lives in Git and uses the default `reconcileStrategy: ChartVersion`; setting it to `Revision` makes Flux rebuild the chart on every new commit instead. Forgot this approximately 15 times before it became instinct.
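
A guard for that rule can be small. The sketch below is the kind of check I could run before pushing. The function names are hypothetical, and it reads the `version:` line with a regular expression rather than a real YAML parser, so it is a rough illustration and not a robust tool.

```typescript
// Pull the top-level `version:` value out of a Chart.yaml string.
// Anchored to line start, so `apiVersion:` and `appVersion:` are not matched.
function chartVersion(chartYaml: string): string | undefined {
  const match = chartYaml.match(/^version:\s*["']?([^\s"']+)["']?\s*$/m);
  return match ? match[1] : undefined;
}

// True when chart files changed but the chart version did not,
// which is the case a version-based reconcile would ignore.
function missingVersionBump(
  before: string,
  after: string,
  chartFilesChanged: boolean,
): boolean {
  return chartFilesChanged && chartVersion(before) === chartVersion(after);
}

// Hypothetical Chart.yaml contents before and after an edit.
const oldChart = "apiVersion: v2\nname: app\nversion: 0.1.0\nappVersion: \"1.0\"\n";
const newChart = "apiVersion: v2\nname: app\nversion: 0.1.1\nappVersion: \"1.0\"\n";

const forgotBump = missingVersionBump(oldChart, oldChart, true);
const bumped = missingVersionBump(oldChart, newChart, true);
```

Here `forgotBump` is `true` and `bumped` is `false`. In practice the two strings would come from something like the previous and current commit of the chart directory.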

## Longhorn

Distributed block storage across three worker nodes. The UI is surprisingly good. The first time I watched a volume replica heal itself after a node reboot, I understood why people write blog posts about storage.

```yaml
# The PVC pattern I use for everything
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
  annotations:
    helm.sh/resource-policy: keep  # ← never delete this on helm uninstall
spec:
  storageClassName: longhorn
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
```

The `helm.sh/resource-policy: keep` annotation saved my data at least twice when I was iterating on Helm releases.

## What's Next

Day 4 covers the monitoring stack: Netdata agents, VictoriaMetrics as the TSDB, and getting Grafana to look like something I'd actually want to stare at during an incident.
src/content/config.ts
@@ -0,0 +1,15 @@
import { defineCollection, z } from 'astro:content';

const blog = defineCollection({
  type: 'content',
  schema: z.object({
    title: z.string(),
    description: z.string(),
    pubDate: z.coerce.date(),
    tags: z.array(z.string()).optional().default([]),
    day: z.number().optional(), // Day 1, Day 2, etc.
    draft: z.boolean().optional().default(false),
  }),
});

export const collections = { blog };