how the dlibre cluster is put together

part 1 of a draft series on the single-node k3s cluster behind dlibre.com, from bootstrap to ingress to the blog workload itself.

the goal of this series is not to write a generic kubernetes explainer. it is to document the actual shape of this cluster, why it is set up this way, and what happens when someone opens this blog in a browser.

the current setup is intentionally simple:

  • one oci vm
  • one k3s server
  • argocd as the gitops controller
  • traefik as ingress
  • metallb for the service ip
  • external-dns for dns records
  • cert-manager for certificates
  • longhorn for persistent storage
  • dex for oidc on the admin side

this setup does not pretend to be a highly available multi-node platform. it is a small, reproducible cluster that can be rebuilt from git, and that tradeoff matters for understanding the rest of the design.

the base machine

the cluster starts as a single server with k3s installed from the checked-in config in the cluster repo.

that config does a few important things up front:

  • it adds dlibre, dlibre.com, and the public ip as tls sans for the api server
  • it disables local-storage because longhorn is used for persistent volumes instead
  • it disables servicelb because ingress exposure is handled with metallb and traefik
  • it wires the kubernetes api server to dex for oidc auth

so the machine is not just “running kubernetes.” it is already opinionated toward gitops, external ingress, and cluster-wide login through dex.
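
as a rough sketch, the shape of that config can be written down as a python dict and rendered as yaml. the keys (tls-san, disable, kube-apiserver-arg) are real k3s config options and the oidc-* names are real kube-apiserver flags, but every value marked as a placeholder below is an assumption and not copied from the cluster repo.

```python
import json

# sketch of the k3s config described above; not the checked-in file.
K3S_CONFIG = {
    # extra names the api server certificate should be valid for
    "tls-san": ["dlibre", "dlibre.com", "PUBLIC_IP"],  # PUBLIC_IP is a placeholder
    # bundled components that are switched off
    "disable": ["local-storage", "servicelb"],
    # flags passed through to kube-apiserver for oidc login
    "kube-apiserver-arg": [
        "oidc-issuer-url=https://dex.example.invalid",  # placeholder issuer
        "oidc-client-id=kubernetes",                    # placeholder client id
        "oidc-groups-claim=groups",                     # assumed claim name
    ],
}


def render_yaml(config):
    """Render a flat {key: [strings]} dict as simple yaml text."""
    lines = []
    for key, values in config.items():
        lines.append(f"{key}:")
        for value in values:
            # json.dumps gives a double-quoted string, which is valid yaml
            lines.append(f"  - {json.dumps(value)}")
    return "\n".join(lines)


print(render_yaml(K3S_CONFIG))
```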

bootstrap and gitops

after k3s is up, the cluster is bootstrapped with a small argocd seed from argocd/bootstrap.

the bootstrap step does three things:

  1. installs argocd itself
  2. adds the private github repo as an argocd repository using an ssh deploy key
  3. creates the root application that points at argocd/apps

from that point on, the cluster converges from git. that is the key design choice in this setup.

instead of manually installing components one by one, argocd reads the repo and continuously applies the desired state. if traefik, dex, external-dns, or the dlibre site drift away from what the repo says, argocd pulls them back.
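
the idea of converging from git can be sketched as a comparison between desired and live state. this is a toy model, not argocd's implementation: the resource names and specs are made up, and whether argocd actually prunes or self-heals depends on the sync policy of each application.

```python
def plan_sync(desired, live):
    """Return the actions needed to make `live` match `desired`.

    Both arguments are dicts of resource name -> spec.
    """
    actions = []
    for name, spec in desired.items():
        if name not in live:
            actions.append(("create", name))
        elif live[name] != spec:
            # the live object drifted from what git says
            actions.append(("update", name))
    for name in live:
        if name not in desired:
            # present in the cluster but no longer declared in git
            actions.append(("prune", name))
    return actions


# hypothetical state, for illustration only
desired = {"traefik": {"replicas": 1}, "dex": {"replicas": 1}}
live = {"traefik": {"replicas": 2}, "old-app": {"replicas": 1}}

for action, name in plan_sync(desired, live):
    print(action, name)
```

when `desired` and `live` are equal the plan is empty, which is the steady state the cluster is supposed to sit in.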

the practical consequence is that rebuilding the cluster is mostly a matter of:

  • reinstalling k3s
  • restoring the sealed-secrets keys
  • bootstrapping argocd
  • waiting for applications to sync

the platform services

once the root application is active, argocd installs the cluster building blocks in waves.
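
assuming "waves" here means argocd sync waves (the argocd.argoproj.io/sync-wave annotation, where lower numbers are applied first), the ordering can be sketched like this. the wave numbers below are hypothetical; the real ones live in the cluster repo.

```python
from itertools import groupby

# (application name, sync wave); the wave numbers are made up for illustration
APPS = [
    ("dlibre", 3),
    ("traefik", 2),
    ("metallb", 1),
    ("cert-manager", 1),
]


def sync_order(apps):
    """Group applications by wave, lowest wave first."""
    ordered = sorted(apps, key=lambda app: (app[1], app[0]))
    return [
        (wave, [name for name, _ in group])
        for wave, group in groupby(ordered, key=lambda app: app[1])
    ]


for wave, names in sync_order(APPS):
    print(wave, names)
```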

dex

dex is the oidc issuer for the admin surface. argocd, the kubernetes api, the kubernetes dashboard, and longhorn all trust dex. google oauth is the upstream identity provider, and membership in the cluster-admins@dlibre.com group is used for authorization.

this is mostly about operator access, not end-user auth for public services.

metallb

because there is no cloud load balancer in front of this cluster, metallb is what assigns the ingress address to LoadBalancer services and advertises it on the node's network. right now the pool is deliberately tiny: a single public ip, 144.24.140.57/32.

that fits the current design. there is one node, one ingress tier, and one public edge.
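
to make the size of that pool concrete, here is a small check with python's standard ipaddress module. the helper name is made up for this post; the cidr is the one quoted above.

```python
import ipaddress


def pool_addresses(cidr):
    """List every address contained in a cidr block."""
    return [str(address) for address in ipaddress.ip_network(cidr)]


# a /32 contains exactly one address, so only one LoadBalancer
# service can be given an ip from this pool
print(pool_addresses("144.24.140.57/32"))
print(len(pool_addresses("144.24.140.57/32")))
```

since traefik already holds that one address, a second LoadBalancer service would have nothing left to claim unless the pool grows.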

traefik

traefik is installed as a LoadBalancer service. metallb gives that service the public ingress ip, and traefik then becomes the entry point for http and https traffic.

traefik is the component that turns hostname-based routing into actual backend service selection inside the cluster.

external-dns

external-dns watches the resources in the cluster that describe dns records and writes the matching records to cloudflare. that means dns is also described in git and reconciled from the cluster side, instead of being managed manually in a dashboard.

for dlibre.com, there is a DNSEndpoint that publishes an A record to the ingress ip. for some other services, records can point at the shared traefik ingress hostname instead.
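
a DNSEndpoint for the apex record could look roughly like the manifest built below. the apiVersion, kind, and endpoint fields follow the external-dns crd; the metadata name is a placeholder, and the ip assumes the ingress address is the single metallb pool address.

```python
import ipaddress
import json


def a_record_endpoint(name, hostname, ip):
    """Build a DNSEndpoint manifest that publishes one A record."""
    ipaddress.IPv4Address(ip)  # raises ValueError if ip is not valid ipv4
    return {
        "apiVersion": "externaldns.k8s.io/v1alpha1",
        "kind": "DNSEndpoint",
        "metadata": {"name": name},
        "spec": {
            "endpoints": [
                {"dnsName": hostname, "recordType": "A", "targets": [ip]}
            ]
        },
    }


# "dlibre-apex" is a hypothetical object name
manifest = a_record_endpoint("dlibre-apex", "dlibre.com", "144.24.140.57")
print(json.dumps(manifest, indent=2))
```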

cert-manager

cert-manager handles tls with a letsencrypt ClusterIssuer that solves dns-01 challenges through cloudflare. there is one wildcard certificate covering:

  • dlibre.com
  • *.dlibre.com

the certificate secret is reflected across namespaces, so workloads can reuse the same wildcard cert without manually duplicating tls material in each app namespace.
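
both names are on the certificate because a wildcard only stands in for a single label: *.dlibre.com does not cover dlibre.com itself, and it does not cover deeper names either. a minimal sketch of that matching rule (the function names and the example hostnames are made up for this post):

```python
SANS = ["dlibre.com", "*.dlibre.com"]


def covered_by(hostname, san):
    """Check one hostname against one certificate name."""
    host = hostname.lower().rstrip(".").split(".")
    pattern = san.lower().split(".")
    if pattern[0] != "*":
        return host == pattern
    # a leading "*" matches exactly one label, never zero and never two
    return len(host) == len(pattern) and host[1:] == pattern[1:]


def cert_covers(hostname, sans=SANS):
    """True if any name on the certificate covers the hostname."""
    return any(covered_by(hostname, san) for san in sans)


for name in ["dlibre.com", "argocd.dlibre.com", "a.b.dlibre.com"]:
    print(name, cert_covers(name))
```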

longhorn and postgres

longhorn is the persistent storage layer for stateful workloads. postgres exists as shared platform data infrastructure for services like dex. this matters less for the blog itself, but it explains why the cluster is set up to support more than static sites.

what happens when someone opens dlibre.com

this is the simplest way to think about the request path:

  1. a browser resolves dlibre.com
  2. cloudflare answers with the A record that external-dns published, which is the public ingress ip
  3. the browser connects to that ip, which is the address of traefik’s LoadBalancer service
  4. traefik matches the Host header for dlibre.com
  5. traefik forwards the request to the dlibre service in the dlibre namespace
  6. the pod serves the static site over port 80

the tls certificate presented at the edge is the wildcard cert issued by cert-manager. the app does not request a certificate of its own; it uses the copy of the wildcard secret that is reflected into its namespace.
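
the host matching step in that path can be modelled as a lookup table. this is a toy sketch and not how traefik's router is implemented; the namespace, service name, and port come from the steps above, and the table layout is made up.

```python
# host -> (namespace, service, port)
ROUTES = {
    "dlibre.com": ("dlibre", "dlibre", 80),
}


def route(host_header):
    """Pick a backend for a request based on its Host header."""
    # drop an optional ":port" suffix and ignore case
    host = host_header.split(":")[0].lower()
    return ROUTES.get(host)


print(route("dlibre.com"))
print(route("unknown.example"))
```

a request whose Host header matches no entry gets no backend, which is why a hostname that resolves to the ingress ip still needs its own route before anything is served.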

how this blog is deployed

the blog itself is not built inside the cluster repo. the cluster repo only declares how the already-built site should run.

the actual site deployment is managed by the dlibre argocd application, which pulls a helm chart from https://dlibre.github.io/charts and fills it with the checked-in values from the cluster repo.

those values tell argocd to deploy:

  • one replica
  • image ghcr.io/dlibre/dlibre.com
  • a pinned image tag
  • an image pull secret for ghcr
  • a ClusterIP service on port 80
  • an ingress for dlibre.com
  • tls using letsencrypt-wildcard-cert-dlibre.com

so the cluster repo is the runtime contract for the site, not the site source itself.

inside this repo (the site repo, not the cluster repo), the site builds to static output and the container image is very small: the generated files are copied into an nginx:alpine image and served from there. that gives the deployment a straightforward delivery flow:

  • build static files
  • publish image to ghcr
  • update the deployed image tag in the cluster repo
  • let argocd reconcile the new version
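
the third step is the only change the cluster repo ever sees for a new version of the site. a minimal sketch of that edit, assuming the values follow the common helm layout of image.repository and image.tag (the key names and both tags are placeholders, not taken from the chart):

```python
import copy


def bump_image_tag(values, new_tag):
    """Return a copy of the helm values with only the image tag changed."""
    updated = copy.deepcopy(values)
    updated["image"]["tag"] = new_tag
    return updated


# hypothetical values; the real ones are checked into the cluster repo
values = {
    "replicaCount": 1,
    "image": {"repository": "ghcr.io/dlibre/dlibre.com", "tag": "OLD_TAG"},
}

new_values = bump_image_tag(values, "NEW_TAG")
print(new_values["image"])
```

because the tag is pinned, nothing rolls out until this change is committed, and going back to an earlier version is the same edit in reverse.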

why this shape works well

for a small personal platform, this setup buys a lot without requiring a very large control plane:

  • the entire cluster can be reasoned about from git
  • ingress, dns, and tls are part of the same declarative workflow
  • admin auth is centralized
  • stateful services can be added without redesigning the whole stack
  • static sites like this one remain cheap to run

the obvious tradeoff is failure domain. one server still means one server. if the vm is down, the cluster is down, and because that same vm is the only ingress node, everything public goes down with it. none of that is hidden by kubernetes.

what i want to document next

this is the overview post. the next drafts in the series should probably split the details into smaller pieces:

  • how argocd bootstraps the cluster and keeps it converged
  • how ingress, dns, and wildcard tls fit together
  • how app delivery works from this repo to ghcr to the running pod

that breakdown is probably easier to maintain than one giant post, especially as the cluster grows beyond the current single-node design.