Building a Home Lab - Part One

homelab
kubernetes
ansible
2025-03-19T17:07:31Z

My goal for 2025 was to build an inexpensive bare-metal Kubernetes cluster in my home lab. The budget I have for this home lab means getting the most machine I can for under $200 per node, which is how I landed on the Beelink Mini S Pro Mini PC (Intel N100, 16GB DDR4 RAM, 500GB SSD).

I bought one to be the control plane and two to be worker nodes. Although it isn't needed to get a Kubernetes cluster going, I bought a fourth that I plan to use for some additional on-prem services down the road.

I chose Ubuntu Server 24.04 as the OS for the nodes. Ansible will handle some basic configuration, but for the installation of Kubernetes and Calico I chose to do it all by hand using kubeadm.

Ansible

Creating the ansible user on the homelab nodes

I'm going to create a user named ansible on each of the nodes; this is the account Ansible will use when we run tasks.

Before I do that though, I need to create an SSH key for my own account and copy it to the authorized_keys file for my account on each of the nodes.

ssh-keygen -t rsa -b 4096 -f ~/.ssh/jgn_rsa -N ""

The public key can be added for my account on each node with a simple loop like this.

for node in 192.168.1.{100..103}; do
  ssh-copy-id -i ~/.ssh/jgn_rsa.pub jgn@$node
done
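
A quick way to confirm the key was copied correctly is to log into one of the nodes with it explicitly; if everything worked there is no password prompt. (Substitute any node's IP here.)

ssh -i ~/.ssh/jgn_rsa jgn@192.168.1.100 hostname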

I created a playbook to run against all nodes in the cluster to create the ansible user and give it sudo privileges that allow it to run all commands without a password.

---
- name: Set up ansible user and sudo privileges
  hosts: homelab
  become: yes
  vars:
    ansible_user_password: "AnsibleUserPasswordHere" # avoid clashing with the built-in ansible_password connection variable
  tasks:
    - name: Create ansible user with password and home directory
      ansible.builtin.user:
        name: ansible
        password: "{{ ansible_user_password | password_hash('sha512') }}"
        shell: /bin/bash
        create_home: yes
        state: present

    - name: Grant ansible user sudo privileges via sudoers.d
      ansible.builtin.copy:
        content: "ansible ALL=(ALL) NOPASSWD:ALL" # Grant ansible ability to run any command with no password
        dest: /etc/sudoers.d/ansible
        mode: "0440"
        owner: root
        group: root
        validate: "/usr/sbin/visudo -cf %s"

I will have to run this playbook as myself, but going forward playbooks will run as the ansible user.

ansible-playbook -i inventory users/ansible.yml --user jgn --ask-pass
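
One thing to note: --ask-pass only prompts for the SSH password. If your own account still needs a password for sudo on the nodes, add --ask-become-pass (-K) as well so the become step can authenticate.

ansible-playbook -i inventory users/ansible.yml --user jgn --ask-pass --ask-become-pass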

Now that the ansible user exists on all the nodes with passwordless sudo, we can proceed with making Ansible a little nicer to use in the home lab.

Completing the Ansible configuration

I added the four nodes in my home lab to my /etc/hosts file (Linux/Unix, macOS) so that the hostname can be used instead of typing 192.168.1.100. If you are on Windows this file is located at C:\Windows\System32\drivers\etc\hosts.

##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
##
127.0.0.1       localhost
255.255.255.255 broadcasthost
::1             localhost

# Home lab hosts
192.168.1.100 docbar
192.168.1.101 steeldust
192.168.1.102 reride
192.168.1.103 roughstock

Next we need an inventory file so that Ansible knows what we mean when we address the nodes by their group name (e.g. "homelab"); it also lets us specify a few other things for convenience. I just called mine inventory and it lives in the top-level directory where all my Ansible playbooks are located, ~/projects/homelab/ansible.

[homelab]
docbar ansible_host=192.168.1.100
steeldust ansible_host=192.168.1.101
reride ansible_host=192.168.1.102
roughstock ansible_host=192.168.1.103

[all:vars]
ansible_python_interpreter=/usr/bin/python3.12
ansible_ssh_private_key_file=~/.ssh/ansible_rsa

[admin]
docbar

[kubernetes]
steeldust
reride
roughstock

[control-plane]
steeldust

[workers]
reride
roughstock

Now any playbook can target the groups homelab, admin, kubernetes, control-plane, or workers and Ansible will know which machines we are referring to.
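
Once the rest of the configuration below is in place, those group names also work for quick ad-hoc commands, which is handy for one-off checks. For example:

ansible workers -m command -a "uptime"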

I'm also going to create an ansible.cfg which lets me specify some defaults. This file lives in the same top-level directory as the inventory file, ~/projects/homelab/ansible. It lets me do less typing since the remote_user, inventory file, and private key are already specified.

[defaults]
inventory = ./inventory
remote_user = ansible
private_key_file = ~/.ssh/ansible_rsa
host_key_checking = false

Similar to what I did above, now I need to create an SSH key for the ansible user.

ssh-keygen -t rsa -b 4096 -f ~/.ssh/ansible_rsa -N ""

Then I copy the public key to each node.

for node in 192.168.1.{100..103}; do
  ssh-copy-id -i ~/.ssh/ansible_rsa.pub ansible@$node
done

Let's test it with a ping and make sure every node in the homelab responds with no warnings or errors.

ansible -m ping all
roughstock | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
reride | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
steeldust | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
docbar | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

Creating an update playbook

This playbook runs the equivalent of logging into each node in the homelab and doing sudo apt update && sudo apt upgrade -y. I created this playbook at ~/projects/homelab/ansible/update/all.yml.

---
- name: Update and patch Ubuntu nodes
  hosts: homelab
  become: yes
  tasks:
    - name: Update package cache
      ansible.builtin.apt:
        update_cache: yes
      register: update_result

    - name: Upgrade all packages to the latest version
      ansible.builtin.apt:
        upgrade: dist
      when: update_result.changed # Only run if cache was updated

    - name: Check if a reboot is required
      ansible.builtin.stat:
        path: /var/run/reboot-required
      register: reboot_required

    - name: Reboot the node if required
      ansible.builtin.reboot:
        reboot_timeout: 300
      when: reboot_required.stat.exists

Now updating all hosts in the homelab is done with this playbook. How convenient!

ansible-playbook update/all.yml
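
The same playbook can also be pointed at a subset of the homelab when I don't want to touch every node; --limit restricts the run to an inventory group (or a single host).

ansible-playbook update/all.yml --limit workers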

Configure Kubernetes pre-requisites

I'm going to stand up the cluster by hand using kubeadm with Calico for the CNI.

There are a few pre-requisites that we need to handle before running kubeadm init.

Since this needs to be done on all three nodes that will make up the cluster, it is another good candidate for an Ansible playbook. In the playbook I install the required packages, disable swap, load the kernel modules and sysctl settings needed for networking, install containerd, and pull in the correct packages for Kubernetes 1.32.

---
- name: Install Kubernetes pre-requisites
  hosts: kubernetes
  become: yes
  tasks:
    # Update package cache to ensure we can install prerequisites
    - name: Update package cache
      ansible.builtin.apt:
        update_cache: yes
        cache_valid_time: 3600

    # Install basic tools required for adding the Kubernetes repo
    - name: Install required packages
      ansible.builtin.apt:
        name:
          - apt-transport-https
          - ca-certificates
          - curl
          - gpg
        state: present

    # Disable swap as Kubernetes requires it off
    - name: Disable swap
      ansible.builtin.command: swapoff -a
      changed_when: true

    # Remove swap from fstab to prevent re-enabling on reboot
    - name: Remove swap entry from /etc/fstab
      ansible.builtin.lineinfile:
        path: /etc/fstab
        regexp: '^.*\sswap\s'
        state: absent

    # Load kernel modules for container networking
    - name: Load required kernel modules
      ansible.builtin.modprobe:
        name: "{{ item }}"
        state: present
      loop:
        - overlay
        - br_netfilter

    # Persist kernel modules for boot
    - name: Persist kernel modules
      ansible.builtin.lineinfile:
        path: /etc/modules-load.d/k8s.conf
        line: "{{ item }}"
        create: yes
        mode: "0644"
      loop:
        - overlay
        - br_netfilter

    # Configure sysctl settings for Kubernetes networking
    - name: Set sysctl parameters for Kubernetes
      ansible.builtin.sysctl:
        name: "{{ item.name }}"
        value: "{{ item.value }}"
        state: present
        sysctl_file: /etc/sysctl.d/k8s.conf
        reload: yes
      loop:
        - { name: "net.bridge.bridge-nf-call-iptables", value: "1" }
        - { name: "net.bridge.bridge-nf-call-ip6tables", value: "1" }
        - { name: "net.ipv4.ip_forward", value: "1" }

    # Install containerd as the container runtime
    - name: Install containerd
      ansible.builtin.apt:
        name: containerd
        state: present

    # Configure containerd to use systemd cgroup driver
    - name: Ensure containerd config directory exists
      ansible.builtin.file:
        path: /etc/containerd
        state: directory
        mode: "0755"

    # Generate and configure containerd config.toml
    - name: Generate and configure containerd config.toml for systemd
      ansible.builtin.shell:
        cmd: "containerd config default > /etc/containerd/config.toml"
        creates: /etc/containerd/config.toml
      register: config_generated
      changed_when: config_generated.rc == 0

    # Set SystemdCgroup = true in containerd config
    - name: Ensure containerd is using SystemdCgroup
      ansible.builtin.lineinfile:
        path: /etc/containerd/config.toml
        regexp: '^(\s*)SystemdCgroup\s*=\s*false'
        line: '\1SystemdCgroup = true'
        backrefs: yes

    # Restart containerd to apply changes
    - name: Restart containerd
      ansible.builtin.systemd:
        name: containerd
        state: restarted
        enabled: yes

    # Run the curl | gpg command from the Kubernetes docs
    - name: Add Kubernetes APT key with curl and gpg
      ansible.builtin.shell:
        cmd: "curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg"
        creates: /etc/apt/keyrings/kubernetes-apt-keyring.gpg
      become: yes
      changed_when: true

    # Set permissions on the keyring file
    - name: Set keyring file permissions
      ansible.builtin.file:
        path: /etc/apt/keyrings/kubernetes-apt-keyring.gpg
        mode: "0644"

    # Add the Kubernetes APT repository exactly as in the docs
    - name: Add Kubernetes APT repository
      ansible.builtin.shell:
        cmd: "echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /' | tee /etc/apt/sources.list.d/kubernetes.list"
        creates: /etc/apt/sources.list.d/kubernetes.list
      changed_when: true

    # Update APT cache after adding the repo
    - name: Update APT cache after adding Kubernetes repo
      ansible.builtin.apt:
        update_cache: yes

    # Install Kubernetes 1.32 packages
    - name: Install Kubernetes packages
      ansible.builtin.apt:
        name:
          - kubelet=1.32.0-*
          - kubeadm=1.32.0-*
          - kubectl=1.32.0-*
        state: present

    # Hold Kubernetes packages to prevent unwanted upgrades
    - name: Hold Kubernetes packages at current version
      ansible.builtin.dpkg_selections:
        name: "{{ item }}"
        selection: hold
      loop:
        - kubelet
        - kubeadm
        - kubectl

    # Reboot to apply all changes
    - name: Reboot nodes to apply changes
      ansible.builtin.reboot:
        reboot_timeout: 300

All I need to do now to get these hosts ready for a manual kubeadm installation is run the playbook.

ansible-playbook kubernetes/install_dependencies.yml
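
Before moving on I like to run a couple of quick ad-hoc checks (these aren't part of the playbook, just my own sanity check) to confirm swap is off and the kernel modules are loaded on every kubernetes node.

ansible kubernetes -b -m command -a "swapon --show"
ansible kubernetes -b -m shell -a "lsmod | grep -E 'overlay|br_netfilter'"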

Running kubeadm to initialize the cluster

sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --control-plane-endpoint=192.168.1.101 --v=5

Uh oh... it isn't working!?

[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action beforehand using 'kubeadm config images pull'
[preflight] Some fatal errors occurred:
failed to create new CRI runtime service: validate service connection: validate CRI v1 runtime API for endpoint "unix:///var/run/containerd/containerd.sock": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
error execution phase preflight
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:262
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:450
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:234
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:129
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/cobra@v1.8.1/command.go:985
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/cobra@v1.8.1/command.go:1117
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/cobra@v1.8.1/command.go:1041
k8s.io/kubernetes/cmd/kubeadm/app.Run
        k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:47
main.main
        k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
        runtime/proc.go:272
runtime.goexit
        runtime/asm_amd64.s:1700

To make a long story short, the containerd package that ships with Ubuntu Server 24.04 doesn't seem to have the CRI plugin enabled. After trying everything I could think of, I ended up ripping it out and replacing it with the containerd.io package provided by Docker.
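
If you hit the same thing, one way to confirm it really is the CRI plugin (rather than containerd being down entirely) is to query the runtime directly with crictl, which should already be on the nodes since it comes in as a dependency of kubeadm (the cri-tools package). If the CRI plugin isn't being served this fails with the same "unknown service" error, while a healthy runtime prints its version information.

sudo crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock version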

Installing containerd.io

First I need to remove the version provided by Ubuntu since we couldn't get it working.

sudo apt remove --purge containerd -y

Next I'll grab Docker's GPG key and add Docker's apt repository.

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/docker.gpg
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

And install the containerd.io package.

sudo apt install -y containerd.io

There are a couple of changes to config.toml I need to make so that it will work correctly. I saved the default config to disk as /etc/containerd/config.toml so I can make my edits.

containerd config default | sudo tee /etc/containerd/config.toml

The main thing we need to do is enable SystemdCgroup by changing false to true.

sudo sed -i 's/SystemdCgroup \= false/SystemdCgroup \= true/g' /etc/containerd/config.toml
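
The config change doesn't take effect until containerd is restarted, so I bounce the service before trying kubeadm again.

sudo systemctl restart containerd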

Trying kubeadm init again

sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --control-plane-endpoint=192.168.1.101 --v=5

SUCCESS!

Adding Calico CNI

Installing Calico on the cluster is pretty straightforward: we run two kubectl create -f commands with the supplied YAML manifests.

kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.29.2/manifests/tigera-operator.yaml
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.29.2/manifests/custom-resources.yaml
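
The Tigera operator takes a few minutes to roll everything out. Watching the calico-system namespace is an easy way to see when things settle down (any pod-watching approach works, this is just the one I reach for).

kubectl get pods -n calico-system --watch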

Setting up kube config

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Hmmm... something isn't quite right yet

The cluster is doing okay, but the Calico components and CoreDNS are not happy.

It turns out the pod network CIDRs don't match: I passed 10.244.0.0/16 to kubeadm init, but Calico's default custom-resources.yaml expects 192.168.0.0/16.

To fix this I need to download the yaml file provided by Calico and edit it to use the CIDR that I specified when I created the cluster.

curl https://raw.githubusercontent.com/projectcalico/calico/v3.29.2/manifests/custom-resources.yaml > calico.yaml

Now I'll just update the cidr: field with the correct CIDR for this cluster.

# This section includes base Calico installation configuration.
# For more information, see: https://docs.tigera.io/calico/latest/reference/installation/api#operator.tigera.io/v1.Installation
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    ipPools:
    - name: default-ipv4-ippool
      blockSize: 26
      cidr: 10.244.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()

---

# This section configures the Calico API server.
# For more information, see: https://docs.tigera.io/calico/latest/reference/installation/api#operator.tigera.io/v1.APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}

... and apply.

kubectl apply -f calico.yaml
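
One way to double-check that the operator picked up the edited pool is to read the CIDR back out of the Installation resource; it should now show 10.244.0.0/16. This is just a spot check, though, the real proof is the pods below.

kubectl get installation default -o jsonpath='{.spec.calicoNetwork.ipPools[0].cidr}'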

Checking the nodes and pods.

jgn@steeldust:~$ kubectl get nodes
NAME         STATUS   ROLES           AGE     VERSION
reride       Ready    <none>          4h48m   v1.32.0
roughstock   Ready    <none>          4h15m   v1.32.0
steeldust    Ready    control-plane   5h7m    v1.32.0


jgn@steeldust:~$ kubectl get pods -A
NAMESPACE          NAME                                       READY   STATUS    RESTARTS   AGE
calico-apiserver   calico-apiserver-7dc9f69d88-95jrg          1/1     Running   0          4h40m
calico-apiserver   calico-apiserver-7dc9f69d88-hbl4k          1/1     Running   0          4h40m
calico-system      calico-kube-controllers-869ddb5bc5-rp4nd   1/1     Running   0          4h38m
calico-system      calico-node-gf88r                          1/1     Running   0          4h16m
calico-system      calico-node-j42kb                          1/1     Running   0          4h37m
calico-system      calico-node-kt9xs                          1/1     Running   0          4h37m
calico-system      calico-typha-6cf8b4fd74-8h52d              1/1     Running   0          4h38m
calico-system      calico-typha-6cf8b4fd74-ncpw5              1/1     Running   0          4h16m
calico-system      csi-node-driver-768ql                      2/2     Running   0          4h38m
calico-system      csi-node-driver-9t8wz                      2/2     Running   0          4h38m
calico-system      csi-node-driver-hh8x6                      2/2     Running   0          4h16m
kube-system        coredns-668d6bf9bc-7v8rr                   1/1     Running   0          5h8m
kube-system        coredns-668d6bf9bc-vz6q5                   1/1     Running   0          5h8m
kube-system        etcd-steeldust                             1/1     Running   0          5h8m
kube-system        kube-apiserver-steeldust                   1/1     Running   0          5h8m
kube-system        kube-controller-manager-steeldust          1/1     Running   0          5h8m
kube-system        kube-proxy-8556g                           1/1     Running   0          4h16m
kube-system        kube-proxy-j75jf                           1/1     Running   0          4h49m
kube-system        kube-proxy-wpwhw                           1/1     Running   0          5h8m
kube-system        kube-scheduler-steeldust                   1/1     Running   0          5h8m
tigera-operator    tigera-operator-ccfc44587-f4qs9            1/1     Running   0          4h41m

Testing with nginx

I won't know for sure that the cluster is actually working until I run a pod and see it pull an image and fire up.

kubectl run nginx --image=nginx
pod/nginx created

And it works...

kubectl get pods -n default
NAME    READY   STATUS    RESTARTS   AGE
nginx   1/1     Running   0          24s

SUCCESS!
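
Since nginx was only there as a smoke test, the pod can be deleted now that I've seen it come up.

kubectl delete pod nginx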

I now have a working bare-metal Kubernetes cluster with Calico providing the CNI. The basics for administering the homelab with Ansible are also in place and ready for any future playbooks I need to write.

The output above was captured when the cluster was a few hours old, but it accurately reflects the state of things right after kubeadm init and the Calico installation completed.

Up next

Now that we have a Kubernetes cluster with Calico running in the home lab and the Ansible basics working, we can start using it to deploy some homelab projects. Check back periodically for the next installment in the series.