I recently wanted to add a GPU node to my K3S cluster and found the documentation a little lacking, so I wanted to quickly capture how I did it so that, should I need to do it again, I can refer back to it. And if anyone else finds it useful too, then all the better.
Installing the node.
This is my first dive into working with AI to build software. Like everyone else I have been very impressed with ChatGPT and all the other buzz around AI for the last year or so, and have used it quite a lot. What I have not done, though, is try to integrate it into my own apps. When I started looking at doing this I was disappointed with the performance of the GPU in my laptop due to its limited VRAM, I was not keen on watching the costs of the ChatGPT API shoot up, and I knew, based on what I wanted to do, that I would hit the rate limits quickly. Instead I decided to build a dedicated server in my home lab to experiment with running my own Large Language Models.
I began by scrounging an old desktop from a friend: an 8-core 3.6GHz i7 with 16GB of RAM. I then purchased an Nvidia RTX 4070 Ti Super graphics card with 16GB of VRAM, which seemed to be about the best value for money in terms of performance and available VRAM. Then I installed Ubuntu 22.04.4 and began setting it up to join the K3S cluster.
Installing the Nvidia software.
After setting up SSH, the first thing I did was install the Nvidia software. This was the easy part, as you can just follow the official Nvidia documentation. Start by setting up APT:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
And make sure you enable the experimental features.
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
Update apt and add the required packages.
sudo apt-get update \
  && sudo apt-get install -y nvidia-container-toolkit nvidia-container-runtime \
     cuda-drivers-fabricmanager-535 nvidia-headless-535-server nvidia-utils-535-server
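The official steps don't call for it, but at this point I would give the node a reboot so the driver loads, and then check that the card is visible on the host with nvidia-smi:

sudo reboot
# once the node is back up
nvidia-smi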
Setting up K3S
Next, install K3S and join it to the cluster:
sudo curl -sfL https://get.k3s.io | K3S_URL=https://10.1.0.21:6443 K3S_TOKEN=<token> sh -s -
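At this point the new node should show up in the cluster. From a machine with access to the cluster (the K3S server, for example) you can check that it has joined and gone Ready:

kubectl get nodes -o wide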
Next we need to update the containerd runtime to recognise the GPU:
sudo nvidia-ctk runtime configure --runtime=containerd
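As I understand it, K3S runs its own embedded containerd and detects the Nvidia runtime when the agent starts, so it does no harm to restart the K3S agent after running the command above (assuming the node was installed as an agent, the service is k3s-agent):

sudo systemctl restart k3s-agent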
At this point it is worth verifying that you can run a container on the containerd runtime:
sudo ctr image pull docker.io/nvidia/cuda:12.3.2-base-ubuntu22.04
sudo ctr run --rm --gpus 0 -t docker.io/nvidia/cuda:12.3.2-base-ubuntu22.04 cuda-12.3.2-base-ubuntu22.04 nvidia-smi
This should prove that the container can see the GPU. Now all that is left is to configure K3S to use it so we can run our AI containers in Kubernetes. Start by creating a RuntimeClass manifest.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
And deploy it to your cluster:
kubectl apply -f nvidia-runtimeclass.yaml
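A quick check that the runtime class registered:

kubectl get runtimeclass nvidia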
Now we need to deploy the nvidia-device-plugin DaemonSet. Because this is the only node in my cluster that has a GPU, I did not want the DaemonSet to be deployed to all nodes, so I first added a label to my new node:
kubectl label nodes rdg-clust-ai workload=ai
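You can confirm the label took by filtering the node list on it:

kubectl get nodes -l workload=ai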
I then edited the nvidia-device-plugin manifest available from here to have a node affinity matching my label, and to use my new runtime class:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      runtimeClassName: nvidia
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: workload
                operator: In
                values:
                - "ai"
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      priorityClassName: "system-node-critical"
      containers:
      - image: nvcr.io/nvidia/k8s-device-plugin:v0.14.5
        name: nvidia-device-plugin-ctr
        env:
          - name: FAIL_ON_INIT_ERROR
            value: "false"
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
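Save the manifest and apply it (I am assuming a file name of nvidia-device-plugin.yaml here, use whatever you called yours), then check the plugin pod has landed on the GPU node:

kubectl apply -f nvidia-device-plugin.yaml
kubectl -n kube-system get pods -l name=nvidia-device-plugin-ds -o wide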
Once this is deployed and running on the new node, you can verify that your node is ready by running ‘kubectl describe node <nodeName>’ and checking that ‘nvidia.com/gpu’ is listed under both Capacity and Allocatable. If it is, you are ready to deploy a pod that has access to your GPU.
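For example, with my node name (substitute your own) a simple grep shows the resource under both sections:

kubectl describe node rdg-clust-ai | grep nvidia.com/gpu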
Deploying your first GPU pod
I used the following simple test manifest to check that pods could access the GPU. Note that the runtime class is referred to again.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  runtimeClassName: nvidia
  containers:
  - name: cuda-test
    image: nvidia/cuda:12.3.2-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
  restartPolicy: OnFailure
The pod should be correctly allocated to your new GPU node, and upon inspecting the logs you should see the output of ‘nvidia-smi’.
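Assuming the manifest is saved as gpu-test.yaml (the file name is just my choice), applying it and reading the pod logs should show the familiar nvidia-smi table:

kubectl apply -f gpu-test.yaml
kubectl get pod gpu-test -o wide
kubectl logs gpu-test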