KubeCon EU 2018 talk on automating GPU infrastructure for Kubernetes on Container Linux
This repository contains the demo code for my KubeCon EU 2018 talk about automating GPU infrastructure for Kubernetes on Container Linux.
You will need a Google Cloud account with available quota for NVIDIA GPUs.
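If you are not sure whether your project has GPU quota, one quick way to check is to inspect a region's quotas with gcloud; the region below is only an example, so use the one you plan to deploy into:

```sh
# List the GPU-related quota entries for a region (us-central1 is a placeholder).
gcloud compute regions describe us-central1 | grep -B1 -A1 GPUS
```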
Edit the require.tf Terraform file: uncomment and fill in the details for your Google Cloud project:
$EDITOR require.tf
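The details to uncomment are the standard Terraform google provider settings; the exact contents of require.tf in this repository may differ, so treat the following as a placeholder sketch:

```sh
# Review the commented-out section before editing. The values below are
# placeholders for the usual google provider settings, roughly:
#
#   provider "google" {
#     credentials = "${file("credentials.json")}"
#     project     = "my-gcp-project"
#     region      = "us-central1"
#   }
cat require.tf
```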
Modify the provided terraform.tfvars file to suit your project:
$EDITOR terraform.tfvars
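Which variables the file expects depends on this repository's Terraform configuration; typical entries are along these lines (the names and values below are placeholders, not taken from the repo):

```sh
# Inspect the provided variables before editing; typical entries look roughly like:
#
#   cluster_name       = "gpu-demo"
#   region             = "us-central1"
#   ssh_authorized_key = "ssh-rsa AAAA..."
cat terraform.tfvars
```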
create cluster:
terraform apply --auto-approve
get nodes:
export KUBECONFIG="$(pwd)"/assets/auth/kubeconfig
watch -n 1 kubectl get nodes
create GPU manifests:
kubectl apply -f manifests
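The manifests should create DaemonSets for the driver installer and the device plugin in the kube-system namespace (the names match the log commands below); to confirm they were created:

```sh
# Confirm the NVIDIA driver installer and device plugin DaemonSets exist.
kubectl get daemonsets -n kube-system | grep nvidia
```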
check status of driver installer:
kubectl logs $(kubectl get pods -n kube-system | grep nvidia-driver-installer | awk '{print $1}') -c modulus -n kube-system -f
check status of device plugin:
kubectl logs $(kubectl get pods -n kube-system | grep nvidia-gpu-device-plugin | awk '{print $1}' | head -n1) -n kube-system -f
verify worker node has allocatable GPUs:
kubectl describe node $(kubectl get nodes | grep worker | awk '{print $1}')
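The Allocatable section of the worker node should now list the nvidia.com/gpu resource advertised by the device plugin. For a more targeted check, jsonpath works too (note the escaped dots in the resource name):

```sh
# Print each node's allocatable GPU count; nodes without GPUs print an empty value.
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
```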
let’s inspect the GPU workload:
less manifests/darkapi.yaml
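What makes this a GPU workload is the extended resource request served by the device plugin; assuming the manifest requests nvidia.com/gpu like most GPU pods do, the relevant lines can be pulled out directly (the indented spec below is a sketch, not a verbatim excerpt):

```sh
# Show the GPU resource request in the workload; the pod spec section looks roughly like:
#
#   resources:
#     limits:
#       nvidia.com/gpu: 1
#
grep -B2 -A2 'nvidia.com/gpu' manifests/darkapi.yaml
```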
let’s see if the GPU workload has been scheduled:
watch -n 2 kubectl get pods
kubectl logs $(kubectl get pods | grep darkapi | awk '{print $1}') -f
for fun, let’s test the GPU workload:
export INGRESS=$(terraform output | grep ingress_static_ip | awk '{print $3}')
~/code/darkapi/client http://$INGRESS/api/yolo
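If you don't have the darkapi client built locally, you can at least confirm the endpoint answers over the ingress address (this only checks for an HTTP response, not the detection output):

```sh
# Expect an HTTP status code back from the API endpoint behind the ingress.
curl -s -o /dev/null -w "%{http_code}\n" "http://$INGRESS/api/yolo"
```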
finally, let’s clean up:
terraform destroy --auto-approve
| Component | URL |
|---|---|
| Kubernetes installer | https://github.com/poseidon/typhoon |
| GPU driver installer | https://github.com/squat/modulus |
| Kubernetes device plugin | https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml |
| Sample workload | https://github.com/squat/darkapi |