Project author: squat

Project description:
KubeCon EU 2018 talk on automating GPU infrastructure for Kubernetes on Container Linux
Primary language: HCL
Project URL: git://github.com/squat/kubeconeu2018.git
Created: 2018-05-02T09:37:26Z
Project community: https://github.com/squat/kubeconeu2018

License: MIT License



KubeCon EU 2018

This repository contains the demo code for my KubeCon EU 2018 talk about automating GPU infrastructure for Kubernetes on Container Linux.


Prerequisites

You will need a Google Cloud account with available quota for NVIDIA GPUs.
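
One way to check the quota before creating anything is to ask gcloud for the regional quotas and keep only the GPU entries. This is a minimal sketch, not part of the demo repository: the function name gpu_quotas is hypothetical, the region you pass is your own choice, and it assumes the gcloud CLI is installed and authenticated for your project. Compute Engine names its GPU quota metrics with an NVIDIA prefix (for example NVIDIA_K80_GPUS), which is what the filter relies on.

```shell
#!/usr/bin/env bash

# gpu_quotas REGION
# Print "metric,limit,usage" for every GPU quota in REGION.
# Returns non-zero if the region reports no NVIDIA quota entries.
# (Hypothetical helper; assumes an authenticated gcloud CLI.)
gpu_quotas() {
  local region="$1"
  gcloud compute regions describe "$region" \
    --flatten="quotas[]" \
    --format="csv[no-heading](quotas.metric,quotas.limit,quotas.usage)" |
    grep 'NVIDIA'
}
```

Calling gpu_quotas with the region you plan to deploy to should list at least one metric whose limit is greater than its usage; if it prints nothing, request GPU quota for that region first.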

Getting Started

Edit the require.tf Terraform file, uncommenting and filling in the details for your Google Cloud project:

  $EDITOR require.tf

Modify the provided terraform.tfvars file to suit your project:

  $EDITOR terraform.tfvars
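
After editing both files, it can help to confirm that the configuration loads before creating any cloud resources. The sketch below is an assumption about workflow rather than something the repository prescribes: the function name preflight is hypothetical. It relies only on standard Terraform subcommands; terraform init in particular has to run once in a fresh checkout before terraform apply can succeed, because it fetches the providers and modules the configuration references.

```shell
#!/usr/bin/env bash

# preflight [DIR]
# Run the usual Terraform checks in DIR (default: current directory),
# stopping at the first step that fails.
# (Hypothetical helper; not part of the demo repository.)
preflight() {
  local dir="${1:-.}"
  (
    cd "$dir" || exit 1
    terraform init &&      # download providers and modules
    terraform validate &&  # check syntax and required variables
    terraform plan         # preview what apply would create
  )
}
```

Running preflight from the repository root should end with a plan listing the resources to be created; an error at this stage usually points at a missing or mistyped value in require.tf or terraform.tfvars.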

Running

  1. create cluster:

    terraform apply --auto-approve
  2. get nodes:

    export KUBECONFIG="$(pwd)"/assets/auth/kubeconfig
    watch -n 1 kubectl get nodes
  3. create GPU manifests:

    kubectl apply -f manifests
  4. check status of driver installer:

    kubectl logs $(kubectl get pods -n kube-system | grep nvidia-driver-installer | awk '{print $1}') -c modulus -n kube-system -f
  5. check status of device plugin:

    kubectl logs $(kubectl get pods -n kube-system | grep nvidia-gpu-device-plugin | awk '{print $1}' | head -n1 | tail -n1) -n kube-system -f
  6. verify worker node has allocatable GPUs:

    kubectl describe node $(kubectl get nodes | grep worker | awk '{print $1}')
  7. let’s inspect the GPU workload:

    less manifests/darkapi.yaml
  8. let’s see if the GPU workload has been scheduled:

    watch -n 2 kubectl get pods
    kubectl logs $(kubectl get pods | grep darkapi | awk '{print $1}') -f
  9. for fun, let’s test the GPU workload:

    export INGRESS=$(terraform output | grep ingress_static_ip | awk '{print $3}')
    ~/code/darkapi/client http://$INGRESS/api/yolo
  10. finally, let’s clean up:

    terraform destroy --auto-approve
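
Several of the steps above repeat the same "list pods, grep, take the first column" pipeline. A minimal sketch of two helpers that factor this out follows. Both function names (first_pod and gpu_nodes) are hypothetical and not part of the demo repository, and gpu_nodes assumes the device plugin advertises GPUs under the resource name nvidia.com/gpu.

```shell
#!/usr/bin/env bash

# first_pod PATTERN [NAMESPACE]
# Print the name of the first pod whose name contains PATTERN.
# NAMESPACE defaults to "default". Prints nothing if no pod matches.
# (Hypothetical helper; not part of the demo repository.)
first_pod() {
  local pattern="$1" namespace="${2:-default}"
  kubectl get pods -n "$namespace" --no-headers |
    awk -v p="$pattern" 'index($1, p) { print $1; exit }'
}

# gpu_nodes
# List every node next to its allocatable GPU count.
# Assumes the resource is named nvidia.com/gpu; the dot is escaped so that
# kubectl treats "nvidia.com/gpu" as a single key.
gpu_nodes() {
  kubectl get nodes \
    -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}
```

With these loaded into the shell, step 4 could be written as kubectl logs "$(first_pod nvidia-driver-installer kube-system)" -c modulus -n kube-system -f, and gpu_nodes gives a compact alternative to reading the full kubectl describe node output in step 6. Picking only the first match keeps kubectl logs from receiving several pod names at once when a DaemonSet runs on more than one node.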

Projects Leveraged In This Demo

Component                 URL
Kubernetes installer      https://github.com/poseidon/typhoon
GPU driver installer      https://github.com/squat/modulus
Kubernetes device plugin  https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml
Sample workload           https://github.com/squat/darkapi