Share a GPU between pods on AWS EKS

This page summarizes the projects mentioned and recommended in the original post on dev.to

Our great sponsors
  • OPS - Build and Run Open Source Unikernels
  • SonarQube - Static code analysis for 29 languages.
  • Scout APM - Less time debugging, more time building
  • k8s-device-plugin

    NVIDIA device plugin for Kubernetes

    If you ever tried to use GPU-based instances with AWS ECS, or on EKS using the default Nvidia plugin, you would know that it's not possible to make a task/pod shared the same GPU on an instance. If you want to add more replicas to your service (for redundancy or load balancing), you would need one GPU for each replica.

  • aws-eks-share-gpu

    How to share the same GPU between pods on AWS EKS

    This project (available here) uses the k8s device plugin described by this AWS blog post to make GPU-based nodes publish the amount of GPU resource they have available. Instead of the amount of VRAM available or some abstract metric, this plugin advertises the amount of pods/processes that can be connected to the GPU. This is controlled by what is called by NVIDIA as Multi-Process Service (MPS).

  • OPS

    OPS - Build and Run Open Source Unikernels. Quickly and easily build and deploy open source unikernels in tens of seconds. Deploy in any language to any cloud.

  • aws-virtual-gpu-device-plugin

    AWS virtual gpu device plugin provides capability to use smaller virtual gpus for your machine learning inference workloads

    This project (available here) uses the k8s device plugin described by this AWS blog post to make GPU-based nodes publish the amount of GPU resource they have available. Instead of the amount of VRAM available or some abstract metric, this plugin advertises the amount of pods/processes that can be connected to the GPU. This is controlled by what is called by NVIDIA as Multi-Process Service (MPS).

  • asdf-hashicorp

    HashiCorp plugin for the asdf version manager

  • asdf-tflint

    An asdf plugin for installing terraform-linters/tflint.

  • asdf-awscli

  • aws-ami-gpu-monitoring

    This project contains the code necessary to build an AWS AMI with monitoring capabilities of GPU usage (among other metrics) using CloudWatch.

    From that repo, the only thing changed is the base AMI, which in this case an AMI tailored for accelerated hardware on EKS was used. The list of compatible AMIs for EKS can be obtained in this link updated regularly by AWS. Also, the AMI from AWS comes with SSM agent in it, so no need to change anything regarding that.

  • SonarQube

    Static code analysis for 29 languages.. Your projects are multi-language. So is SonarQube analysis. Find Bugs, Vulnerabilities, Security Hotspots, and Code Smells so you can release quality code every time. Get started analyzing your projects today for free.

  • containers-roadmap

    This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).

    From that repo, the only thing changed is the base AMI, which in this case an AMI tailored for accelerated hardware on EKS was used. The list of compatible AMIs for EKS can be obtained in this link updated regularly by AWS. Also, the AMI from AWS comes with SSM agent in it, so no need to change anything regarding that.

  • k2tf

    Kubernetes YAML to Terraform HCL converter

    Pro tip: If you want to convert k8s yaml files to .tf, you can use k2tf (repo) that is able to convert the resource types of the yaml top their appropriated counterparts of the k8s provider for terraform. To install it, just:

  • terraform-provider-kubernetes

    Terraform Kubernetes provider

    After the resources be provisioned, you might want to run terraform apply -refresh-only to refresh your local state as the creation of some resource change the state of others within AWS. Also, state differences on metadata.resource_version of k8s resources almost always show up after an apply. This seems to be related to this issue.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts