k8s-device-plugin vs nos

k8s-device-plugin

NVIDIA device plugin for Kubernetes (by NVIDIA)

nos

Module to automatically maximize the utilization of GPU resources in a Kubernetes cluster through real-time dynamic partitioning and elastic quotas - Effortless optimization at its finest! (by nebuly-ai)

GPU Kubernetes Optimization

Source Code

nebuly.com


Project	k8s-device-plugin	nos
Mentions	11	19
Stars	2,353	570
Growth	4.7%	1.9%
Activity	9.5	5.6
Latest Commit	5 days ago	4 months ago
Language	Go	Go
License	Apache License 2.0	Apache License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
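
As a rough illustration of the growth figure, the sketch below assumes that "month over month growth" means (current - previous) / previous, which the page does not spell out. The function names are hypothetical; the star counts and percentages are the ones reported in the comparison table.

```python
def monthly_growth_percent(previous_stars: float, current_stars: float) -> float:
    """Month-over-month growth in percent, assuming (current - previous) / previous."""
    return (current_stars - previous_stars) / previous_stars * 100


def implied_previous_stars(current_stars: float, growth_percent: float) -> float:
    """Invert the growth formula to estimate the star count one month earlier."""
    return current_stars / (1 + growth_percent / 100)


# Figures taken from the comparison table; the estimates are only as good
# as the assumed definition of growth.
device_plugin_before = implied_previous_stars(2353, 4.7)
nos_before = implied_previous_stars(570, 1.9)
```

Under that assumption, k8s-device-plugin gained roughly a hundred stars over the month and nos roughly ten.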

k8s-device-plugin

Posts with mentions or reviews of k8s-device-plugin. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-02-09.

Unlocking AI and ML Metal Performance with QBO Kubernetes Engine (QKE) Post
1 project | news.ycombinator.com | 5 Feb 2024

https://github.com/NVIDIA/k8s-device-plugin/issues/332#issue...
Nos – Open-Source to Maximize GPU Utilization in Kubernetes
3 projects | news.ycombinator.com | 9 Feb 2023
Show HN: Nos – Open-Source to Maximize GPU Utilization in Kubernetes
2 projects | news.ycombinator.com | 30 Jan 2023
Time-Slicing GPUs with Karpenter
3 projects | dev.to | 14 Dec 2022

K8s-device-plugin
Understanding Kubernetes Limits and Requests
9 projects | dev.to | 1 Dec 2022

This framework allows the use of external devices (e.g., NVIDIA GPUs, AMD GPUs, SR-IOV NICs) without modifying core Kubernetes components.
Nvidia GPU Plugin: Am I really limited to one pod per GPU?
1 project | /r/kubernetes | 24 Aug 2022

Not talking about MIG. NVIDIA device plugin. https://github.com/NVIDIA/k8s-device-plugin
Nvidia Kubernetes plugin install option that does not require Helm?
1 project | /r/kubernetes | 5 Jun 2022
What is the difference between nvidia device plugin and GPU operator?
2 projects | /r/kubernetes | 12 May 2022

GPU Operator Device plugin
Share a GPU between pods on AWS EKS
10 projects | dev.to | 4 Nov 2021

If you have ever tried to use GPU-based instances with AWS ECS, or on EKS using the default Nvidia plugin, you know that it's not possible to make multiple tasks/pods share the same GPU on an instance. If you want to add more replicas to your service (for redundancy or load balancing), you need one GPU for each replica.
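
The one-GPU-per-replica constraint described above can be sketched as follows. This is a minimal illustration, not a complete manifest: `nvidia.com/gpu` is the resource name the NVIDIA device plugin exposes, while the helper names, the container name, and the image are hypothetical.

```python
def gpu_pod_spec(image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod spec (as a plain dict) whose container asks for whole GPUs."""
    if gpus < 1:
        raise ValueError("this sketch only models pods that request at least one GPU")
    return {
        "containers": [
            {
                "name": "worker",  # hypothetical container name
                "image": image,
                # With the default plugin, a GPU is requested as a whole unit.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }
        ]
    }


def gpus_needed(replicas: int, gpus_per_pod: int = 1) -> int:
    """Without GPU sharing, every replica holds its GPUs exclusively."""
    return replicas * gpus_per_pod


spec = gpu_pod_spec("example.com/inference:latest")  # hypothetical image
total = gpus_needed(replicas=4)
```

Scaling the service from one to four replicas therefore means finding four GPUs, even if each replica uses only a small part of one.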
Looking for a sanity check on a project I'm working on at home, hoping you fine people can help - Raspberry Pi Kubernetes Cluster
13 projects | /r/homelab | 14 Apr 2021

Some notes on Plex/Emby/Kodi and transcoding. If you want true transcoding with GPU acceleration, you have to have an Nvidia GPU or be a k8s device plugin genius. The whole idea of mounting elastic devices in k8s is fairly new and rather complex. In the meantime, transcoding is best done on a beefy device with a proper CPU (e.g., i7) or specifically an Nvidia GPU, because there are numerous pre-made plugins. I just run Plex and Emby on an old ATX gaming machine without GPU acceleration and it works totally fine. They were barely usable for just me when running on the RPis; I wouldn't recommend it unless you can figure out how to mount the correct devices in the pod using a custom Raspberry Pi device plugin... lol good luck!
- Arm labs device manager: https://community.arm.com/developer/research/b/articles/posts/a-smarter-device-manager-for-kubernetes-on-the-edge
- Deis labs Akri device manager: https://github.com/deislabs/akri
- Nvidia GPU plugin: https://github.com/NVIDIA/k8s-device-plugin

nos

Posts with mentions or reviews of nos. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-03-01.

Plug and play modules to optimize the performances of your AI systems
3 projects | news.ycombinator.com | 1 Mar 2023

Some of the available modules include:
Speedster: Automatically apply the best set of SOTA optimization techniques to achieve the maximum inference speed-up on your hardware. https://github.com/nebuly-ai/nebullvm/blob/main/apps/acceler...
Nos: Automatically maximize the utilization of GPU resources in a Kubernetes cluster through real-time dynamic partitioning and elastic quotas. https://github.com/nebuly-ai/nos
ChatLLaMA: Build a faster and cheaper ChatGPT-like training process based on LLaMA architectures. https://github.com/nebuly-ai/nebullvm/tree/main/apps/acceler...
OpenAlphaTensor: Increase the computational performance of an AI model with custom-generated matrix multiplication algorithms fine-tuned for your specific hardware. https://github.com/nebuly-ai/nebullvm/tree/main/apps/acceler...
Forward-Forward: The Forward Forward algorithm is a method for training deep neural networks that replaces the backpropagation forward and backward passes with two forward passes. https://github.com/nebuly-ai/nebullvm/tree/main/apps/acceler...
Nos – Open-Source to Maximize GPU Utilization in Kubernetes
3 projects | news.ycombinator.com | 9 Feb 2023

Hi HN! I’m Michele Zanotti and today I’m releasing nos, an open-source module to efficiently run GPU workloads on Kubernetes!
Nos is meant to increase GPU utilization and cut down infrastructure and operational costs by providing two main features:
1. Dynamic GPU Partitioning: you can think of this as a cluster autoscaler for GPUs. Instead of scaling up the number of nodes and GPUs, it dynamically partitions them into smaller “GPU slices”. This ensures that each workload only uses the GPU resources it actually needs, resulting in spare GPU capacity that could be used for other workloads. To partition GPUs, nos leverages Nvidia's MPS and MIG [1,2], finally making them dynamic.
2. Elastic Resource Quota management: it increases the number of Pods that can run on the cluster by allowing teams (namespaces) to borrow quotas of reserved resources from other teams as long as those teams are not using them.
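
To make the idea of "GPU slices" in the first feature concrete, here is a toy sketch that packs memory-sized slice requests onto as few GPUs as possible. It uses a generic first-fit-decreasing heuristic, which is an assumption for illustration and not nos's actual algorithm; it also ignores that MIG only allows fixed slice profiles. The request sizes and the 40 GB GPU are hypothetical.

```python
from __future__ import annotations


def pack_slices(requests_gb: list[int], gpu_memory_gb: int) -> list[list[int]]:
    """First-fit decreasing: put each request on the first GPU that still has room."""
    gpus: list[list[int]] = []
    for request in sorted(requests_gb, reverse=True):
        if request > gpu_memory_gb:
            raise ValueError(f"request of {request} GB does not fit on one GPU")
        for gpu in gpus:
            if sum(gpu) + request <= gpu_memory_gb:
                gpu.append(request)
                break
        else:
            # No existing GPU has room, so a new one is brought into use.
            gpus.append([request])
    return gpus


requests = [10, 5, 20, 10, 5, 10]  # hypothetical per-workload memory needs in GB
layout = pack_slices(requests, gpu_memory_gb=40)
```

With whole-GPU allocation these six workloads would occupy six GPUs; in this toy model they fit on two.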
https://github.com/nebuly-ai/nos
Let me know your thoughts on the project in the comments. And don't forget to leave a star on GitHub if you like the project :)
Nos addresses some key challenges of Kubernetes that stem from the fact that Kubernetes was not designed to support GPU and AI / machine learning workloads. In Kubernetes, GPUs are managed with the Nvidia k8s Device Plugin [3], which has a few major downsides. First, it requires each workload to be allocated an integer number of GPUs, so workloads cannot request only a fraction of a GPU. Second, when shared GPU access is enabled with either time-slicing or MIG, the device plugin advertises to Kubernetes a fixed set of GPU resources that does not adapt dynamically to the requests of the Pods at any given time.
This often leads to both underutilized GPUs and pending Pods, and/or to the cluster admin spending a lot of time looking for workarounds to make the best use of GPUs.
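
The second downside can be illustrated with a toy admission loop over a fixed set of advertised resources. This is a sketch under stated assumptions: the `schedule` helper is hypothetical and far simpler than the real Kubernetes scheduler, and the resource names are assumed to follow the device plugin's MIG naming (for example `nvidia.com/mig-1g.5gb`).

```python
from __future__ import annotations

from collections import Counter


def schedule(advertised: dict[str, int], pod_requests: list[str]) -> tuple[list[str], list[str]]:
    """Admit pods one by one against a fixed, non-adapting set of advertised resources."""
    free = Counter(advertised)
    running: list[str] = []
    pending: list[str] = []
    for resource in pod_requests:
        if free[resource] > 0:
            free[resource] -= 1
            running.append(resource)
        else:
            # The resource was never advertised (or is exhausted), so the pod waits.
            pending.append(resource)
    return running, pending


# Assumed static setup: one GPU pre-partitioned into seven small slices.
advertised = {"nvidia.com/mig-1g.5gb": 7}
pods = ["nvidia.com/mig-1g.5gb", "nvidia.com/mig-3g.20gb", "nvidia.com/mig-1g.5gb"]
running, pending = schedule(advertised, pods)
```

Here two pods run, five small slices sit idle, and the pod asking for a larger slice stays pending, because the advertised set cannot be reshaped to match demand.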
For example, consider a company with a k8s cluster with 20 GPUs, where 3 of these GPUs have been reserved for the data science team using Resource Quota objects. In most cases, the workloads of data scientists (notebooks, scripts, etc.) require far less memory and compute than an entire GPU provides, yet Kubernetes will force each container to consume an entire GPU. Also, if the team occasionally needs to run a heavy workload, it may want to use as many resources as possible. However, the Resource Quota over their namespace would constrain the team to use at most the 3 GPUs reserved for them, even if the company cluster may be full of unused GPUs!
Instead, with nos the data science team would use nos Dynamic GPU Partitioning to request GPU slices so that many workloads can share the same GPU. Also, Elastic Resource Quotas would allow the team to consume more than the 3 reserved GPUs, borrowing quotas from other teams that are not using them. To recap, the team would be able to launch more Pods and the company would likely need fewer nodes. All this with minimal effort required by the cluster admin, who only has to set up nos.
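
The quota part of this example can be worked through with a small toy model. It is only a sketch: the function is hypothetical, it ignores details such as reclaiming borrowed quota when the lending team needs it back, and the figure of 5 GPUs in use by other teams is an assumption added for illustration. The 20-GPU cluster and the 3 reserved GPUs come from the example above.

```python
def usable_gpus(requested: int, reserved: int, cluster_total: int,
                used_by_others: int, elastic: bool) -> int:
    """How many GPUs a team can actually get under a fixed or an elastic quota."""
    free_in_cluster = cluster_total - used_by_others
    if elastic:
        # The team may borrow quota that other teams are not using.
        limit = free_in_cluster
    else:
        # A plain Resource Quota caps the team at its reserved amount.
        limit = min(reserved, free_in_cluster)
    return min(requested, limit)


fixed = usable_gpus(requested=10, reserved=3, cluster_total=20,
                    used_by_others=5, elastic=False)
borrowed = usable_gpus(requested=10, reserved=3, cluster_total=20,
                       used_by_others=5, elastic=True)
```

In this scenario the fixed quota leaves the team with 3 GPUs while 12 others sit idle, whereas the elastic model lets the heavy workload use all 10 GPUs it asked for.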
Let me know what you think of nos, feedback would be very helpful! :) And please leave a star on GitHub if you like this open-source project: https://github.com/nebuly-ai/nos
Here are some other links that may be useful:
- Tutorial on how to use Dynamic GPU Partitioning with Nvidia MIG https://towardsdatascience.com/dynamic-mig-partitioning-in-k...

1 project | news.ycombinator.com | 9 Feb 2023

Introducing Nos - Opensource to Maximize GPU Utilization in Kubernetes (more in the comments)
1 project | /r/kubernetes | 31 Jan 2023

1 project | /r/programming | 30 Jan 2023
Opensource to maximize GPU utilization in Kubernetes
1 project | /r/u_galaxy_dweller | 30 Jan 2023

New Opensource to Maximize GPU Utilization in Kubernetes
1 project | /r/opensource | 30 Jan 2023
Show HN: Nos – Open-Source to Maximize GPU Utilization in Kubernetes
2 projects | news.ycombinator.com | 30 Jan 2023
An open-source to train faster deep learning models
1 project | /r/programming | 28 Jun 2022

What are some alternatives?

When comparing k8s-device-plugin and nos you can also consider the following projects:

kubevirt-gpu-device-plugin - NVIDIA k8s device plugin for Kubevirt

gpu-operator - NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes

harvester - Open source hyperconverged infrastructure (HCI) software

nebuly - The user analytics platform for LLMs

aws-eks-share-gpu - How to share the same GPU between pods on AWS EKS

pytorch-accelerated - A lightweight library designed to accelerate the process of training PyTorch models by providing a minimal, but extensible training loop which is flexible enough to handle the majority of use cases, and capable of utilizing different hardware options with no code changes required. Docs: https://pytorch-accelerated.readthedocs.io/en/latest/

aws-virtual-gpu-device-plugin - AWS virtual gpu device plugin provides capability to use smaller virtual gpus for your machine learning inference workloads

gosl - Linear algebra, eigenvalues, FFT, Bessel, elliptic, orthogonal polys, geometry, NURBS, numerical quadrature, 3D transfinite interpolation, random numbers, Mersenne twister, probability distributions, optimisation, differential equations.

terraform-provider-kubernetes - Terraform Kubernetes provider

metagpu - K8s device plugin for GPU sharing

containers-roadmap - This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).

serve - Serve, optimize and scale PyTorch models in production