GVM: A GPU Virtual Machine for IOMMU-Capable Computers

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • linux-intel-lts

  • > Hey, OP here. Sorry I didn't get to this sooner as I posted this just before falling asleep. This comment basically hits the nail on the head in most areas. I am happy to say though that Intel now uses SR-IOV instead of GVT-g (deprecated since Intel's 9th generation). Their replacement SR-IOV driver is now open source (recently made public):

    This would legitimately be ideal information to have in the readme/top-level link.

    > https://github.com/intel/linux-intel-lts/commit/41ef979f0894

    This is pretty unhelpful. Legitimately. It's not mainlined yet, there are zero userspace docs, etc. The patch looks like it will pretty much "just work" when/if Intel bothers to get it into mainline. Until then, a patched/forked kernel is needed.
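
    If anyone wants to test it before mainlining, the build is the usual out-of-tree kernel routine; a rough sketch (the branch name here is illustrative, check the repo for the current SR-IOV branch):

    ```sh
    # Rough sketch: build and install Intel's LTS fork. Branch name is
    # illustrative; the usual kernel-config caveats apply.
    git clone --depth 1 -b 5.15/linux https://github.com/intel/linux-intel-lts.git
    cd linux-intel-lts
    cp "/boot/config-$(uname -r)" .config   # start from the running kernel's config
    make olddefconfig                       # accept defaults for new options
    make -j"$(nproc)"
    sudo make modules_install install       # on most distros this also updates the bootloader
    ```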

    > Also to address the first comment in this thread - there are many inaccuracies here:

    > Post-Ampere supports MIG and SR-IOV. VFIO-Mediated Devices (Mdev) are used both pre-Ampere and post-Ampere. This is how that works:

    > https://openmdev.io/index.php/Mediated_Device_Internals

    I maintained mdev support for a major KVM-based platform, but it's been a couple of years. That said, a link to how mdev internals work isn't useful to end-users, who just want to know "how do I partition my card?" As in, "which driver/utilities do I need to install?"
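
    For what it's worth, the answer those end-users are looking for is usually just mdevctl. With a driver that registers mdev types, partitioning looks roughly like this (the parent address and type name are placeholders; yours will differ):

    ```sh
    # Roughly: enumerate the mdev types a vendor driver exposes, then
    # create and persist an instance. Parent/type values are illustrative.
    mdevctl types                                   # parents and the types they offer
    UUID=$(uuidgen)
    sudo mdevctl start -u "$UUID" -p 0000:00:02.0 -t i915-GVTg_V5_4
    sudo mdevctl define -u "$UUID" --auto           # persist across reboots
    mdevctl list                                    # active mediated devices
    ```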

    > For folks who are interested we also built LibVF.IO which enables vGPU/SR-IOV functionality on consumer GPUs:

    > https://news.ycombinator.com/item?id=28944426

    > If you're interested in a full list of supported GPUs you can read the following page from our wiki:

    > https://openmdev.io/index.php/GPU_Support

    Is there some way in which LibVF.IO differs from just being a wrapper around KVM/qemu? Because the scripts do an awful lot of stuff to your host system, and arcd.nim appears to just call qemu anyway:
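
    (For illustration, not arcd's actual command line: once a mediated device exists, the qemu side of any such wrapper boils down to roughly a single vfio-pci argument, with $UUID a placeholder:)

    ```sh
    # Minimal sketch: boot a KVM guest with an existing mediated device
    # passed through via vfio-pci. $UUID must name a device that already
    # exists under /sys/bus/mdev/devices/.
    UUID=...   # placeholder: UUID of an existing mdev
    qemu-system-x86_64 \
      -machine q35,accel=kvm -cpu host -m 8G \
      -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/"$UUID" \
      -drive file=guest.qcow2,if=virtio
    ```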

  • LibVF.IO

    A vendor-neutral GPU multiplexing tool driven by VFIO & YAML.

  • > Intel's Xe embedded SKUs are supported:

    > https://openmdev.io/index.php/GPU_Support

    I'm not trying to beat a dead horse here, but your matrix links to an archive of Intel's community forum, where the linked thread is a basic question about Windows.

    https://archive.ph/0McAE#selection-4883.0-4943.35

    Then a link to Intel's LTS Linux repo. If the GPU is Xe, do your scripts actually clone/build this and boot into it? What does "supported" mean in this sense? Do you replace the user's kernel with the intel-lts fork? If so, do you tell them?

    > We did some things to enable GPU virtualization on older driver revisions for most Nvidia consumer GPUs and some AMD GPUs as well. Here's that post if you'd like to take a look:

    You have some patches to drivers. Which is great. But "we did some things" on HN is best followed with "here's what we did from a technical level" or "here's the source", not "here's how to install our product".

    https://github.com/Arc-Compute/LibVF.IO/tree/master/patches

    Is there anything at all here which isn't applicable to other KVM-based virt platforms? It looks like not.

    > We're building a free/libre (AGPLv3 & GPLv2) virtualization stack intended to support i915 (Intel) and Nvidia (OpenRM).

    Again, this is great. But please be clear that "we're building a free/libre virtualization stack" is "we are building a very opinionated wrapper around qemu+KVM which is not using libvirt bindings for some reason". The "stack" definitely looks like "some utility scripts to make this easier".

    Is there a project homepage with a roadmap, goals, and issues/bugs, and so on?

  • Mdev-GPU

    A user-configurable utility for GPU vendor drivers enabling the registration of arbitrary mdev types with the VFIO-Mediated Device framework.

  • > OpenMdev.io is meant for developers, not for users.

    Frankly, it isn't meant for developers, either. Almost every page on that site is either woefully incomplete or crib notes from docs/talks, which is fine for a high-level overview, but it's not an API reference developers can use either. The sample code is mostly just lifted from other places (such as https://github.com/torvalds/linux/blob/master/samples/vfio-m...), so useless that you're better off reading the source (https://openmdev.io/index.php/OpenRM), or just links to other people's APIs which interested devs can find.

    It's fine to collate this, but it's far more like someone's personal aggregator than any kind of reference site.

    > No, it's a Libvirt alternative with convenience functions for VFIO users. Here's the documentation:

    > https://openmdev.io/index.php/LibVF.IO

    I read this before I ever wrote a reply, which you should have guessed because there was no other way to get any information. None of it tells anyone WHY they should use this instead of the bindings which have 100 developers on them, which have been battle-tested for years, and for which the original author of VFIO wrote exhaustive, excellent manuals on the blog I linked earlier, 7.5 years ago.

    What advantages does your system offer?

    > GVM/Mdev-GPU is unrelated to LibVF.IO which I think is where you're getting confused. LibVF.IO does not actually have any integration with GVM/Mdev-GPU so if you're reading that code you're not going to learn how GVM/Mdev-GPU works. We're planning to integrate the two but it's not done yet.

    I'm not confused either about how GVM/mdev-gpu works or about its relation to libvf.io. It's not hard to read between the missing lines of your project roadmap.

    > GVM/Mdev-GPU creates the mediated devices that are exposed in the mdevctl list. Read this code instead: https://github.com/Arc-Compute/Mdev-GPU/

    I did read that. There's nowhere else I would have said "Haskell bindings to RMAPI". I didn't call it anything else because it doesn't manage any other kind of mediated device, and it's a pretty thin shim. There's no real way to suss out what it's doing other than reading the code or the autogenerated module docs, which don't actually tell developers where to get the values they need to populate it; they can only get those by reading other API docs (not yours), and if they're going to do that, they may as well just write their own in a language they like better.

    It's not clear from the outset what the advantage is over just submitting a PR to mdevctl to echo into /sys/devices/..../[create|remove], and overall, the README doesn't give any information about it whatsoever, not even `--help` output to show the args and defaults.
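
    To spell out what that PR would wrap: the raw kernel interface is just this, give or take the exact parent path (the values here are illustrative):

    ```sh
    # The sysfs surface everything ultimately wraps. Parent path and type
    # directory depend on your device and driver.
    UUID=$(uuidgen)
    echo "$UUID" | sudo tee /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create
    ls /sys/bus/mdev/devices/                              # the new device appears here
    echo 1 | sudo tee "/sys/bus/mdev/devices/$UUID/remove" # tear it down again
    ```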

    > Sure, arcd is a reference Virtual Machine Monitor as it says at the top of this page: https://openmdev.io/index.php/LibVF.IO

    No, it is not. Point blank, it is not. libvirt isn't either. Even qemu isn't for hardware virt, and you're not doing IOMMU operations on emulated CPU calls; KVM is. It's a toolkit to manage virtual machines, maybe.

    This does not answer the question at all of "why not libvirt?"

    > It's actually unrelated to GVM. You can use GVM with whatever you want, including Libvirt/Virsh/Virt-Manager because we wanted to support users of those things with GVM rather than requiring that they use LibVF.IO.

    It's unrelated... for now. And there is zero reason to use this instead of libvirt hooks which were written by and are tested by teams which already do this for libvirt (which virsh and virt-manager are just interfaces to anyway).

    Again, I'm not saying this to be critical. There are plenty of libvirt-based projects out there which would welcome a standardized tool they could all use as an entrypoint to this, because libvirt strictly does not (and will not, even with modularity) cover this use case, so today everyone re-invents their own tooling to handle creation. It is unlikely in the extreme that the current state of GVM will work for anyone else's use case, primarily because "give me the UUIDs of existing devices" is already handled by walking /sys, and creating a new one of a given type (or removing it on VM shutdown) works in more or less the same way.

    GVM isn't built in a way which is usable by any other project. That's ok, but it does nothing to explain the design decision.

    > Well, we do create mediated devices exposed in mdevctl defined by a user config file, so I would say it goes a fair amount beyond Haskell bindings for the RMAPI. I think it's reasonable to describe a GPU mediated device as a virtual GPU given you get a virtual function that represents a scheduling share and virtual BAR space with a share of the device VRAM (partition of the GPU) which you can pass to one or several guests to allow them to run an unmodified guest GPU driver. I can't really think of a better definition for a vGPU. The Mediated Device Internals article pretty much explains the APIs GVM is dealing with - I believe we even link some sample code: https://openmdev.io/index.php/Mediated_Device_Internals

    You create mediated devices for Nvidia hardware by sending ioctls exposed via RMAPI. This is potato, potahto. The explanation you just gave STILL makes it sound as if this is a novel thing done by GVM/mdev-gpu rather than something common, and talking down to someone who is asking informed questions about why you did it this way by linking to internals (when I was physically there for most of those talks and helped write some of the docs) doesn't paint a pretty picture.

    > Your comment seems kind of trollish so I'm not really sure what benefit continuing this thread has. I think most of the stuff you're asking about is more or less documented and spelled out as openly as we're able to. What we're trying to do here is to make this stuff more open and available to people rather than locked away behind binary blobs. More or less everything we do is put into our wiki with very few exceptions. OpenMdev.io is made to be open to our community of folks working on Mediated Device/IO Virtualization functions on various projects, so if you're a developer on this stuff and think anything is lacking you're welcome to contribute or suggest it to us in our IRC or Discord. I'm sure there's always room to improve, and we put a ton of effort into trying to listen to feedback and improve upon things ourselves as well as accept contributions from others.

    NONE OF THE STUFF I'M ASKING ABOUT IS DOCUMENTED OR SPELLED OUT. That's the point. From someone who was a maintainer, engineering leader, etc. on a major open source virtualization platform, who literally wrote code which does this kind of scheduling/creation across a cluster, I am telling you that your documentation is opaque, misleading, takes credit for things you did not invent, doesn't explain your use case, doesn't explain why you re-invented the wheel, and doesn't explain why there's a gaping "missing middle" between "here are kernel sources/function signatures in drivers" and "here's a tool" (where that "missing middle" is /sys/devices/.../mdev_supported_types[/...] and "echo|uuidgen"), etc.

    This is, or could be, a great start to a unified ecosystem. You are going to have a very hard time building a developer/user ecosystem if you do not provide better documentation and "what these tools do", find a way to talk with other virt developers without condescending to them, present usable interfaces other projects can call which are not "here is YAML/JSON to operate on with exec()", and, most of all, acknowledge the work others have done and the knowledge they have rather than presenting any of this like it's brand new or novel. It could be a great utility. Or it could be something no other project ever uses. That's up to you.

    My comments are not intended to be trollish. They are intended to tell you "as someone who has written very similar code and done very similar things for a long time, the only way to figure out what the hell any of this was supposed to do was to literally read the source and make educated guesses". The average developer/user is not going to have the knowledge base to make those guesses at all, but they may see references to "arcd ..." like it's "developer documentation", go find it, and ask "why the hell is this managing qemu directly instead of libvirt", or "why is no libvirt XML/qemu hook provided"?

    These are real problems for the project. Docs, always, for every project. I know yours is new, but these are of unusually low quality for a submission to HN, and doubling down with links to the same inadequate docs like everyone you're talking to is a moron doesn't help your reputation. Additionally, examples. And reach out to others -- proxmox, ovirt, xcp, openstack (nova). See if you can collaborate. This will mean using (or at least providing) libvirt bindings/XML snippets like everyone else. It will be worth it.
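
    To make that last point concrete: handing an existing mdev to a libvirt guest is already just a stock XML snippet and one virsh call (the guest name and UUID below are placeholders):

    ```sh
    # Stock libvirt mdev hostdev XML plus a single virsh call; 'myguest'
    # and the UUID are placeholders.
    cat > mdev.xml <<'EOF'
    <hostdev mode='subsystem' type='mdev' model='vfio-pci'>
      <source>
        <address uuid='REPLACE-WITH-MDEV-UUID'/>
      </source>
    </hostdev>
    EOF
    virsh attach-device myguest mdev.xml --config
    ```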

  • linux

    Linux kernel source tree

  • gvt-linux

  • Intel has already confirmed that GVT-g is essentially dead and not supported on their Iris/Xe graphics or anything newer. We can also confirm this via their own driver source:

    https://github.com/intel/gvt-linux/blob/gvt-staging/drivers/...

  • GVM-user

    GVM-user.

  • VFIO-Mdev_Samples

    Sample code for creating a VFIO Mediated Device. GPLv2 sources mirrored from elixir.bootlin.com with simple makefile changes.

  • https://github.com/OpenMdev/VFIO-Mdev_Samples

    > I'm not confused either about how GVM/mdev-gpu works or about its relation to libvf.io. It's not hard to read between the missing lines of your project roadmap.

    You define types that are presented in mdevctl via a YAML or JSON config file the user can write themselves. So LibVF.IO might as well be entirely ignorant of the fact that GVM is on the system. All it sees are different arbitrary mdev types that the user decides to create. Not sure where you're going with this comment.

    > I read this before I ever wrote a reply, which you should have guessed because there was no other way to get any information. None of it tells anyone WHY they should use this instead of the bindings which have 100 developers on them, which have been battle-tested for years, and for which the original author of VFIO wrote exhaustive, excellent manuals on the blog I linked earlier, 7.5 years ago.

    > What advantages does your system offer?

    Well, for starters, I've never argued LibVF.IO is better than libvirt. In fact, when folks bring it up I generally say there's a lot we can learn from libvirt.
