C++ Nvidia

Open-source C++ projects categorized as Nvidia

Top 23 C++ Nvidia Projects

  1. moonlight-qt

    GameStream client for PCs (Windows, Mac, Linux, and Steam Link)

  3. TensorRT

    NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

  4. cutlass

    CUDA Templates and Python DSLs for High-Performance Linear Algebra

    Project mention: Dell's version of the DGX Spark fixes pain points | news.ycombinator.com | 2026-01-01

    I'm telling you it works now. It's just not called `tcgen05`.

    Put this in nsight compute: https://github.com/NVIDIA/cutlass/blob/main/examples/79_blac...

    (I said 83, it's 79).

    If you want to know what NVIDIA really thinks, watch this repo: https://github.com/nVIDIA/fuser. The Polyhedral Wizards at play. All the big not-quite-Fields players are splashing around there. I'm doing lean4 proofs of a bunch of their stuff. https://v0-straylight-papers-touchups.vercel.app

    It works now. It's just not the PTX mnemonic that you want to see.

  5. jetson-inference

    Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.

  6. OptiScaler

    OptiScaler bridges upscaling/frame gen across GPUs. Supports DLSS2+/XeSS/FSR2+ inputs, replaces native upscalers, enables FSR3 FG on non-FG titles. Supports Nukem mod for DLSSG-to-FSR3 FG.

    Project mention: FP8 is ~100 tflops faster when the kernel name has "cutlass" in it | news.ycombinator.com | 2025-07-11

    > My understanding was it was optimized by reducing precision or something to a visibly apparent degree.

    If only we had that sort of a control over rendering for every game ourselves - since projects like OptiScaler at least let us claw back control over sometimes proprietary upscaling and even framegen, but it's not quite enough: https://github.com/optiscaler/OptiScaler

    I want to be able to freely toggle between different types of AA and SSAO and reflections and lighting and LOD systems and various shader effects (especially things like chromatic aberration or motion blur) and ray tracing and all that, instead of having to hope that the console port that's offered to me has those abilities in the graphics menu and that whoever is making the decisions hasn't decided that actually "low" graphics (that would at least run smoothly) would look too bad for the game's brand image or something.

  7. NCCL

    Optimized primitives for collective multi-GPU communication

    Project mention: RustCC: Bringing Rust-Style Safety to C++17 via Policy Enforcement | news.ycombinator.com | 2026-03-20

    You have solid points.

    RustCC only humbly borrows Rust abstractions into a CC profiler. If that helps existing or new CC code, that is already good. Personally, I use Rust too. That is the inspiration for RustCC.

    BTW, RustCC may have identified a potential UB bug in NCCL. Waiting for NCCL to review.

    https://github.com/NVIDIA/nccl/issues/2062

  8. waifu2x-ncnn-vulkan

    waifu2x converter ncnn version, runs fast on intel / amd / nvidia / apple-silicon GPU with vulkan

  9. onnx-tensorrt

    ONNX-TensorRT: TensorRT backend for ONNX

  10. CV-CUDA

    CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.

  11. cccl

    CUDA Core Compute Libraries

    Project mention: Delivering the Missing Building Blocks for Nvidia CUDA Kernel Fusion in Python | news.ycombinator.com | 2025-07-16

    There’s an extensive change-log supporting the CCCL 3.0 release on GitHub from 3 hours ago: https://github.com/NVIDIA/cccl/releases/tag/v3.0.0

  12. uccl

    UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

    Project mention: UCCL: An Extensible Software Transport Layer for GPU Networking | news.ycombinator.com | 2025-06-28

  13. gdrcopy

    A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

    Project mention: Gdrcopy: Fast CPU-GPU memory copy library based on Nvidia GPUDirect RDMA | news.ycombinator.com | 2025-11-09

  14. RaspberryPi-WebRTC

    Native WebRTC low-latency P2P video streaming on Raspberry Pi and NVIDIA Jetson with both hardware and software encoding support.

  15. cuda-api-wrappers

    Thin C++-flavored header-only wrappers for core CUDA APIs: Runtime, Driver, NVRTC, NVTX.

    Project mention: CUDA Ontology | news.ycombinator.com | 2025-11-20

    > CUDA Runtime: The runtime library (libcudart) that applications link against.

    That library is actually a rather poor idea. If you're writing a CUDA application, I strongly recommend avoiding the "runtime API". It provides partial access to the actual CUDA driver and its API, which is 'simpler' in the sense that you don't explicitly create "contexts", but:

    * It hides or limits a lot of the functionality.

    * Its actual behavior vis-a-vis contexts is not at all simple and is likely to make your life more difficult down the road.

    * It's not some clean interface that's much more convenient to use.

    So, either go with the driver, or consider my CUDA API wrappers library [1], which _does_ offer a clean, unified, modern (well, C++11'ish) RAII/CADRe interface. And it covers much more than the runtime API, to boot: JIT compilation of CUDA (nvrtc) and PTX (nvptx_compiler), profiling (nvtx), etc.

    [1] : https://github.com/eyalroz/cuda-api-wrappers/

  16. isaac_ros_nvblox

    NVIDIA-accelerated 3D scene reconstruction and Nav2 local costmap provider using nvblox

  17. dxvk-nvapi

    Alternative NVAPI implementation on top of DXVK.

  18. relion

    Image-processing software for cryo-electron microscopy

  19. yolov5-deepsort-tensorrt

    A C++ implementation of YOLOv5 and DeepSORT

  20. deko3d

    Homebrew low level graphics API for Nintendo Switch (Nvidia Tegra X1)

  21. isaac_ros_common

    Common utilities, packages, scripts, and testing infrastructure for Isaac ROS packages.

  22. parakeet.cpp

    Ultra fast and portable Parakeet implementation for on-device inference in C++ using Axiom with MPS+Unified Memory (by Frikallo)

    Project mention: Nvidia PersonaPlex 7B on Apple Silicon: Full-Duplex Speech-to-Speech in Swift | news.ycombinator.com | 2026-03-05

    I really like this, and have actually tried (unsuccessfully) to get PersonaPlex to run on my Blackwell device - I will try this on Mac now as well.

    There are a few caveats here, for those of you venturing in this, since I've spent considerable time looking at these voice agents. First is that a VAD->ASR->LLM->TTS pipeline can still feel real-time with sub-second RTT. For example, see my project https://github.com/acatovic/ova and also a few others here on HN (e.g. https://www.ntik.me/posts/voice-agent and https://github.com/Frikallo/parakeet.cpp).

    Another aspect, after talking to peeps on PersonaPlex, is that this full duplex architecture is still a bit off in terms of giving you good accuracy/performance, and it's quite difficult to train. On the other hand ASR->LLM->TTS gives you a composable pipeline where you can swap parts out and have a mixture of tiny and large LLMs, as well as local and API based endpoints.

  23. optimus-manager-qt

    An interface for Optimus Manager that lets you switch GPUs on Optimus laptops.

  24. isaac_ros_apriltag

    NVIDIA-accelerated Apriltag detection and pose estimation.
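The parakeet.cpp mention above describes an ASR->LLM->TTS pipeline whose parts can be swapped out. As a rough illustration of that idea only, here is a minimal C++ sketch of a pipeline with named, replaceable stages. Everything in it is hypothetical: the `Pipeline` class, the `Stage` alias, and the stand-in stage functions are invented for this example and are not taken from parakeet.cpp, PersonaPlex, or any project on this list. Strings stand in for audio and text so the sketch stays self-contained.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stage type: each stage maps one payload to another.
// Real ASR/TTS stages would carry audio buffers; strings keep the sketch small.
using Stage = std::function<std::string(const std::string&)>;

// A pipeline is an ordered list of named, replaceable stages.
class Pipeline {
public:
    // Append a stage under a name so it can be replaced later.
    void add(std::string name, Stage stage) {
        assert(stage && "stage must be callable");
        stages_.emplace_back(std::move(name), std::move(stage));
    }

    // Replace the first stage registered under `name`; returns false if absent.
    bool replace(const std::string& name, Stage replacement) {
        assert(replacement && "replacement must be callable");
        for (auto& entry : stages_) {
            if (entry.first == name) {
                entry.second = std::move(replacement);
                return true;
            }
        }
        return false;
    }

    // Feed the payload through every stage in registration order.
    std::string run(std::string payload) const {
        for (const auto& entry : stages_) {
            payload = entry.second(payload);
        }
        return payload;
    }

private:
    std::vector<std::pair<std::string, Stage>> stages_;
};

// Stand-in stages (all hypothetical): they only tag the payload they receive.
inline std::string fake_asr(const std::string& audio) { return "text(" + audio + ")"; }
inline std::string tiny_llm(const std::string& prompt) { return "tiny-reply(" + prompt + ")"; }
inline std::string large_llm(const std::string& prompt) { return "large-reply(" + prompt + ")"; }
inline std::string fake_tts(const std::string& reply) { return "audio(" + reply + ")"; }
```

A caller would register `fake_asr`, `tiny_llm`, and `fake_tts` with `add` under the names "asr", "llm", and "tts", then call `replace("llm", large_llm)` to change only the middle stage. Keying stages by name is one possible design choice; it keeps the swap independent of stage position.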

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Nvidia projects in C++? This list will help you:

# Project Stars
1 moonlight-qt 17,435
2 TensorRT 13,038
3 cutlass 9,838
4 jetson-inference 8,874
5 OptiScaler 8,602
6 NCCL 4,785
7 waifu2x-ncnn-vulkan 3,411
8 onnx-tensorrt 3,206
9 CV-CUDA 2,693
10 cccl 2,367
11 uccl 1,398
12 gdrcopy 1,380
13 RaspberryPi-WebRTC 976
14 cuda-api-wrappers 890
15 isaac_ros_nvblox 704
16 dxvk-nvapi 622
17 relion 537
18 yolov5-deepsort-tensorrt 493
19 deko3d 393
20 isaac_ros_common 305
21 parakeet.cpp 280
22 optimus-manager-qt 243
23 isaac_ros_apriltag 186

