Python CUDA

Open-source Python projects categorized as CUDA

Top 23 Python CUDA Projects

  1. vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Project mention: Code Review: Deep Dive into vLLM's Architecture and Implementation Analysis of OpenAI-Compatible Serving (1/2) | dev.to | 2025-06-15

    vLLM [1, 2] is a fast and easy-to-use library for LLM inference and serving. Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry. [3]
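
    For readers who have not used it, a minimal sketch of vLLM's offline-inference API under stated assumptions (the model id, prompt, and sampling values below are illustrative placeholders, not taken from the post):

    # Minimal vLLM offline-inference sketch (illustrative model id and settings)
    from vllm import LLM, SamplingParams

    prompts = ["Explain CUDA in one sentence."]
    sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

    llm = LLM(model="facebook/opt-125m")        # weights are downloaded on first run
    outputs = llm.generate(prompts, sampling_params)

    for out in outputs:
        print(out.outputs[0].text)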

  2. sglang

    SGLang is a fast serving framework for large language models and vision language models.

    Project mention: Why DeepSeek is cheap at scale but expensive to run locally | news.ycombinator.com | 2025-06-01

    The state of the art for local models has moved even further.

    For example, look into https://github.com/kvcache-ai/ktransformers, which achieves >11 tokens/s on a relatively old two-socket Xeon server plus a retail RTX 4090 GPU. Even more interesting is the prefill speed of more than 250 tokens/s. This is very useful in use cases like coding, where large prompts are common.

    The above is achievable today. In the meantime, Intel engineers are working on something even more impressive: in https://github.com/sgl-project/sglang/pull/5150 they claim >15 tokens/s generation and >350 tokens/s prefill. They don't share exactly what hardware they run this on, but from bits and pieces across various PRs I reverse-engineered that they use 2x Xeon 6980P with MRDIMM 8800 RAM, without a GPU. The total cost of such a setup will be around $10k once cheap engineering samples hit eBay.
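
    To make "serving framework" concrete, a hedged sketch of the usual SGLang workflow: launch the OpenAI-compatible server, then query it with the standard openai client. The model id and port are illustrative assumptions.

    # Assumes an SGLang server was started separately, e.g.:
    #   python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
    # The server exposes an OpenAI-compatible endpoint, so the standard openai client works.
    import openai

    client = openai.OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": "Summarize prefill vs. decode in one line."}],
    )
    print(resp.choices[0].message.content)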

  3. Numba

    NumPy aware dynamic Python compiler using LLVM

    Project mention: I Don't Like NumPy | news.ycombinator.com | 2025-05-15

    Have you heard of JIT libraries like numba (https://github.com/numba/numba)? It doesn't work for all Python code, but it can be helpful for the type of function you gave as an example. There's no need to rewrite anything; just add a decorator to the function. I don't really know how the performance compares to C, for example.
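
    To make the "just add a decorator" point concrete, a minimal sketch (the function and array size are made up for illustration):

    import numpy as np
    from numba import njit

    @njit                       # compiles the function to machine code on first call
    def sum_of_squares(a):
        total = 0.0
        for i in range(a.shape[0]):
            total += a[i] * a[i]
        return total

    x = np.random.rand(1_000_000)
    sum_of_squares(x)           # first call pays the JIT compilation cost
    print(sum_of_squares(x))    # later calls run the compiled machine code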

  4. cupy

    NumPy & SciPy for GPU

    Project mention: Nvidia adds native Python support to CUDA | news.ycombinator.com | 2025-04-04

    The plethora of packages, including DSLs for compute and MLIR:

    https://developer.nvidia.com/how-to-cuda-python

    https://cupy.dev/
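
    A minimal sketch of what "NumPy & SciPy for GPU" means in practice; the array shape is arbitrary and a CUDA-capable GPU is assumed:

    import numpy as np
    import cupy as cp

    a = cp.asarray(np.random.rand(4096, 4096))   # copy the host array to the GPU
    norms = cp.linalg.norm(a, axis=1)            # same call signature as numpy.linalg.norm, runs on the GPU
    result = cp.asnumpy(norms)                   # copy the result back to a host NumPy array
    print(result[:5])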

  5. chainer

    A flexible framework of neural networks for deep learning

  6. nvitop

    An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

    Project mention: nvitop VS nviwatch - a user suggested alternative | libhunt.com/r/nvitop | 2024-09-09
  7. jittor

    Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.

  8. TensorRT

    PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT (by pytorch)

  9. TransformerEngine

    A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.

    Project mention: TransformerEngine: A library for accelerating Transformer models on Nvidia GPUs | news.ycombinator.com | 2024-09-25
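
    A hedged sketch of the FP8 usage pattern from TransformerEngine's PyTorch quickstart; the layer and batch sizes are arbitrary, the recipe arguments are assumptions, and FP8 execution requires a Hopper, Ada, or Blackwell GPU:

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    layer = te.Linear(1024, 1024, bias=True).cuda()   # drop-in replacement for torch.nn.Linear
    fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

    inp = torch.randn(16, 1024, device="cuda")
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):  # matmuls inside run in FP8
        out = layer(inp)
    out.sum().backward()
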
  10. QualityScaler

    QualityScaler - image/video deeplearning upscaling for any GPU

  11. torchrec

    Pytorch domain library for recommendation systems

    Project mention: Advancements in Embedding-Based Retrieval at Pinterest Homefeed | news.ycombinator.com | 2025-02-14

    Nice, there are a ton of threads here to check out. For example, I had not heard of https://pytorch.org/torchrec/, which seems to nicely package a lot of primitives I have worked with previously.

  12. ao

    PyTorch native quantization and sparsity for training and inference (by pytorch)

    Project mention: Quantized Llama models with increased speed and a reduced memory footprint | news.ycombinator.com | 2024-10-24

    You can estimate the impact of context length by doing a back-of-the-envelope calculation of KV cache size: 2 * layers * attention_heads * head_dim * bytes_per_element * batch_size * sequence_length

    Some pretty charts here https://github.com/pytorch/ao/issues/539
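
    Plugging illustrative numbers into the formula above, a back-of-the-envelope sketch (the Llama-3-8B-style shapes and FP16 cache are assumptions, not figures from the thread; for grouped-query attention the relevant head count is the number of KV heads):

    # KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim
    #                  * bytes_per_element * batch_size * sequence_length
    layers, kv_heads, head_dim = 32, 8, 128     # Llama-3-8B-style shapes (GQA: 8 KV heads)
    bytes_per_element = 2                       # FP16/BF16 cache; 1 for an 8-bit quantized cache
    batch_size, sequence_length = 1, 8192

    kv_bytes = 2 * layers * kv_heads * head_dim * bytes_per_element * batch_size * sequence_length
    print(f"{kv_bytes / 2**30:.2f} GiB")        # 1.00 GiB at 8k context in this configuration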

  13. viseron

    Self-hosted, local only NVR and AI Computer Vision software. With features such as object detection, motion detection, face recognition and more, it gives you the power to keep an eye on your home, office or any other place you want to monitor.

  14. PyCUDA

    CUDA integration for Python, plus shiny features
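
    A minimal sketch of what "CUDA integration for Python" looks like with PyCUDA's SourceModule; the kernel is a toy example written for this illustration:

    import numpy as np
    import pycuda.autoinit                      # creates a CUDA context on import
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void scale(float *out, const float *in, float factor)
    {
        int i = threadIdx.x + blockIdx.x * blockDim.x;
        out[i] = in[i] * factor;
    }
    """)
    scale = mod.get_function("scale")

    a = np.random.randn(256).astype(np.float32)
    out = np.empty_like(a)
    scale(drv.Out(out), drv.In(a), np.float32(2.0), block=(256, 1, 1), grid=(1, 1))
    print(np.allclose(out, a * 2.0))            # True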

  15. pykeen

    🤖 A Python library for learning and evaluating knowledge graph embeddings

  16. 3d-ken-burns

    an implementation of 3D Ken Burns Effect from a Single Image using PyTorch

  17. tsdf-fusion-python

    Python code to fuse multiple RGB-D images into a TSDF voxel volume.

  18. stable-fast

    Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs (https://wavespeed.ai/).

  19. pyopencl

    OpenCL integration for Python, plus shiny features

  20. curobo

    CUDA Accelerated Robot Library

  21. scikit-cuda

    Python interface to GPU-powered libraries

  22. bazel-compile-commands-extractor

    Goal: Enable awesome tooling for Bazel users of the C language family.

    Project mention: Open Source C++ Stack | dev.to | 2024-07-16

    # Uchen core - ML framework
    module(
        name = "uchen-core",
        version = "0.1",
        compatibility_level = 1,
    )

    bazel_dep(name = "abseil-cpp", version = "20240116.2")

    bazel_dep(name = "googletest", version = "1.14.0")
    git_override(
        module_name = "googletest",
        remote = "https://github.com/google/googletest.git",
        commit = "1d17ea141d2c11b8917d2c7d029f1c4e2b9769b2",
    )

    bazel_dep(name = "google_benchmark", version = "1.8.3")
    git_override(
        module_name = "google_benchmark",
        remote = "https://github.com/google/benchmark.git",
        commit = "447752540c71f34d5d71046e08192db181e9b02b",
    )

    # Dev dependencies
    bazel_dep(name = "hedron_compile_commands", dev_dependency = True)
    git_override(
        module_name = "hedron_compile_commands",
        remote = "https://github.com/hedronvision/bazel-compile-commands-extractor.git",
        commit = "a14ad3a64e7bf398ab48105aaa0348e032ac87f8",
    )

  23. caer

    High-performance Vision library in Python. Scale your research, not boilerplate.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates mentions of the repo in the last 12 months or since we started tracking (Dec 2020).


Python CUDA related posts

  • Why DeepSeek is cheap at scale but expensive to run locally

    6 projects | news.ycombinator.com | 1 Jun 2025
  • Bringing Function Calling to DeepSeek Models on SGLang

    1 project | dev.to | 23 Apr 2025
  • Docker Model Runner

    2 projects | news.ycombinator.com | 14 Apr 2025
  • A beginner's guide to the Grounding-Dino model by Adirik on Replicate

    1 project | dev.to | 12 Apr 2025
  • Advancements in Embedding-Based Retrieval at Pinterest Homefeed

    1 project | news.ycombinator.com | 14 Feb 2025
  • SGLang DeepSeek V3 Support with Collab with DeepSeek Team (Nvidia or AMD)

    1 project | news.ycombinator.com | 6 Feb 2025
  • Hunyuan3D 2.0 – High-Resolution 3D Assets Generation

    6 projects | news.ycombinator.com | 21 Jan 2025

Index

What are some of the best open-source CUDA projects in Python? This list will help you:

# Project Stars
1 vllm 49,682
2 sglang 15,186
3 Numba 10,476
4 cupy 10,268
5 chainer 5,907
6 nvitop 5,630
7 jittor 3,169
8 TensorRT 2,782
9 TransformerEngine 2,473
10 QualityScaler 2,454
11 torchrec 2,236
12 ao 2,094
13 viseron 2,102
14 PyCUDA 1,951
15 pykeen 1,802
16 3d-ken-burns 1,534
17 tsdf-fusion-python 1,307
18 stable-fast 1,266
19 pyopencl 1,103
20 curobo 1,018
21 scikit-cuda 990
22 bazel-compile-commands-extractor 800
23 caer 788

