Unifying the CUDA Python Ecosystem

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • cunumeric

    An Aspiring Drop-In Replacement for NumPy at Scale

  • As it turns out, NVIDIA just open sourced a product called Legate, which supports not just GPUs but distributed execution as well. Right now it supports NumPy and Pandas, but perhaps they'll add others in the future. I thought this might be up your alley, since it works at a higher level than the glorified CUDA in the article.

    https://github.com/nv-legate/legate.numpy

    Disclaimer: I work on the project they used to do the distributed execution, but otherwise have no connection with Legate.
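
    The "drop-in replacement" idiom described above can be sketched as follows: swap the import and keep the NumPy-style code unchanged. This is a minimal sketch assuming the `cunumeric` package name from the project; it falls back to plain NumPy when cunumeric is not installed.

    ```python
    # Drop-in replacement idiom: only the import changes, the array code does not.
    try:
        import cunumeric as np  # Legate's NumPy replacement, if available
    except ImportError:
        import numpy as np      # fall back to plain NumPy

    a = np.arange(6.0).reshape(2, 3)
    b = np.ones((3, 2))
    print((a @ b).sum())  # 30.0 under either backend
    ```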

  • grcuda

    Polyglot CUDA integration for the GraalVM

  • copperhead

    Data Parallel Python

  • Oh, that sounds interesting. Do you know what happened to it?

    I think I found it here: https://github.com/bryancatanzaro/copperhead

    But I'm not sure what its state is. It looks dead (last commit 8 years ago). Probably just a proof of concept, but why wasn't it continued?

    Blog post and example:

  • cudf

    cuDF - GPU DataFrame Library

  • that project might be abandoned, but this strategy is used in NVIDIA and NVIDIA-adjacent projects (through LLVM):

    https://github.com/rapidsai/cudf/blob/branch-0.20/python/cud...

    https://github.com/gmarkall/numba/blob/master/numba/cuda/com...

    >but we also need high level expressibility that doesn't require writing kernels in C

    the above are possible because C is actually just a frontend to PTX

    https://docs.nvidia.com/cuda/parallel-thread-execution/index...

    fundamentally, you are never going to be able to write CUDA kernels without thinking about the CUDA architecture, any more than you'll ever be able to write async code without thinking about concurrency.

  • numba

    NumPy aware dynamic Python compiler using LLVM (by gmarkall)

  • gtc2017-numba

    Numba tutorial for GTC 2017 conference

  • here is a similar kernel written in Python with Numba: https://github.com/ContinuumIO/gtc2017-numba/blob/master/4%2...

    I think the contrast is less about the language and more about the scope and objective of the project. The blog describes low-level interfaces in Python; probably more comparable is the old CUDAdrv.jl package (now merged into CUDA.jl): https://github.com/JuliaGPU/CUDAdrv.jl/blob/master/examples/...
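
    As a sketch of what a Numba CUDA kernel looks like (assuming Numba is installed; setting `NUMBA_ENABLE_CUDASIM=1` before import runs the kernel on Numba's CPU simulator, so no GPU is needed to try it):

    ```python
    import os
    os.environ["NUMBA_ENABLE_CUDASIM"] = "1"  # CPU simulator: no GPU required

    import numpy as np
    from numba import cuda

    @cuda.jit
    def axpy(a, x, y, out):
        # One thread per element, guarded against out-of-range indices.
        i = cuda.grid(1)
        if i < out.size:
            out[i] = a * x[i] + y[i]

    n = 32
    x = np.arange(n, dtype=np.float32)
    y = np.ones(n, dtype=np.float32)
    out = np.zeros_like(x)
    axpy[1, 64](np.float32(2.0), x, y, out)  # launch 1 block of 64 threads
    print(out[:4])  # [1. 3. 5. 7.]
    ```

    Note that the launch configuration (blocks, threads) and the per-thread index calculation are still explicit, which is the point made elsewhere in this thread: the CUDA architecture does not disappear, only the C syntax does.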

  • CUDAdrv.jl

    Discontinued A Julia wrapper for the CUDA driver API.

  • wgpu-py

    Next generation GPU API for Python

  • Somewhat related, I’ve built compute shaders using wgpu-py:

    https://github.com/pygfx/wgpu-py

    You can define any compute shader you like in Python, together with its data types; it compiles the shader to SPIR-V and runs it.

  • CUDA.jl

    CUDA programming in Julia.

  • Here's a screenshot: https://julialang.org/assets/blog/nvvp.png. Or a recent PR where you can see NVTX ranges from Julia: https://github.com/JuliaGPU/CUDA.jl/pull/760

  • CudaPy

    CudaPy is a runtime library that lets Python programmers access NVIDIA's CUDA parallel computation API.

  • Closest thing that comes to mind is Numba's CUDA JIT compilation: https://numba.pydata.org/numba-doc/latest/cuda/index.html

    Then you have CudaPy: https://github.com/oulgen/CudaPy

    But in my opinion, the most future-proof solutions are higher-level frameworks like NumPy, JAX, and TensorFlow. TensorFlow can JIT-compile Python functions for the GPU (tf.function).
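
    A minimal sketch of that last point, assuming TensorFlow is installed: `tf.function(jit_compile=True)` asks XLA to compile the traced function, which dispatches to a GPU when one is present (and to the CPU otherwise).

    ```python
    import tensorflow as tf

    @tf.function(jit_compile=True)  # XLA-compile; runs on the GPU if available
    def scaled_sum(x, y):
        return 2.0 * x + y

    x = tf.range(4, dtype=tf.float32)
    y = tf.ones(4)
    print(scaled_sum(x, y).numpy())  # [1. 3. 5. 7.]
    ```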

  • amaranth

    A modern hardware definition language and toolchain based on Python

  • Sounds like nmigen might be a good open source successor to the project that you describe: https://github.com/nmigen/nmigen

  • Numba

    NumPy aware dynamic Python compiler using LLVM

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives; a higher number therefore means a more popular project.

Suggest a related project

Related posts