CudaPy
CudaPy is a runtime library that lets Python programmers access NVIDIA's CUDA parallel computation API.
As it turns out, NVIDIA just open sourced a product called Legate, which targets not just GPUs but distributed execution as well. Right now it supports NumPy and Pandas, but perhaps they'll add others in the future. Just thought this might be up your alley, since it works at a higher level than the glorified CUDA in the article.
https://github.com/nv-legate/legate.numpy
Disclaimer: I work on the project they used to do the distributed execution, but otherwise have no connection with Legate.
Oh that sounds interesting. Do you know what happened to it?
I think I found it here: https://github.com/bryancatanzaro/copperhead
But I'm not sure what its state is. It looks dead (last commit 8 years ago), probably just a proof of concept. But why hasn't this been continued?
that project might be abandoned but this strategy is used in nvidia and nvidia adjacent projects (through llvm):
https://github.com/rapidsai/cudf/blob/branch-0.20/python/cud...
https://github.com/gmarkall/numba/blob/master/numba/cuda/com...
>but we also need high level expressibility that doesn't require writing kernels in C
the above are possible because C is actually just a frontend to PTX
https://docs.nvidia.com/cuda/parallel-thread-execution/index...
fundamentally, you are never going to be able to write CUDA kernels without thinking about the CUDA architecture, any more than you'll ever be able to write async code without thinking about concurrency.
here is a similar kernel written in Python with Numba: https://github.com/ContinuumIO/gtc2017-numba/blob/master/4%2...
I think the contrast is less about the language, and more about the scope and objective of the project. the blog is describing low-level interfaces in python - probably more comparable is the old CUDAdrv.jl package (now merged into CUDA.jl): https://github.com/JuliaGPU/CUDAdrv.jl/blob/master/examples/...
Somewhat related, I’ve built compute shaders using wgpu-py:
https://github.com/pygfx/wgpu-py
You can define any compute shader you like in Python, along with its data types, and it compiles it to SPIR-V and runs it.
Here's a screenshot: https://julialang.org/assets/blog/nvvp.png. Or a recent PR where you can see NVTX ranges from Julia: https://github.com/JuliaGPU/CUDA.jl/pull/760
Closest thing to mind is Numba's CUDA JIT compilation: https://numba.pydata.org/numba-doc/latest/cuda/index.html
Then you have CuPy: https://github.com/cupy/cupy
But in my opinion, the most future-proof solutions are higher-level frameworks like NumPy, JAX and TensorFlow. TensorFlow can JIT-compile Python functions to GPU (tf.function).
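As a sketch of that higher-level route, here is an element-wise operation expressed with JAX's `jax.jit` (an assumption here: `jax` is installed; XLA then targets CPU or GPU automatically, with no kernel or launch configuration written by hand):

```python
import jax
import jax.numpy as jnp

# jit traces the Python function once and compiles it with XLA;
# on a machine with a CUDA-capable GPU the same code runs there unchanged.
@jax.jit
def saxpy(a, x, y):
    return a * x + y

x = jnp.arange(8.0)
y = jnp.ones(8)
result = saxpy(2.0, x, y)
print(result)  # [ 1.  3.  5.  7.  9. 11. 13. 15.]
```

The trade-off versus the Numba/CUDA approach above is control: you give up explicit thread indexing and shared memory in exchange for portability across backends.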
Sounds like nmigen might be a good open source successor to the project that you describe: https://github.com/nmigen/nmigen