Halide vs CUDA.jl

Halide	Project	CUDA.jl
43	Mentions	15
5,703	Stars	1,133
1.0%	Growth	3.0%
9.5	Activity	9.5
3 days ago	Latest Commit	about 22 hours ago
C++	Language	Julia
MIT License	License	MIT License

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

Halide

Posts with mentions or reviews of Halide. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-16.

Show HN: Flash Attention in ~100 lines of CUDA
2 projects | news.ycombinator.com | 16 Mar 2024

If CPU/GPU execution speed is the goal while simultaneously code golfing the source size, https://halide-lang.org/ might have come in handy.
Halide v17.0.0
1 project | news.ycombinator.com | 1 Feb 2024
From slow to SIMD: A Go optimization story
10 projects | news.ycombinator.com | 23 Jan 2024

This is a task where Halide https://halide-lang.org/ could really shine! It decouples logic from scheduling (unrolling, vectorizing, tiling, caching intermediates, etc.), so every step the author describes in the article is a tunable in Halide. Halide doesn't appear to have bindings for Go, so calling C++ from Go might be the only viable option.
Implementing Mario's Stack Blur 15 times in C++ (with tests and benchmarks)
1 project | news.ycombinator.com | 10 Nov 2023

Probably would have been much easier to do 15 times in https://halide-lang.org/
The idea behind Halide is that scheduling memory access patterns is critical to performance. But when access patterns are interwoven with the arithmetic, the two are difficult to modify separately.
So, in Halide you specify the arithmetic and the schedule separately, so you can rapidly iterate on either.
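
The separation described above can be illustrated with a small conceptual sketch. This is plain Python, not Halide's API: the names `blur_at`, `schedule_row_major`, `schedule_tiled` and `realize` are hypothetical and exist only for this illustration. The "algorithm" states what each output pixel is, and each "schedule" only decides the order in which pixels are visited, so swapping schedules should leave the result unchanged.

```python
# Conceptual sketch only: plain Python, not Halide's API.
# The "algorithm" says what each output pixel is; a "schedule" says in what
# order the pixels are computed. Swapping schedules must not change results.

def blur_at(img, x, y):
    """Algorithm: 3-tap horizontal box blur at (x, y), clamping at the edges."""
    w = len(img[0])
    left = img[y][max(x - 1, 0)]
    right = img[y][min(x + 1, w - 1)]
    return (left + img[y][x] + right) / 3.0

def schedule_row_major(w, h):
    """Schedule A: visit pixels row by row."""
    for y in range(h):
        for x in range(w):
            yield x, y

def schedule_tiled(w, h, tile=2):
    """Schedule B: visit pixels tile by tile (the tile size is an assumption)."""
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            for y in range(ty, min(ty + tile, h)):
                for x in range(tx, min(tx + tile, w)):
                    yield x, y

def realize(algorithm, schedule, img):
    """Run the algorithm over the image in the order the schedule dictates."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for x, y in schedule(w, h):
        out[y][x] = algorithm(img, x, y)
    return out

image = [[float(x + 4 * y) for x in range(4)] for y in range(3)]
result_a = realize(blur_at, schedule_row_major, image)
result_b = realize(blur_at, schedule_tiled, image)
```

Both calls to `realize` use the same `blur_at`; only the traversal order differs. In a real system the schedule would also affect speed through cache behaviour and vectorization, which this toy version does not model.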
Making Hard Things Easy
11 projects | news.ycombinator.com | 6 Oct 2023
Deepmind Alphadev: Faster sorting algorithms discovered using deep RL
3 projects | news.ycombinator.com | 7 Jun 2023

It is not sorting per se that was improved here, but sorting (particularly of short sequences) on modern CPUs, where the real complexity lies in predicting what will run quickly on those CPUs.
Doing an empirical algorithm search to find which algorithms fit well on modern CPUs/memory systems is pretty common, see e.g. FFTW, ATLAS, https://halide-lang.org/
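
A minimal sketch of what such an empirical search can look like, assuming a toy setting with two candidate sort routines: each candidate is timed on a representative input and the fastest one is kept. The helper `pick_fastest`, the candidate set and the sample size are assumptions made for illustration; this is not how FFTW, ATLAS or Halide implement their searches.

```python
import random
import timeit

def insertion_sort(values):
    """Candidate 1: simple insertion sort, often competitive on tiny inputs."""
    out = list(values)
    for i in range(1, len(out)):
        item = out[i]
        j = i - 1
        while j >= 0 and out[j] > item:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = item
    return out

def builtin_sort(values):
    """Candidate 2: the interpreter's built-in sort."""
    return sorted(values)

def pick_fastest(candidates, sample, repeats=200):
    """Time every candidate on the sample; return the winner's name and all timings."""
    timings = {
        name: timeit.timeit(lambda fn=fn: fn(sample), number=repeats)
        for name, fn in candidates.items()
    }
    return min(timings, key=timings.get), timings

random.seed(0)
sample = [random.randint(0, 999) for _ in range(8)]  # short sequence, as in the post
candidates = {"insertion_sort": insertion_sort, "builtin_sort": builtin_sort}
winner, timings = pick_fastest(candidates, sample)
```

Which candidate wins depends on the machine, the interpreter and the input, which is the point of measuring rather than predicting.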
Two-tier programming language
6 projects | /r/ProgrammingLanguages | 19 Apr 2023

Halide https://halide-lang.org/
Best book on writing an optimizing compiler (inlining, types, abstract interpretation)?
8 projects | /r/ProgrammingLanguages | 17 Apr 2023
Blog Post: Can You Trust a Compiler to Optimize Your Code?
1 project | /r/rust | 9 Apr 2023

It doesn’t apply in this case, but in general if you really want the best vectorization I would suggest using https://halide-lang.org instead of trying to coerce your compiler.
What would make you try a new language?
8 projects | /r/ProgrammingLanguages | 29 Jan 2023

If we drop the "APL" requirement, wouldn't Halide fit your criteria for the third?

CUDA.jl

Posts with mentions or reviews of CUDA.jl. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-01-01.

Ask HN: Best way to learn GPU programming?
2 projects | news.ycombinator.com | 1 Jan 2024

It would also mean learning Julia, but you can write GPU kernels in Julia and then compile for NVIDIA CUDA, AMD ROCm or Intel oneAPI.
https://juliagpu.org/
I've written CUDA kernels and I knew nothing about it going in.
What's your main programming language?
3 projects | /r/ScientificComputing | 19 Apr 2023
How is Julia Performance with GPUs (for LLMs)?
2 projects | /r/Julia | 7 Apr 2023

See https://juliagpu.org/
Yann Lecun: ML would have advanced if other lang had been adopted versus Python
9 projects | news.ycombinator.com | 22 Feb 2023

If you look at Julia open source projects you'll see that the projects tend to have a lot more contributors than the Python counterparts, even over smaller time periods. A package for defining statistical distributions has had 202 contributors (https://github.com/JuliaStats/Distributions.jl), etc. Julia Base even has had over 1,300 contributors (https://github.com/JuliaLang/julia) which is quite a lot for a core language, and that's mostly because the majority of the core is in Julia itself.
This is one of the things that was noted quite a bit at this SIAM CSE conference, that Julia development tends to have a lot more code reuse than other ecosystems like Python. For example, the various machine learning libraries like Flux.jl and Lux.jl share a lot of layer intrinsics in NNlib.jl (https://github.com/FluxML/NNlib.jl), the same GPU libraries (https://github.com/JuliaGPU/CUDA.jl), the same automatic differentiation library (https://github.com/FluxML/Zygote.jl), and of course the same JIT compiler (Julia itself). These two libraries are far enough apart that people say "Flux is to PyTorch as Lux is to JAX/flax", but while in the Python world those share almost 0 code or implementation, in the Julia world they share >90% of the core internals but have different higher levels APIs.
If one hasn't participated in this space it's a bit hard to fathom how much code reuse goes on and how that is influenced by the design of multiple dispatch. This is one of the reasons there is so much cohesion in the community: it doesn't matter if one person is an ecologist and the other is a financial engineer, you may both be contributing to the same library like Distances.jl, just adding a distance function which is then used in thousands of places. With the Python ecosystem you tend to have a lot more "megapackages" (PyTorch, SciPy, etc.) where the barrier to entry is generally a lot higher (and sometimes requires handling the build systems, fun times). But in the Julia ecosystem you have a lot of core development happening in "small" but central libraries, like Distances.jl or Distributions.jl, which are simple enough for an undergrad to get productive in within a week but are then used everywhere (Distributions.jl, for example, is used in every statistics package, in definitions of prior distributions for Turing.jl's probabilistic programming language, etc.).
C++ is making me depressed / CUDA question
7 projects | /r/rust | 20 Jul 2022

If you just want to do some numerical code that requires linear algebra and GPU, your best bet would be Julia or Python+JAX.
Nearly trivial distributed parallelization of stencil-based GPU and CPU applications with…
7 projects | dev.to | 30 Apr 2022

GitHub - JuliaGPU/CUDA.jl: CUDA programming in Julia.
Why Fortran is easy to learn
19 projects | news.ycombinator.com | 7 Jan 2022
Generic GPU Kernels
7 projects | news.ycombinator.com | 6 Dec 2021

Should have (2017) in the title.
Indeed, it's cool to program Julia directly on the GPU, and Julia on GPU has evolved further since then; see https://juliagpu.org/
Announcing The Rust CUDA Project; An ecosystem of crates and tools for writing and executing extremely fast GPU code fully in Rust
2 projects | /r/rust | 22 Nov 2021

I'm excited to eventually see something like JuliaGPU with support for multiple backends.
[Media] 100% Rust path tracer running on CPU, GPU (CUDA), and OptiX (for denoising) using one of my upcoming projects. There is no C/C++ code at all, the program shares a single rust crate for the core raytracer and uses rust for the viewer and renderer.
3 projects | /r/rust | 29 Oct 2021

That's really cool! Have you looked at CUDA.jl for the Julia language? Maybe you could take some ideas from there. I am pretty sure it does the same thing you do here, and they support arbitrary code with the limitations that you cannot allocate memory, I/O is disallowed, and badly typed (dynamic) code will not compile.

What are some alternatives?

When comparing Halide and CUDA.jl you can also consider the following projects:

taichi - Productive, portable, and performant GPU programming in Python.

LoopVectorization.jl - Macro(s) for vectorizing loops.

futhark - :boom::computer::boom: A data-parallel functional programming language

cunumeric - An Aspiring Drop-In Replacement for NumPy at Scale

Image-Convolutaion-OpenCL

awesome-quant - A curated list of insanely awesome libraries, packages and resources for Quants (Quantitative Finance)

TensorOperations.jl - Julia package for tensor contractions and related operations

cudf - cuDF - GPU DataFrame Library

triton - Development repository for the Triton language and compiler

Tullio.jl - ⅀

ponyc - Pony is an open-source, actor-model, capabilities-secure, high performance programming language

GPUCompiler.jl - Reusable compiler infrastructure for Julia GPU backends.
