cudf vs CUDA.jl

cudf	Project	CUDA.jl
23	Mentions	15
7,274	Stars	1,133
2.9%	Growth	3.0%
9.9	Activity	9.5
7 days ago	Latest Commit	3 days ago
C++	Language	Julia
Apache License 2.0	License	GNU General Public License v3.0 or later

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
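The growth and activity figures above can be turned into rough arithmetic. The sketch below is only an illustration: it assumes "growth" means (current - previous) / previous, and it assumes a linear mapping from activity score to "top X%", which the page states only for the single point 9.0 -> top 10%. The function names are hypothetical and are not part of any site API.

```python
def previous_stars(current_stars: int, monthly_growth_pct: float) -> float:
    """Estimate last month's star count.

    Assumes growth = (current - previous) / previous, expressed in percent.
    """
    return current_stars / (1 + monthly_growth_pct / 100)


def top_share_pct(activity: float) -> float:
    """Map an activity score to a 'top X%' figure.

    ASSUMPTION: a linear scale anchored on the one documented point
    (activity 9.0 -> top 10%). The real scoring formula is not published here.
    """
    return round((10.0 - activity) * 10.0, 6)
```

With the table values, `previous_stars(7274, 2.9)` and `previous_stars(1133, 3.0)` estimate the star counts one month earlier for cudf and CUDA.jl respectively.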

cudf

Posts with mentions or reviews of cudf. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-05-17.

A Polars exploration into Kedro
6 projects | dev.to | 17 May 2023

The interesting thing about Polars is that it does not try to be a drop-in replacement for pandas, like Dask, cuDF, or Modin, and instead has its own expressive API. Despite being a young project, it quickly got popular thanks to its easy installation process and its “lightning fast” performance.
Why we dropped Docker for Python environments
1 project | /r/dataengineering | 12 Apr 2023

Perhaps the largest in terms of package size is the NVIDIA-developed RAPIDS toolkit, https://rapids.ai/ . Even so, once you add things like pandas and some geospatial tools, you rapidly end up with an image well over a gigabyte, despite following cutting-edge best practice with Docker and Python.
Introducing TeaScript C++ Library
2 projects | /r/cpp | 16 Feb 2023

Yes, sure, that is how OpenMP does it; but on the other hand, you seem to already do some basic type inference and build an AST, no? Then you also know the size and type of your vectors, and can execute actions in parallel if there is enough data to be worth parallelizing. Is there anyone who doesn't want their code to execute faster if possible? Those who work in the big data domain use threads and vectorized instructions without the user having to type any directive; they just import a different library. Examples are numpy, numpy with a CUDA backend, or similar GPU-accelerated libraries like cudf.
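The "just import a different library" idea from the comment above can be sketched as a small backend selector. This is a minimal illustration, not how any of the named libraries recommend doing it: the helper names `pick_backend` and `load_array_module` are hypothetical, and the default candidates assume CuPy as the NumPy-compatible GPU library (the comment itself does not name one).

```python
import importlib
import importlib.util


def pick_backend(candidates=("cupy", "numpy")):
    """Return the name of the first installed module in `candidates`, or None.

    Only checks whether the module can be found; it does not import it.
    """
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def load_array_module(candidates=("cupy", "numpy")):
    """Import and return the first available backend module.

    The calling code stays the same regardless of which backend is picked,
    as long as the backends expose a compatible API.
    """
    name = pick_backend(candidates)
    if name is None:
        raise ImportError(f"none of {candidates} is installed")
    return importlib.import_module(name)
```

A caller would then write something like `xp = load_array_module()` followed by `xp.arange(10).sum()`, and the same line runs on whichever backend was found. Note that a GPU library being installed does not guarantee a usable GPU, so real code would likely need an additional runtime check.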
[D] Can we use Ray for distributed training on vertex ai ? Can someone provide me examples for the same ? Also which dataframe libraries you guys used for training machine learning models on huge datasets (100 gb+) (because pandas can't handle huge data).
1 project | /r/MachineLearning | 9 Feb 2023

Not an answer about Ray, but you could use rapids.ai. I'm using it for dataframe manipulation on GPU.
Story of my life
1 project | /r/ProgrammerHumor | 28 Nov 2022

To put Data Analytics on GPU Steroids, Try RAPIDS cudf https://rapids.ai/
Artificial Intelligence in Python
1 project | /r/learnpython | 30 Oct 2022

You can scope out https://rapids.ai/. Nvidia's AI toolkits. They have some handy notebooks to poke at to get you started.
[D] [R] Large-scale clustering
2 projects | /r/MachineLearning | 27 Oct 2022

try https://rapids.ai/
[P] Looking for state of the art clustering algorithms
8 projects | /r/MachineLearning | 14 Sep 2022

As a companion to the other comments, I'd like to mention that the RAPIDS library cuML provides GPU-accelerated versions of quite a few of the algorithms mentioned in this thread (HDBSCAN, UMAP, SVM, PCA, {Exact, Approximate} Nearest Neighbors, DBSCAN, KMeans, etc.).
Integrating multiple point clouds?
3 projects | /r/learnpython | 26 Apr 2022
Open GPU Data Science | RAPIDS
1 project | /r/opencv | 21 Feb 2022

CUDA.jl

Posts with mentions or reviews of CUDA.jl. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-01-01.

Ask HN: Best way to learn GPU programming?
2 projects | news.ycombinator.com | 1 Jan 2024

It would also mean learning Julia, but you can write GPU kernels in Julia and then compile for NVIDIA CUDA, AMD ROCm, or Intel oneAPI.
https://juliagpu.org/
I've written CUDA kernels and I knew nothing about it going in.
What's your main programming language?
3 projects | /r/ScientificComputing | 19 Apr 2023
How is Julia Performance with GPUs (for LLMs)?
2 projects | /r/Julia | 7 Apr 2023

See https://juliagpu.org/
Yann Lecun: ML would have advanced if other lang had been adopted versus Python
9 projects | news.ycombinator.com | 22 Feb 2023

If you look at Julia open source projects you'll see that the projects tend to have a lot more contributors than the Python counterparts, even over smaller time periods. A package for defining statistical distributions has had 202 contributors (https://github.com/JuliaStats/Distributions.jl), etc. Julia Base even has had over 1,300 contributors (https://github.com/JuliaLang/julia) which is quite a lot for a core language, and that's mostly because the majority of the core is in Julia itself.
This is one of the things that was noted quite a bit at this SIAM CSE conference, that Julia development tends to have a lot more code reuse than other ecosystems like Python. For example, the various machine learning libraries like Flux.jl and Lux.jl share a lot of layer intrinsics in NNlib.jl (https://github.com/FluxML/NNlib.jl), the same GPU libraries (https://github.com/JuliaGPU/CUDA.jl), the same automatic differentiation library (https://github.com/FluxML/Zygote.jl), and of course the same JIT compiler (Julia itself). These two libraries are far enough apart that people say "Flux is to PyTorch as Lux is to JAX/flax", but while in the Python world those share almost 0 code or implementation, in the Julia world they share >90% of the core internals but have different higher-level APIs.
If one hasn't participated in this space it's a bit hard to fathom how much code reuse goes on and how that is influenced by the design of multiple dispatch. This is one of the reasons there is so much cohesion in the community: it doesn't matter if one person is an ecologist and the other is a financial engineer, you may both be contributing to the same library like Distances.jl, just adding a distance function which is then used in thousands of places. With the Python ecosystem you tend to have a lot more "megapackages", PyTorch, SciPy, etc., where the barrier to entry is generally a lot higher (and sometimes requires handling the build systems, fun times). But in the Julia ecosystem you have a lot of core development happening in "small" but central libraries, like Distances.jl or Distributions.jl, which are simple enough for an undergrad to get productive in a week but are then used everywhere (Distributions.jl, for example, is used in every statistics package, in definitions of prior distributions for Turing.jl's probabilistic programming language, etc.).
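To make the multiple-dispatch point a little more concrete, here is a toy sketch in Python, which has no built-in multiple dispatch. Everything in it is hypothetical (the registry, `register`, `dispatch`, and `pairwise` are illustration names, not the API of Distances.jl or of any Python package), and the lookup order is a simplification of how Julia actually resolves methods. It only shows the shape of the idea: generic code is written once, and contributors add small methods for new type combinations.

```python
import math

# Hypothetical registry: (function name, type of a, type of b) -> implementation.
_registry = {}


def register(name, type_a, type_b):
    """Decorator that registers an implementation for a pair of argument types."""
    def decorator(func):
        _registry[(name, type_a, type_b)] = func
        return func
    return decorator


def dispatch(name, a, b):
    """Pick an implementation based on the runtime types of BOTH arguments.

    Walks each argument's MRO; this is a simplification, not Julia's
    specificity rules.
    """
    for ta in type(a).__mro__:
        for tb in type(b).__mro__:
            func = _registry.get((name, ta, tb))
            if func is not None:
                return func(a, b)
    raise TypeError(f"no method {name}({type(a).__name__}, {type(b).__name__})")


@register("distance", tuple, tuple)
def _euclidean(a, b):
    # Euclidean distance between two points given as tuples of numbers.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@register("distance", str, str)
def _hamming(a, b):
    # Hamming distance between two equal-length strings.
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def pairwise(items):
    """Generic code written once; it works for any registered type pair."""
    return [
        dispatch("distance", a, b)
        for i, a in enumerate(items)
        for b in items[i + 1:]
    ]
```

Here `pairwise` never changes; adding a new distance for a new pair of types is a single small registration, which is the kind of contribution the comment describes.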
C++ is making me depressed / CUDA question
7 projects | /r/rust | 20 Jul 2022

If you just want to do some numerical code that requires linear algebra and GPU, your best bet would be Julia or Python+JAX.
Almost trivial distributed parallelization of stencil-based GPU and CPU applications with…
7 projects | dev.to | 30 Apr 2022

GitHub - JuliaGPU/CUDA.jl: CUDA programming in Julia.
Why Fortran is easy to learn
19 projects | news.ycombinator.com | 7 Jan 2022
Generic GPU Kernels
7 projects | news.ycombinator.com | 6 Dec 2021

Should have (2017) in the title.
Indeed, it is cool to program Julia directly on the GPU, and Julia on GPU has evolved further since then; see https://juliagpu.org/
Announcing The Rust CUDA Project; An ecosystem of crates and tools for writing and executing extremely fast GPU code fully in Rust
2 projects | /r/rust | 22 Nov 2021

I'm excited to eventually see something like JuliaGPU with support for multiple backends.
[Media] 100% Rust path tracer running on CPU, GPU (CUDA), and OptiX (for denoising) using one of my upcoming projects. There is no C/C++ code at all, the program shares a single rust crate for the core raytracer and uses rust for the viewer and renderer.
3 projects | /r/rust | 29 Oct 2021

That's really cool! Have you looked at CUDA.jl for the Julia language? Maybe you could take some ideas from there. I am pretty sure it does the same thing you do here, and they support arbitrary code, with the limitations that you cannot allocate memory, I/O is disallowed, and badly typed (dynamic) code will not compile.

What are some alternatives?

When comparing cudf and CUDA.jl you can also consider the following projects:

Numba - NumPy aware dynamic Python compiler using LLVM

LoopVectorization.jl - Macro(s) for vectorizing loops.

chia-plotter

cunumeric - An Aspiring Drop-In Replacement for NumPy at Scale

wif500 - Try to find the WIF key and get a donation 200 btc

awesome-quant - A curated list of insanely awesome libraries, packages and resources for Quants (Quantitative Finance)

Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration

Tullio.jl - ⅀

rmm - RAPIDS Memory Manager

GPUCompiler.jl - Reusable compiler infrastructure for Julia GPU backends.

mpire - A Python package for easy multiprocessing, but faster than multiprocessing

CudaPy - CudaPy is a runtime library that lets Python programmers access NVIDIA's CUDA parallel computation API.
