Top 23 gpu-computing Open-Source Projects
-
catboost
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
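The core idea behind gradient boosting on decision trees fits in a few lines: repeatedly fit a small tree to the current residuals and add it to the model. A toy pure-Python sketch with decision stumps and squared loss (illustrative only; CatBoost adds categorical feature handling, ordered boosting, GPU kernels, and much more):

```python
# Minimal gradient boosting on decision stumps, squared loss, one feature.
LR = 0.3  # learning rate (shrinkage)

def fit_stump(xs, residuals):
    """Find the single split that minimizes squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, threshold, lmean, rmean)
    return best[1:]  # (threshold, left_value, right_value)

def boost(xs, ys, rounds=50):
    base = sum(ys) / len(ys)          # initial prediction: the mean
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)   # fit a stump to residuals
        stumps.append((t, lv, rv))
        preds = [p + LR * (lv if x <= t else rv) for x, p in zip(xs, preds)]
    return base, stumps

def predict(model, x):
    base, stumps = model
    return base + sum(LR * (lv if x <= t else rv) for t, lv, rv in stumps)
```

Each round shrinks the remaining residual geometrically, which is why even weak learners like stumps converge quickly on simple targets.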
Project mention: CatBoost: Open-source gradient boosting library | news.ycombinator.com | 2024-03-05
-
gyroflow
I am no videographer and only read somewhere about gyro-stabilization and https://gyroflow.xyz. So maybe that's an alternative to that software. Just leaving it here.
-
tf-quant-finance
Project mention: tf-quant-finance: NEW Derivatives and Hedging - star count:3911.0 | /r/algoprojects | 2023-06-10
-
FluidX3D
The fastest and most memory-efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL.
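A lattice Boltzmann step is just "collide, then stream". A toy 1D (D1Q3) diffusion solver in pure Python shows that structure; FluidX3D implements the same loop in far more sophisticated 3D form, as OpenCL kernels on the GPU:

```python
# Toy D1Q3 lattice Boltzmann: three populations per site, for
# velocities c = 0, +1, -1, relaxing toward equilibrium f_eq_i = w_i * rho.
N = 64                  # lattice sites, periodic boundary
TAU = 0.8               # relaxation time (> 0.5 for stability)
W = [4 / 6, 1 / 6, 1 / 6]  # lattice weights for c = 0, +1, -1

# initial condition: a density spike in the middle of a uniform background
f = [[W[i] * (10.0 if x == N // 2 else 1.0) for x in range(N)]
     for i in range(3)]

def density(f):
    return [f[0][x] + f[1][x] + f[2][x] for x in range(N)]

def step(f):
    rho = density(f)
    # collision: BGK relaxation toward local equilibrium (conserves mass)
    for i in range(3):
        for x in range(N):
            f[i][x] += (W[i] * rho[x] - f[i][x]) / TAU
    # streaming: shift the moving populations one site (periodic wrap)
    f[1] = [f[1][(x - 1) % N] for x in range(N)]
    f[2] = [f[2][(x + 1) % N] for x in range(N)]
    return f

for _ in range(100):
    f = step(f)
```

Both phases are embarrassingly parallel per lattice site, which is exactly why this method maps so well to GPUs.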
-
Rio
A hardware-accelerated GPU terminal emulator focused on running on desktops and in browsers. (by raphamorim)
Project mention: Rio terminal released for MacOS, Linux, Windows and BSD | /r/programming | 2023-07-18
-
NyuziProcessor
There's also Nyuzi, which is more GPGPU-focused (https://github.com/jbush001/NyuziProcessor), but the author also experimented with having it do 3D graphics.
-
SciMLBook
Parallel Computing and Scientific Machine Learning (SciML): Methods and Applications (MIT 18.337J/6.338J)
-
kompute
General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing use cases. Backed by the Linux Foundation.
Project mention: Intel CEO: 'The entire industry is motivated to eliminate the CUDA market' | news.ycombinator.com | 2023-12-14
The two I know of are IREE and Kompute[1]. I'm not sure how much momentum the latter has; I don't see it referenced much. There's also a growing body of work that uses Vulkan indirectly through WebGPU. This is currently lagging in performance due to lack of subgroups and cooperative matrix multiply, but I see that gap closing. There I think wonnx[2] has the most momentum, but I am aware of other efforts.
[1]: https://kompute.cc/
-
Arraymancer
A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
It is a small DSL written using macros at https://github.com/mratsim/Arraymancer/blob/master/src/array....
Nim has pretty great metaprogramming capabilities, and Arraymancer uses some cool features, like emitting CUDA kernels on the fly from standard templates depending on the backend!
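Arraymancer does this at compile time with Nim macros; as a rough cross-language illustration of "emit a kernel from a template depending on the backend", here is the same idea with Python string templates (all names here are made up for the sketch, not Arraymancer's API):

```python
# Generate elementwise-kernel source for a chosen backend from a template.
CUDA_TEMPLATE = """\
extern "C" __global__
void {name}(const {dtype}* a, const {dtype}* b, {dtype}* out, int n) {{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] {op} b[i];
}}
"""

OPENCL_TEMPLATE = """\
__kernel void {name}(__global const {dtype}* a, __global const {dtype}* b,
                     __global {dtype}* out, const int n) {{
    int i = get_global_id(0);
    if (i < n) out[i] = a[i] {op} b[i];
}}
"""

def gen_elementwise(backend, name, op, dtype="float"):
    """Return kernel source for an elementwise binary op on the given backend."""
    template = CUDA_TEMPLATE if backend == "cuda" else OPENCL_TEMPLATE
    return template.format(name=name, op=op, dtype=dtype)
```

The generated string would then be handed to NVRTC or the OpenCL runtime for just-in-time compilation; in Nim, the macro system lets this expansion happen during compilation of the host program instead.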
-
MatX
Project mention: An efficient C++17 GPU numerical computing library with Python-like syntax | /r/programming | 2023-10-05
-
TornadoVM
TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
You don't need to use C++ to interface with CUDA or even write it.
A while ago NVIDIA and the GraalVM team demoed grCUDA, which makes it easy to share memory with CUDA kernels and invoke them from any managed language that runs on GraalVM (which includes JIT-compiled Python). Because it's integrated with the compiler, the invocation overhead is low:
https://developer.nvidia.com/blog/grcuda-a-polyglot-language...
And TornadoVM lets you write kernels in JVM languages that are compiled through to CUDA.
There are similar technologies for other languages/runtimes too, so I don't think that will cause NVIDIA to lose ground.
-
AdaptiveCpp
Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!
Project mention: What Every Developer Should Know About GPU Computing | news.ycombinator.com | 2023-10-21
Sapphire Rapids is a CPU.
AMD's primary focus for a GPU software ecosystem these days seems to be implementing CUDA with s/cuda/hip, so AMD directly supports and encourages running GPU software written in CUDA on AMD GPUs.
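The s/cuda/hip approach is mostly a mechanical rename of API entry points. A toy Python sketch of the kind of substitution AMD's hipify tools automate (a tiny illustrative subset; the real tools handle types, headers, and many edge cases):

```python
import re

def hipify(source):
    """Rename CUDA API identifiers to their HIP equivalents (toy subset)."""
    # cudaMalloc -> hipMalloc, cudaMemcpyAsync -> hipMemcpyAsync, ...
    source = re.sub(r"\bcuda([A-Z]\w*)", r"hip\1", source)
    # CUDA_* macros and enums -> HIP_*
    source = re.sub(r"\bCUDA_(\w+)", r"HIP_\1", source)
    # runtime header
    source = source.replace("cuda_runtime.h", "hip/hip_runtime.h")
    return source
```

Because the HIP API deliberately mirrors CUDA one-to-one for most calls, this textual translation is enough to port a surprising amount of real code; the hard cases are inline PTX, warp-size assumptions, and library calls without HIP equivalents.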
The only implementation for sycl on AMD GPUs that I can find is a hobby project that apparently is not allowed to use either the 'hip' or 'sycl' names. https://github.com/AdaptiveCpp/AdaptiveCpp
-
https://github.com/topics/datalog?l=rust ... Cozo, Crepe
Crepe: https://github.com/ekzhang/crepe :
> Crepe is a library that allows you to write declarative logic programs in Rust, with a Datalog-like syntax. It provides a procedural macro that generates efficient, safe code and interoperates seamlessly with Rust programs.
Looks like there's not yet a Python grammar for the treeedb tree-sitter: https://github.com/langston-barrett/treeedb :
> Generate Soufflé Datalog types, relations, and facts that represent ASTs from a variety of programming languages.
Looks like roxi supports n3, which adds `=>` "implies" to the Turtle lightweight RDF representation: https://github.com/pbonte/roxi
FWIW rdflib/owl-rl: https://owl-rl.readthedocs.io/en/latest/owlrl.html :
> simple forward chaining rules are used to extend (recursively) the incoming graph with all triples that the rule sets permit (ie, the “deductive closure” of the graph is computed).
ForwardChainingStore and BackwardChainingStore implementations w/ rdflib in Python: https://github.com/RDFLib/FuXi/issues/15
Fast CUDA hashmaps
Gdlog is built on CuCollections.
GPU hashmap libs to benchmark:
https://github.com/NVIDIA/cuCollections
https://github.com/NVIDIA/cccl
https://github.com/sleeepyjack/warpcore
/? ROCm HashMap
DeMoriarty/DOKsparse:
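At heart, GPU hash maps like cuCollections' static_map and warpcore are open-addressing tables where each thread claims a slot with an atomic compare-and-swap. A single-threaded Python sketch of the probing scheme (conceptual only, not any of these libraries' actual code):

```python
EMPTY = None

class LinearProbeMap:
    """Fixed-capacity open-addressing hash map with linear probing."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = [EMPTY] * capacity
        self.vals = [EMPTY] * capacity

    def insert(self, key, val):
        slot = hash(key) % self.capacity
        for _ in range(self.capacity):
            # on a GPU this check-and-claim is atomicCAS(&keys[slot], EMPTY, key)
            if self.keys[slot] is EMPTY or self.keys[slot] == key:
                self.keys[slot] = key
                self.vals[slot] = val
                return True
            slot = (slot + 1) % self.capacity   # probe the next slot
        return False   # table is full

    def find(self, key):
        slot = hash(key) % self.capacity
        for _ in range(self.capacity):
            if self.keys[slot] is EMPTY:
                return None                     # hit an empty slot: absent
            if self.keys[slot] == key:
                return self.vals[slot]
            slot = (slot + 1) % self.capacity
        return None
```

The fixed capacity and probe-sequence design are what make these structures lock-free and GPU-friendly: no rehashing, no pointers, just coalesced probes over a flat array.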
-
cuda-api-wrappers
Thin C++-flavored header-only wrappers for core CUDA APIs: Runtime, Driver, NVRTC, NVTX.
1. This implements the clunky C-ish API; there's also the Modern-C++ API wrappers, with automatic error checking, RAII resource control etc.; see: https://github.com/eyalroz/cuda-api-wrappers (due disclosure: I'm the author)
2. Implementing the _runtime_ API is not the right choice; it's important to implement the _driver_ API, otherwise you can't isolate contexts, dynamically add newly-compiled JIT kernels via modules etc.
3. This is less than 3000 lines of code. Wrapping all of the core CUDA APIs (driver, runtime, NVTX, JIT compilation of CUDA-C++ and of PTX) took me > 14,000 LoC.
-
gpu-computing related posts
- FuryGpu – Custom PCIe FPGA GPU
- FluidX3D
- Earthquake in Japan yesterday may have shifted land 1.3 meters
- Shot this using the Sony A7Cii handheld
- Parallelizing WebAssembly Execution on GPUs
- VectorVisor --- Accelerate (mostly) unmodified WebAssembly programs using GPUs (Made with Rust)
- An efficient C++17 GPU numerical computing library with Python-like syntax
-
A note from our sponsor - SaaSHub
www.saashub.com | 18 Apr 2024
Index
What are some of the best open-source gpu-computing projects? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | catboost | 7,731 |
2 | gyroflow | 6,050 |
3 | tf-quant-finance | 4,259 |
4 | FluidX3D | 3,162 |
5 | Rio | 2,914 |
6 | lingvo | 2,781 |
7 | NyuziProcessor | 1,898 |
8 | SciMLBook | 1,786 |
9 | PyCUDA | 1,740 |
10 | dfdx | 1,600 |
11 | Emu | 1,590 |
12 | kompute | 1,480 |
13 | bindsnet | 1,417 |
14 | awesome-webgpu | 1,337 |
15 | Arraymancer | 1,298 |
16 | MatX | 1,112 |
17 | TornadoVM | 1,105 |
18 | stdgpu | 1,077 |
19 | neanderthal | 1,042 |
20 | AdaptiveCpp | 1,032 |
21 | accelerate | 886 |
22 | cccl | 737 |
23 | cuda-api-wrappers | 726 |