Top 23 gpu-acceleration Open-Source Projects

tfjs

29 18,124 8.6 TypeScript

A WebGL accelerated JavaScript library for training and deploying ML models.

Project mention: JavaScript Libraries for Implementing Trendy Technologies in Web Apps in 2024 | dev.to | 2024-04-09

TensorFlow.js

TensorRT

22 9,110 5.0 C++

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

Project mention: AMD MI300X 30% higher performance than Nvidia H100, even with optimized stack | news.ycombinator.com | 2023-12-17

> It's not rocket science to implement matrix multiplication in any GPU.
You're right, it's harder. Saying this as someone who's done more work on the former than the latter. (I have, with a team, built a rocket engine. And not your school or backyard project size, but nozzle bigger than your face kind. I've also written CUDA kernels and boy is there a big learning curve to the latter that you gotta fundamentally rethink how you view a problem. It's unquestionable why CUDA devs are paid so much. Really it's only questionable why they aren't paid more)
I know it is easy to think this problem is easy, it really looks that way. But there's an incredible amount of optimization that goes into all of this and that's what's really hard. You aren't going to get away with just N for loops for a tensor rank N. You got to chop the data up, be intelligent about it, manage memory, how you load memory, handle many data types, take into consideration different results for different FMA operations, and a whole lot more. There's a whole lot of non-obvious things that result in high optimization (maybe obvious __after__ the fact, but that's not truthfully "obvious"). The thing is, the space is so well researched and implemented that you can't get away with naive implementations, you have to be on the bleeding edge.
Then you have to do that and make it reasonably usable for the programmer too, abstracting away all of that. Cuda also has a huge head start and momentum is not a force to be reckoned with (pun intended).
Look at TensorRT[0]. The software isn't even complete and it still isn't going to cover all neural networks on all GPUs. I've had stuff work on a V100 and H100 but not an A100, then later get fixed. They even have the "Apple Advantage" in that they have control of the hardware. I'm not certain AMD will have the same advantage. We talk a lot about the difficulties of being first mover, but I think we can also recognize that momentum is an advantage of being first mover. And it isn't one to scoff at.
[0] https://github.com/NVIDIA/TensorRT

InfluxDB

www.influxdata.com featured

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
Rio

15 2,933 9.9 Rust

A hardware-accelerated GPU terminal emulator focusing to run in desktops and browsers. (by raphamorim)

Project mention: Rio terminal released for MacOS, Linux, Windows and BSD | /r/programming | 2023-07-18

Anime4KCPP

3 1,746 2.0 C++

A high performance anime upscaler
emacs-ng

78 1,617 10.0 Emacs Lisp

A new approach to Emacs - Including TypeScript, Threading, Async I/O, and WebRender.

Project mention: Emacs-ng: A project to integrate Deno and WebRender into Emacs | news.ycombinator.com | 2023-11-17

dfdx

22 1,607 8.7 Rust

Deep learning in Rust, with shape checked tensors and neural networks

Project mention: Shape Typing in Python | news.ycombinator.com | 2024-04-13

Emu

3 1,590 0.0 Rust

The write-once-run-anywhere GPGPU library for Rust
SaaSHub

www.saashub.com featured

SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
GenerativeAIExamples

1 1,535 7.5 Python

Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

Project mention: FLaNK Weekly 18 Dec 2023 | dev.to | 2023-12-18

TornadoVM

22 1,108 9.9 Java

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages

Project mention: Intel Gaudi 3 AI Accelerator | news.ycombinator.com | 2024-04-10

You don't need to use C++ to interface with CUDA or even write it.
A while ago NVIDIA and the GraalVM team demoed grCUDA which makes it easy to share memory with CUDA kernels and invoke them from any managed language that runs on GraalVM (which includes JIT compiled Python). Because it's integrated with the compiler the invocation overhead is low:
https://developer.nvidia.com/blog/grcuda-a-polyglot-language...
And TornadoVM lets you write kernels in JVM langs that are compiled through to CUDA:
https://www.tornadovm.org
There are similar technologies for other languages/runtimes too. So I don't think that will cause NVIDIA to lose ground.

stdgpu

0 1,085 7.1 C++

stdgpu: Efficient STL-like Data Structures on the GPU
TerraForge3D

53 906 6.7 C++

Cross Platform Professional Procedural Terrain Generation & Texturing Tool
opt_einsum

1 802 0.0 Python

⚡️Optimizing einsum functions in NumPy, Tensorflow, Dask, and more with contraction order optimization.
cccl

2 771 9.8 C++

CUDA C++ Core Libraries

Project mention: GDlog: A GPU-Accelerated Deductive Engine | news.ycombinator.com | 2023-12-03

https://github.com/topics/datalog?l=rust ... Cozo, Crepe
Crepe: https://github.com/ekzhang/crepe :
> Crepe is a library that allows you to write declarative logic programs in Rust, with a Datalog-like syntax. It provides a procedural macro that generates efficient, safe code and interoperates seamlessly with Rust programs.
Looks like there's not yet a Python grammar for the treeedb tree-sitter: https://github.com/langston-barrett/treeedb :
> Generate Soufflé Datalog types, relations, and facts that represent ASTs from a variety of programming languages.
Looks like roxi supports n3, which adds `=>` "implies" to the Turtle lightweight RDF representation: https://github.com/pbonte/roxi
FWIW rdflib/owl-rl: https://owl-rl.readthedocs.io/en/latest/owlrl.html :
> simple forward chaining rules are used to extend (recursively) the incoming graph with all triples that the rule sets permit (ie, the “deductive closure” of the graph is computed).
ForwardChainingStore and BackwardChainingStore implementations w/ rdflib in Python: https://github.com/RDFLib/FuXi/issues/15
Fast CUDA hashmaps
Gdlog is built on CuCollections.
GPU HashMap libs to benchmark: Warpcore, CuCollections,
https://github.com/NVIDIA/cuCollections
https://github.com/NVIDIA/cccl
https://github.com/sleeepyjack/warpcore
/? Rocm HashMap
DeMoriarty/DOKsparse:

Cascade

12 698 3.1 C++

Node-based image editor with GPU-acceleration. (by ttddee)
PhotonCamera

4 671 7.6 Java

Android Camera that uses Enhanced image processing
DREAMPlace

2 621 7.4 C++

Deep learning toolkit-enabled VLSI placement

Project mention: A Simulated Annealing FPGA Placer in Rust | news.ycombinator.com | 2024-01-02

Yes, see "DREAMPlace: DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement".[1] The technique in particular rather reformulates VLSI placement in terms of a non-linear optimization problem. Which is how ML frameworks (broadly) work, optimizing approximations to high-dimensional non-linear functions. So it's not like, shoving the netlist it into an LLM or an existing network or anything.
Note that DREAMPlace is a global placer; it also comes with a detail placer but global placement is what it is targeted at. I don't know of an appropriate research analogue for the routing phase of the problem that follows placing, but maybe someone else does.
[1] https://github.com/limbo018/DREAMPlace

NeuralNetwork.NET

1 536 0.0 C#

A TensorFlow-inspired neural network library built from scratch in C# 7.3 for .NET Standard 2.0, with GPU support through cuDNN
VeriGPU

2 484 0.0 SystemVerilog

OpenSource GPU, in Verilog, loosely based on RISC-V ISA
MegBA

1 431 4.5 Cuda

MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment
cudarc

4 400 7.3 Rust

Safe rust wrapper around CUDA toolkit

Project mention: Rust Bindgen Issue (Struct _) | /r/rust | 2023-09-05

Instead I'm trying to follow along with the structure of cudarc (https://github.com/coreylowman/cudarc) which has done bindings for other Nvidia libraries. Their methodology seems much more straight forward.

vuh

3 340 2.8 C++

Vulkan compute for people
ministark

1 323 7.7 Rust

🏃‍♂️💨 GPU accelerated STARK prover built on @arkworks-rs
Gpufit

1 300 5.1 C++

GPU-accelerated Levenberg-Marquardt curve fitting in CUDA
SaaSHub

www.saashub.com featured

SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

NOTE: The open source projects on this list are ordered by number of github stars. The number of mentions indicates repo mentiontions in the last 12 Months or since we started tracking (Dec 2020).

gpu-acceleration related posts

Rust Bindgen Issue (Struct _)

1 project | /r/rust | 5 Sep 2023
Rio terminal released for MacOS, Linux, Windows and BSD

1 project | /r/programming | 18 Jul 2023
Rio

1 project | /r/devopspro | 10 Jun 2023
Terminal application built with Rust and WebGPU

1 project | news.ycombinator.com | 7 Jun 2023
GPU based Terminal app running with Rust and Tokio

1 project | news.ycombinator.com | 6 Jun 2023
Terminal app built over WebGPU, WebAssembly and Rust

1 project | /r/patient_hackernews | 24 May 2023
Terminal app built over WebGPU, WebAssembly and Rust

1 project | /r/hackernews | 24 May 2023
A note from our sponsor - SaaSHub
www.saashub.com | 1 May 2024

SaaSHub helps you find the best software and product alternatives Learn more →

Index

What are some of the best open-source gpu-acceleration projects? This list will help you:

	Project	Stars
1	tfjs	18,124
2	TensorRT	9,110
3	Rio	2,933
4	Anime4KCPP	1,746
5	emacs-ng	1,617
6	dfdx	1,607
7	Emu	1,590
8	GenerativeAIExamples	1,535
9	TornadoVM	1,108
10	stdgpu	1,085
11	TerraForge3D	906
12	opt_einsum	802
13	cccl	771
14	Cascade	698
15	PhotonCamera	671
16	DREAMPlace	621
17	NeuralNetwork.NET	536
18	VeriGPU	484
19	MegBA	431
20	cudarc	400
21	vuh	340
22	ministark	323
23	Gpufit	300

gpu-acceleration

Top 23 gpu-acceleration Open-Source Projects

gpu-acceleration related posts

Rust Bindgen Issue (Struct _)

Rio terminal released for MacOS, Linux, Windows and BSD

Rio

Terminal application built with Rust and WebGPU

GPU based Terminal app running with Rust and Tokio

Terminal app built over WebGPU, WebAssembly and Rust

Terminal app built over WebGPU, WebAssembly and Rust

Index