Cccl Alternatives
Similar projects and alternatives to cccl
-
FuXi
Chimezie Ogbuji's FuXi reasoner. NON-FUNCTIONING, RETAINED FOR ARCHIVAL PURPOSES. For working code plus version and associated support requirements see:
-
xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
-
OpenCL-Wrapper
OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cumbersome and the code overhead prevents many people from getting started. I created this lightweight OpenCL-Wrapper to greatly simplify OpenCL software development with C++ while keeping functionality and performance.
-
treeedb
Generate Soufflé Datalog types, relations, and facts that represent ASTs from a variety of programming languages.
-
GPUSorting
State of the art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.
-
virtuoso-opensource
Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
cccl discussion
cccl reviews and mentions
-
Learning Assembly for Fun, Performance and Profit
So I would say skill at GPU assembly is in-demand for the elite tier of GPU performance work. Not necessarily writing much of it (though see [1] for an example, this is the kernel of multisplit as used in Nvidia's Onesweep implementation), but definitely in being able to read it so you can understand what the compiled code is actually doing. I'll also cite as evidence of that the incredible work of the engineers on Nanite. They describe writing the core of the microtriangle software renderer in HLSL but analyzing the assembler output to optimize down to the cycle level, as described in their "deep dive into Nanite virtualized geometry" talk [2] (timestamp points to the reference to instruction-level micro-optimization).
[1]: https://github.com/NVIDIA/cccl/blob/2d1fa6bc9235106740d9373c...
[2]: https://www.youtube.com/watch?v=eviSykqSUUw&t=2073s
- Sorting Algorithm with CUDA
- NVIDIA Transitions Fully Towards Open-Source GPU Kernel Modules
-
GDlog: A GPU-Accelerated Deductive Engine
https://github.com/topics/datalog?l=rust ... Cozo, Crepe
Crepe: https://github.com/ekzhang/crepe :
> Crepe is a library that allows you to write declarative logic programs in Rust, with a Datalog-like syntax. It provides a procedural macro that generates efficient, safe code and interoperates seamlessly with Rust programs.
Looks like there's not yet a Python grammar for the treeedb tree-sitter: https://github.com/langston-barrett/treeedb :
> Generate Soufflé Datalog types, relations, and facts that represent ASTs from a variety of programming languages.
Looks like roxi supports n3, which adds `=>` "implies" to the Turtle lightweight RDF representation: https://github.com/pbonte/roxi
FWIW rdflib/owl-rl: https://owl-rl.readthedocs.io/en/latest/owlrl.html :
> simple forward chaining rules are used to extend (recursively) the incoming graph with all triples that the rule sets permit (ie, the “deductive closure” of the graph is computed).
ForwardChainingStore and BackwardChainingStore implementations w/ rdflib in Python: https://github.com/RDFLib/FuXi/issues/15
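To make the "deductive closure" idea from the owl-rl quote concrete, here is a minimal sketch in plain Python. It assumes triples are simple tuples and uses one hypothetical RDFS-style subclass rule; the names `subclass_rule` and `deductive_closure` are made up for illustration and are not the owlrl, rdflib, or FuXi API.

```python
# Minimal sketch of forward chaining to a fixpoint (the "deductive closure").
# Triples are plain (subject, predicate, object) tuples. Illustrative only:
# this is NOT the owlrl / rdflib / FuXi API.

def subclass_rule(triples):
    """(x type C) and (C subClassOf D)  =>  (x type D)."""
    derived = set()
    for (s, p, o) in triples:
        if p != "type":
            continue
        for (c, q, d) in triples:
            if q == "subClassOf" and c == o:
                derived.add((s, "type", d))
    return derived


def deductive_closure(triples, rules):
    """Apply every rule repeatedly until no new triple is produced."""
    closure = set(triples)
    while True:
        new = set()
        for rule in rules:
            new |= rule(closure)
        new -= closure          # keep only triples not already known
        if not new:             # fixpoint reached
            return closure
        closure |= new


facts = {
    ("Cat", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("tom", "type", "Cat"),
}

closure = deductive_closure(facts, [subclass_rule])
for triple in sorted(closure - facts):
    print(triple)
```

The loop stops when a full pass over the rules adds nothing, which is the recursive extension the quote describes. A real reasoner would index triples rather than scan all pairs.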
Fast CUDA hashmaps
Gdlog is built on CuCollections.
GPU HashMap libs to benchmark: Warpcore, CuCollections,
https://github.com/NVIDIA/cuCollections
https://github.com/NVIDIA/cccl
https://github.com/sleeepyjack/warpcore
/? Rocm HashMap
DeMoriarty/DOKsparse:
-
Hello World on the GPU (2019)
C++20 would be news to me. Do you have a reference? The closest I can find is https://github.com/NVIDIA/cccl which seems to be atomic and bits of algorithm. E.g. can you point to unordered_map that works on the target?
I think some pieces of libc++ work but don't know of any testing or documentation effort to track what parts, nor of any explicit handling in the source tree.
Stats
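NVIDIA/cccl is an open source project. Its components are released under permissive licenses: Apache License 2.0 (Thrust), BSD 3-Clause (CUB), and Apache License 2.0 with LLVM exceptions (libcudacxx), all of which are OSI approved.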
The primary programming language of cccl is C++.