Metric | LSQR-CUDA | cccl |
---|---|---|
Mentions | 1 | 2 |
Stars | 12 | 798 |
Growth | - | 11.3% |
Activity | 1.3 | 9.8 |
Last commit | 12 months ago | 1 day ago |
Language | Cuda | C++ |
License | MIT License | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
LSQR-CUDA
- CUDA LSQR Solver
Repository here
cccl
- GDlog: A GPU-Accelerated Deductive Engine
https://github.com/topics/datalog?l=rust ... Cozo, Crepe
Crepe: https://github.com/ekzhang/crepe :
> Crepe is a library that allows you to write declarative logic programs in Rust, with a Datalog-like syntax. It provides a procedural macro that generates efficient, safe code and interoperates seamlessly with Rust programs.
Looks like there's not yet a Python grammar for the treeedb tree-sitter: https://github.com/langston-barrett/treeedb :
> Generate Soufflé Datalog types, relations, and facts that represent ASTs from a variety of programming languages.
Looks like roxi supports n3, which adds `=>` "implies" to the Turtle lightweight RDF representation: https://github.com/pbonte/roxi
FWIW rdflib/owl-rl: https://owl-rl.readthedocs.io/en/latest/owlrl.html :
> simple forward chaining rules are used to extend (recursively) the incoming graph with all triples that the rule sets permit (ie, the “deductive closure” of the graph is computed).
ForwardChainingStore and BackwardChainingStore implementations w/ rdflib in Python: https://github.com/RDFLib/FuXi/issues/15
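The "deductive closure" idea in the owl-rl quote above is easiest to see on a tiny example. Below is a minimal sketch of naive forward chaining in plain Python, assuming two Datalog-style reachability rules (`path(x, y) :- edge(x, y)` and `path(x, z) :- path(x, y), edge(y, z)`). The function name `deductive_closure` and the sample facts are hypothetical, chosen for illustration only. This is not how owl-rl, FuXi, or GDlog implement it; real engines typically use indexed joins or semi-naive evaluation rather than this brute-force loop.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def deductive_closure(edges: Iterable[Edge]) -> Set[Edge]:
    """Naive forward chaining to a fixpoint for the rules:

        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).
    """
    edge_set = set(edges)
    path = set(edge_set)  # rule 1: every edge is a path
    while True:
        # Rule 2: join the known paths with the edges on the shared node.
        derived = {
            (x, z)
            for (x, y) in path
            for (y2, z) in edge_set
            if y == y2
        }
        new_facts = derived - path
        if not new_facts:
            # Fixpoint reached: the rules permit no further facts.
            return path
        path |= new_facts


# Hypothetical input facts: a chain a -> b -> c -> d.
facts = [("a", "b"), ("b", "c"), ("c", "d")]
closure = deductive_closure(facts)
print(sorted(closure))
```

The loop stops as soon as one pass derives nothing new, which is the fixpoint the quote refers to. Because the set of possible pairs is finite, termination is guaranteed even when the input contains cycles.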
Fast CUDA hashmaps
GDlog is built on cuCollections.
GPU HashMap libs to benchmark: Warpcore, CuCollections,
https://github.com/NVIDIA/cuCollections
https://github.com/NVIDIA/cccl
https://github.com/sleeepyjack/warpcore
/? Rocm HashMap
DeMoriarty/DOKsparse:
- Hello World on the GPU (2019)
C++20 would be news to me. Do you have a reference? The closest I can find is https://github.com/NVIDIA/cccl which seems to be atomic and bits of algorithm. E.g. can you point to unordered_map that works on the target?
I think some pieces of libc++ work but don't know of any testing or documentation effort to track what parts, nor of any explicit handling in the source tree.
What are some alternatives?
pyopencl - OpenCL integration for Python, plus shiny features
stdgpu - stdgpu: Efficient STL-like Data Structures on the GPU
oneMKL - oneAPI Math Kernel Library (oneMKL) Interfaces
cuCollections
cub - [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
DOKSparse - sparse DOK tensors on GPU, pytorch
CUDA-Guide - CUDA Guide
Taskflow - A General-purpose Parallel and Heterogeneous Task Programming System
OpenCL-Wrapper - OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cumbersome and the code overhead prevents many people from getting started. I created this lightweight OpenCL-Wrapper to greatly simplify OpenCL software development with C++ while keeping functionality and performance.
gdlog
FuXi - Chimezie Ogbuji's FuXi reasoner. NON-FUNCTIONING, RETAINED FOR ARCHIVAL PURPOSES. For working code plus version and associated support requirements see: