| | crepe | highway |
|---|---|---|
| Mentions | 4 | 77 |
| Stars | 474 | 4,723 |
| Growth | 1.3% | 1.7% |
| Activity | 0.0 | 9.6 |
| Last commit | over 1 year ago | 1 day ago |
| Language | Rust | C++ |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
crepe
- Datalog in 100 lines of JavaScript (2022)
- GDlog: A GPU-Accelerated Deductive Engine
https://github.com/topics/datalog?l=rust ... Cozo, Crepe
Crepe: https://github.com/ekzhang/crepe :
> Crepe is a library that allows you to write declarative logic programs in Rust, with a Datalog-like syntax. It provides a procedural macro that generates efficient, safe code and interoperates seamlessly with Rust programs.
Looks like there's not yet a Python grammar for the treeedb tree-sitter: https://github.com/langston-barrett/treeedb :
> Generate Soufflé Datalog types, relations, and facts that represent ASTs from a variety of programming languages.
Looks like roxi supports n3, which adds `=>` "implies" to the Turtle lightweight RDF representation: https://github.com/pbonte/roxi
FWIW rdflib/owl-rl: https://owl-rl.readthedocs.io/en/latest/owlrl.html :
> simple forward chaining rules are used to extend (recursively) the incoming graph with all triples that the rule sets permit (ie, the “deductive closure” of the graph is computed).
ForwardChainingStore and BackwardChainingStore implementations w/ rdflib in Python: https://github.com/RDFLib/FuXi/issues/15
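The forward chaining described in the owl-rl quote above can be sketched in a few lines. This is a minimal illustration, not owl-rl's or FuXi's implementation: `forward_chain` and `transitive_subclass` are hypothetical names, and the single rule stands in for a full rule set.

```python
def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new facts appear (a fixpoint)."""
    closure = set(facts)
    while True:
        new = set()
        for rule in rules:
            # Keep only facts that are not already known.
            new |= rule(closure) - closure
        if not new:
            return closure  # the deductive closure under these rules
        closure |= new


def transitive_subclass(triples):
    """(a subClassOf b) and (b subClassOf c) => (a subClassOf c)."""
    sub = [(s, o) for (s, p, o) in triples if p == "subClassOf"]
    return {(a, "subClassOf", d) for (a, b) in sub for (c, d) in sub if b == c}


facts = {
    ("Cat", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
}
closure = forward_chain(facts, [transitive_subclass])
```

Here `closure` holds the two input triples plus the derived `("Cat", "subClassOf", "Animal")`. Real engines avoid re-deriving known facts on every pass (semi-naive evaluation), which this sketch does not attempt.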
Fast CUDA hashmaps
GDlog is built on CuCollections.
GPU HashMap libs to benchmark: Warpcore, CuCollections
https://github.com/NVIDIA/cuCollections
https://github.com/NVIDIA/cccl
https://github.com/sleeepyjack/warpcore
/? Rocm HashMap
DeMoriarty/DOKsparse:
- Ergonomic inline SQL as a Python library
Inspired by past work: LINQ, inline-python, crepe, DataScript, Riffle.
- Call for Help - Open Source Datom/EAV/Fact database in Rust.
highway
- Three Fundamental Flaws of SIMD
I quite like highway.
As mentioned, last time I tried vqsort for RVV it was surprisingly slow.
I tried to replicate it yesterday, but noticed that vqsort is now disabled for RVV: https://github.com/google/highway/blob/400fbf20f2e40b984be12...
Does highway support sorting networks for non-128-bit vector registers?
When I tried to compile it for AVX512, the BaseCase seems to only use xmm registers: https://godbolt.org/z/qr9xoTGKn
- Towards fearless SIMD, 7 years later
I'm not proficient in Rust, but API wise I'd conceptually define types like f32xn or f32s, which have the number of elements that fit into a vector register for your target architecture, so 4 for NEON/SSE, 8 for AVX and 16 for AVX512.
I can recommend looking at the highway library: https://github.com/google/highway
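The lane counts in the comment above follow from dividing the register width by the element width. A rough sketch of that arithmetic, plus a loop written once in terms of the lane count; the `REGISTER_BITS` table and both function names are illustrative assumptions, not part of Highway or any Rust crate:

```python
# Vector register widths in bits for the targets named in the comment.
REGISTER_BITS = {"NEON": 128, "SSE": 128, "AVX": 256, "AVX512": 512}


def lanes(target, element_bits=32):
    """Number of elements of the given width that fit in one register."""
    return REGISTER_BITS[target] // element_bits


def add_f32xn(a, b, target):
    """Add two equal-length lists one register-sized chunk at a time."""
    n = lanes(target)
    out = []
    for i in range(0, len(a), n):
        # Each slice models one f32xn vector; the last one may be partial.
        out.extend(x + y for x, y in zip(a[i:i + n], b[i:i + n]))
    return out
```

The caller never hard-codes 4, 8, or 16: the same loop body works for every target because the chunk size is looked up from the target.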
- FFmpeg School of Assembly Language
What about Highway? https://github.com/google/highway I suppose that's C++ not C though.
- C Is Not Suited to SIMD
- Static search trees: 40x faster than binary search
Google has a mature C++ library for portable SIMD. The original article seems to be a translation of the excellent algorithmica site which had it in C++.
https://github.com/google/highway
- Why those particular integer multiplies?
> Or do they somehow adapt to the operations supported by the user's CPU?
This is called runtime dispatch. You can do it manually or use a library, like Google Highway. GCC supports multiversioning where you write separate versions of a function and the right one is selected at runtime.
https://github.com/google/highway
https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Function-Multiv...
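The idea behind runtime dispatch can be sketched independently of any library: keep several versions of a function, each tagged with the CPU feature it needs, and pick the best supported one once at startup. This is only a conceptual model. The function names and feature strings are hypothetical, the "AVX2" and "AVX-512" variants are plain-Python stand-ins, and feature detection is left to the caller because it is platform-specific.

```python
def dot_baseline(a, b):
    """Version that runs on any CPU."""
    return sum(x * y for x, y in zip(a, b))


def dot_avx2(a, b):
    # Stand-in: a real build would contain AVX2 code here.
    return sum(x * y for x, y in zip(a, b))


def dot_avx512(a, b):
    # Stand-in: a real build would contain AVX-512 code here.
    return sum(x * y for x, y in zip(a, b))


# Ordered best-first; None means "no special feature required".
CANDIDATES = [
    ("avx512f", dot_avx512),
    ("avx2", dot_avx2),
    (None, dot_baseline),
]


def select(cpu_features):
    """Return the first candidate whose required feature is available."""
    for required, fn in CANDIDATES:
        if required is None or required in cpu_features:
            return fn


# The caller supplies the detected feature set (assumed here).
dot = select({"sse4_2", "avx2"})
```

With the assumed feature set, `dot` is bound to `dot_avx2`; an empty set falls through to `dot_baseline`. The selection happens once, so the per-call cost is a single indirect call.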
- Highway – Portable SIMD Library
- Open Source C++ Stack
GitHub
- Highway: C++ library that provides portable SIMD/vector intrinsics
- Apple's M4 Has Reportedly Adopted the ARMv9 Architecture
It's great you bring up cmp, helps to understand why 4x128 is not necessarily as good as 1x512. Quicksort, hardly a 'weird kernel', does comparisons followed by compaction. Because comparisons return predicates, which have only a single write port, we can only do 128 bits of comparisons per cycle. Ouch.
However, masking can still help our VQSort [1], for example when writing the rightmost partition right to left without stomping on subsequent elements, or in a sorting network, only updating every second element.
[1] https://github.com/google/highway/tree/master/hwy/contrib/so...
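The "comparisons followed by compaction" step mentioned above can be modelled on a single vector's worth of elements. This is a scalar sketch of the pattern only, not VQSort's implementation, and `partition_block` is a hypothetical name:

```python
def partition_block(block, pivot):
    """Split one vector-sized block around a pivot via compare + compress."""
    # The comparison yields one predicate bit per lane.
    mask = [x < pivot for x in block]
    # Compaction packs the selected lanes together, preserving their order.
    left = [x for x, m in zip(block, mask) if m]
    right = [x for x, m in zip(block, mask) if not m]
    return left, right


left, right = partition_block([5, 1, 8, 3], pivot=4)
```

Here `left` is `[1, 3]` and `right` is `[5, 8]`. On hardware, the mask is what lands in a predicate register, which is why the predicate write port limits how many comparisons can retire per cycle.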
What are some alternatives?
percival - 📝 Web-based, reactive Datalog notebooks for data analysis and visualization
xsimd - C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
souffle - Soufflé is a variant of Datalog for tool designers crafting analyses in Horn clauses. Soufflé synthesizes a native parallel C++ program from a logic specification.
Vc - SIMD Vector Classes for C++
ascent - Logic programming in Rust
DirectXMath - DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps