Halide
rayon
Metric | Halide | rayon |
---|---|---|
Mentions | 43 | 67 |
Stars | 5,703 | 10,242 |
Growth | 1.0% | 2.9% |
Activity | 9.5 | 9.0 |
Last commit | 4 days ago | 4 days ago |
Language | C++ | Rust |
License | GNU General Public License v3.0 or later | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Halide
- Show HN: Flash Attention in ~100 lines of CUDA
If CPU/GPU execution speed is the goal while simultaneously code golfing the source size, https://halide-lang.org/ might have come in handy.
- Halide v17.0.0
- From slow to SIMD: A Go optimization story
This is a task where Halide https://halide-lang.org/ could really shine! It decouples the logic from the scheduling (unrolling, vectorizing, tiling, caching intermediates, etc.), so every step the author describes in the article is a tunable in Halide. Halide doesn't appear to have bindings for Go, so calling C++ from Go might be the only viable option.
- Implementing Mario's Stack Blur 15 times in C++ (with tests and benchmarks)
Probably would have been much easier to do 15 times in https://halide-lang.org/
The idea behind Halide is that scheduling memory access patterns is critical to performance. But, access patterns being interwoven into arithmetic algorithms makes them difficult to modify separately.
So, in Halide you specify the arithmetic and the schedule separately so you can rapidly iterate on either.
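The separation described above can be illustrated with a minimal sketch in plain Rust. This is not Halide's API: the function names (`blur_at`, `schedule_row_major`, `schedule_tiled`) and the `tile` parameter are hypothetical, chosen only to show one "algorithm" being run under two different traversal orders that produce the same output.

```rust
// Algorithm: defines WHAT each output pixel is (a 3-tap horizontal box blur
// with clamped edges). It says nothing about loop order.
fn blur_at(input: &[u32], width: usize, x: usize, y: usize) -> u32 {
    let row = &input[y * width..(y + 1) * width];
    let left = row[x.saturating_sub(1)];
    let right = row[(x + 1).min(width - 1)];
    (left + row[x] + right) / 3
}

// Schedule 1: plain row-major traversal.
fn schedule_row_major(input: &[u32], width: usize, height: usize) -> Vec<u32> {
    let mut out = vec![0; width * height];
    for y in 0..height {
        for x in 0..width {
            out[y * width + x] = blur_at(input, width, x, y);
        }
    }
    out
}

// Schedule 2: tiled traversal. `tile` is a hypothetical tuning knob;
// changing it alters the memory access pattern but not the result.
fn schedule_tiled(input: &[u32], width: usize, height: usize, tile: usize) -> Vec<u32> {
    assert!(tile > 0, "tile size must be positive");
    let mut out = vec![0; width * height];
    for ty in (0..height).step_by(tile) {
        for tx in (0..width).step_by(tile) {
            for y in ty..(ty + tile).min(height) {
                for x in tx..(tx + tile).min(width) {
                    out[y * width + x] = blur_at(input, width, x, y);
                }
            }
        }
    }
    out
}
```

Because `blur_at` is untouched by either schedule, the traversal can be tuned without risking a change to the arithmetic. Halide makes this split a language feature rather than a coding convention.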
- Making Hard Things Easy
- Deepmind Alphadev: Faster sorting algorithms discovered using deep RL
It is not sorting per se that was improved here, but sorting (particularly of short sequences) on modern CPUs; the real complexity lies in predicting what will run quickly on these CPUs.
Doing an empirical algorithm search to find which algorithms fit well on modern CPUs/memory systems is pretty common, see e.g. FFTW, ATLAS, https://halide-lang.org/
- Two-tier programming language
Halide https://halide-lang.org/
- Best book on writing an optimizing compiler (inlining, types, abstract interpretation)?
- Blog Post: Can You Trust a Compiler to Optimize Your Code?
It doesn’t apply in this case, but in general if you really want the best vectorization I would suggest using https://halide-lang.org instead of trying to coerce your compiler.
- What would make you try a new language?
If we drop the "APL" requirement, wouldn't Halide fit your criteria for the third?
rayon
- Rayon: Data-race free parallelization of sequential computations in Rust
- Too Dangerous for C++
- Which application/problem would you choose for presenting Rust to newcomers in 1h30min?
Do some operations with .iter(), then later use rayon to parallelize. That way you can show how easy it is to add a dependency and how easy it is to parallelize.
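A minimal sketch of that demo, assuming a sum-of-squares workload (the function names and the `workers` parameter are hypothetical). To stay dependency-free, the parallel version here uses `std::thread::scope` rather than rayon; with rayon added to `Cargo.toml`, the equivalent is typically `use rayon::prelude::*;` plus swapping `.iter()` for `.par_iter()`.

```rust
use std::thread;

// Sequential version: an ordinary iterator chain.
fn sum_of_squares_seq(data: &[u64]) -> u64 {
    data.iter().map(|&x| x * x).sum()
}

// Parallel version using only the standard library: split the slice into
// chunks and sum each chunk on its own scoped thread.
// With rayon (NOT used here), the body would usually shrink to:
//     data.par_iter().map(|&x| x * x).sum()
fn sum_of_squares_par(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division so every element lands in some chunk; clamp to 1
    // because `chunks` panics on a chunk size of 0.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        // Spawn all threads first, then join them and add up partial sums.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || sum_of_squares_seq(chunk)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}
```

Calling `sum_of_squares_par(&data, 4)` should return the same value as `sum_of_squares_seq(&data)`. The manual chunking shown here is the bookkeeping that rayon's parallel iterators handle for you, which is what makes the one-line switch a good demo.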
- What Are The Rust Crates You Use In Almost Every Project That They Are Practically An Extension of The Standard Library?
rayon: Async CPU runtime for parallelism.
- Moving from Typescript and Langchain to Rust and Loops
In the quest for more efficient solutions, the ONNX runtime emerged as a beacon of performance. The decision to transition from Typescript to Rust was an unconventional yet pivotal one. Driven by Rust's robust parallel processing capabilities using Rayon and seamless integration with ONNX through the ort crate, Repo-Query unlocked a realm of unparalleled efficiency. The result? A transformation from sluggish processing to, I have to say it, blazing-fast performance.
- AreWeMegafactoryYet? I just breached simulating 1M buildings @ 60 fps (If I'm not recording, Ryzen 7 1700X 8 Core)
With a lot of rayon, blood, sweat and tears I finally managed to simulate a million buildings at 60fps :) Feel free to AMA, game is Combine And Conquer
- The Rust I Wanted Had No Future
(see https://github.com/rayon-rs/rayon/tree/master/src/iter/plumbing)
- Parallel event iterator?
I did some very basic testing with this crate: https://crates.io/crates/rayon and it seems to work:
- General Recommendations: Should I Use Tree-sitter as the AST for the LSP I am developing?
Sequentially, generating tree-sitter AST for each file and querying for the links of each file takes around 2.3 seconds. However, I randomly remembered this crate rayon, and I decided to test it. It ended up improving the performance (just by changing 2 lines of code) to 200-300ms by parallelizing the iterators and tree-sitter queries. MAJOR.
- python to rust migration
Now if you really want to use Rust, you can rewrite only the parts that are slowing down your consumer. It's easy with PyO3 and maturin. Maybe also rayon to parallelize.
What are some alternatives?
taichi - Productive, portable, and performant GPU programming in Python.
crossbeam - Tools for concurrent programming in Rust
futhark - A data-parallel functional programming language
tokio - A runtime for writing reliable asynchronous applications with Rust. Provides I/O, networking, scheduling, timers, ...
Image-Convolutaion-OpenCL
RxRust - The Reactive Extensions for the Rust Programming Language
TensorOperations.jl - Julia package for tensor contractions and related operations
rust-numpy - PyO3-based Rust bindings of the NumPy C-API
triton - Development repository for the Triton language and compiler
tokio-rayon - Mix async code with CPU-heavy thread pools using Tokio + Rayon
ponyc - Pony is an open-source, actor-model, capabilities-secure, high performance programming language
sqlx - 🧰 The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, and SQLite.