phobos-next vs toast
| | phobos-next | toast |
|---|---|---|
| Mentions | 1 | 3 |
| Stars | 0 | 43 |
| Growth | - | - |
| Activity | 2.6 | 1.5 |
| Latest commit | about 2 months ago | 10 days ago |
| Language | D | C++ |
| License | Boost Software License 1.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
phobos-next
-
A new programming language for high-performance computers
The only language I know for sure that did it [automatic struct-of-arrays layout] for you (as in, you don't have to write the type yourself) was Jai, a while back (I'm told Jonathan Blow has since removed that feature).
The only language I've actually done it in is D. It's probably doable in many other nu-C languages these days, but D at the very least can make it basically seamless, as long as you do some try-and-break-shit testing to make sure nothing relies on saving pointers when it shouldn't. This obviously constrains the definition of "automatic" ;)
I don't have my implementation to hand, because it grew out of a patch that failed due to the aforementioned pointer-saving in code that I'm not paid enough to refactor. Here's one someone else made: https://github.com/nordlow/phobos-next/blob/master/src/nxt/s... and there's another one in that repository too. I've never used those particular implementations, but they're both by people I know, so hopefully they're not too bad.
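Assuming, as the surrounding discussion suggests, that "it" refers to a struct-of-arrays (SoA) layout, here is a minimal C++ sketch contrasting it with the usual array-of-structs (AoS) layout. It is not the phobos-next code (that link is truncated above); `Particle`, `ParticlesSoA` and `total_mass` are hypothetical names chosen for illustration. It also shows why saved pointers are the sticking point: under SoA there is no whole element in memory to point at.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type, for illustration only.
struct Particle {
    float x, y, z;
    float mass;
};

// Array of structs (AoS): the fields of one element are adjacent, so a loop
// that only reads `mass` still drags x, y and z through the cache.
using ParticlesAoS = std::vector<Particle>;

// Struct of arrays (SoA): one contiguous array per field.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;

    void push_back(const Particle& p) {
        x.push_back(p.x);
        y.push_back(p.y);
        z.push_back(p.z);
        mass.push_back(p.mass);
    }

    std::size_t size() const { return mass.size(); }

    // No Particle object exists in memory here, so access returns a copy
    // rebuilt from the field arrays. Code that kept a Particle* into the
    // AoS vector has no direct equivalent: the pointer-saving problem.
    Particle get(std::size_t i) const {
        assert(i < size());
        return Particle{x[i], y[i], z[i], mass[i]};
    }
};

// Same reduction over both layouts; only the memory access pattern differs.
float total_mass(const ParticlesAoS& ps) {
    float sum = 0.0f;
    for (const Particle& p : ps) sum += p.mass;  // stride of sizeof(Particle)
    return sum;
}

float total_mass(const ParticlesSoA& ps) {
    float sum = 0.0f;
    for (float m : ps.mass) sum += m;  // unit stride over the mass array
    return sum;
}
```

The `get` accessor is where the abstraction leaks: anything that expected a stable address for an element has to be rewritten to carry an index instead.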
A more subtle thing, which I'd like to try at some point but have never actually done, is to use programmer annotations (probably in the form of user-defined attributes) to group data so that temporal locality <=> spatial locality: things that are accessed together in time are also stored together in memory.
There are some arrays of structs in an old bit of the D compiler whose elements are roughly the size of a cache line and aren't accessed particularly uniformly. I profiled this and found that something like 75% of all last-level cache (LLC) misses, i.e. the ones that go to DRAM, were due to 2 particularly miserable lines... inside an O(n^2) algorithm.
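One common remedy for cache-line-sized structs that are read non-uniformly is hot/cold splitting: the fields a hot loop touches move into their own compact array, so data used together in time also sits together in memory. The following is a minimal C++ sketch, assuming 64-byte cache lines; `RecordFat`, `RecordHot`, `RecordCold`, `split` and `count_flagged` are hypothetical and not taken from the D compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record, about one 64-byte cache line, where the hot loop
// only ever reads `key` and `flags`.
struct RecordFat {
    std::uint32_t key;
    std::uint32_t flags;
    char name[48];        // cold: rarely read
    std::uint64_t extra;  // cold: rarely read
};
static_assert(sizeof(RecordFat) == 64, "expected one 64-byte line per record");

// Hot/cold split: the fields that are read together live together.
struct RecordHot {
    std::uint32_t key;
    std::uint32_t flags;
};  // 8 bytes, so eight records per 64-byte line

struct RecordCold {
    char name[48];
    std::uint64_t extra;
};

// Parallel arrays: hot[i] and cold[i] describe the same record.
struct Records {
    std::vector<RecordHot> hot;
    std::vector<RecordCold> cold;
};

Records split(const std::vector<RecordFat>& in) {
    Records out;
    out.hot.reserve(in.size());
    out.cold.reserve(in.size());
    for (const RecordFat& r : in) {
        out.hot.push_back(RecordHot{r.key, r.flags});
        RecordCold c{};
        std::memcpy(c.name, r.name, sizeof c.name);
        c.extra = r.extra;
        out.cold.push_back(c);
    }
    return out;
}

// Fat layout: the loop pulls in roughly one cache line per record.
std::size_t count_flagged(const std::vector<RecordFat>& rs, std::uint32_t mask) {
    std::size_t n = 0;
    for (const RecordFat& r : rs) n += (r.flags & mask) != 0;
    return n;
}

// Split layout: the same loop pulls in roughly one line per eight records.
std::size_t count_flagged(const Records& rs, std::uint32_t mask) {
    std::size_t n = 0;
    for (const RecordHot& r : rs.hot) n += (r.flags & mask) != 0;
    return n;
}
```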
toast
-
How Many Lines of C It Takes to Execute a + b in Python?
I have a real-life example in this commit: https://github.com/hpc4cmb/toast/pull/380/commits/a38d1d6dbc...
It replaces 2 lines of Python code (plus tens of lines of Numba glue code) with hundreds of lines of C++ and its glue code.
-
C++ is making me depressed / CUDA question
If you just want to do a matrix multiplication with CUDA (rather than inside some CUDA code), you should use cuBLAS rather than CUTLASS. It is a fairly straightforward BLAS replacement (it can be a pain to install, but that is life with C++/NVIDIA). Here is some wrapper code I wrote and the corresponding helper functions, if your difficulty is using the library rather than linking or building it.
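cuBLAS keeps the classic BLAS calling convention, which is what makes it a fairly direct replacement. The sketch below is a naive CPU reference for what `cublasSgemm` computes with no transposes, handy for checking GPU results on small inputs. `gemm_colmajor` is a hypothetical helper, not part of cuBLAS, and it is not the wrapper code mentioned in the post; the cuBLAS call appears only in a comment.

```cpp
#include <cassert>

// Naive CPU reference for C = alpha * A * B + beta * C (no transposes),
// the operation BLAS sgemm and cublasSgemm perform with CUBLAS_OP_N.
// A is m x k, B is k x n, C is m x n, all COLUMN-MAJOR:
// element (i, j) of a matrix with leading dimension ld is at [i + j * ld].
//
// The matching cuBLAS call, with dA/dB/dC in device memory and handle
// from cublasCreate, would be:
//   cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
//               &alpha, dA, lda, dB, ldb, &beta, dC, ldc);
void gemm_colmajor(int m, int n, int k, float alpha,
                   const float* A, int lda,
                   const float* B, int ldb,
                   float beta, float* C, int ldc) {
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; ++j) {      // column of C
        for (int i = 0; i < m; ++i) {  // row of C
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += A[i + p * lda] * B[p + j * ldb];
            }
            // As in BLAS, C is not read when beta == 0, so it may be
            // uninitialized on entry.
            float* c = &C[i + j * ldc];
            *c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * *c;
        }
    }
}
```

Because cuBLAS, like reference BLAS, assumes column-major storage, row-major C++ arrays can be multiplied without copying by computing Cᵀ = Bᵀ·Aᵀ instead: pass B before A and swap m and n.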
-
A new programming language for high-performance computers
What are some alternatives?
verified-scheduling
anydsl - Meta project to quickly build dependencies
Rust-CUDA - Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
exo - Exocompilation for productive programming of hardware accelerators
nalgebra - Linear algebra library for Rust.
Halide - a language for fast, portable data-parallel computation
atl - A Tensor Language
CUDA.jl - CUDA programming in Julia.
jax - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more