ArrayFire
moodycamel
 | ArrayFire | moodycamel
---|---|---
Mentions | 6 | 11
Stars | 4,392 | 8,785
Growth | 0.9% | -
Activity | 7.8 | 3.9
Last commit | 13 days ago | 10 months ago
Language | C++ | C++
License | BSD 3-clause "New" or "Revised" License | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
ArrayFire
-
Learn WebGPU
Loads of people have stated why easy GPU interfaces are difficult to create, but we solve many difficult things all the time.
Ultimately I think CPUs are just satisfactory for the vast vast majority of workloads. Servers rarely come with any GPUs to speak of. The ecosystem around GPUs is unattractive. CPUs have SIMD instructions that can help. There are so many reasons not to use GPUs. By the time anyone seriously considers using GPUs they're, in my imagination, typically seriously starved for performance, and looking to control as much of the execution details as possible. GPU programmers don't want an automagic solution.
So I think the demand for easy GPU interfaces is just very weak, and therefore no effort has taken off. The amount of work needed to make it as easy to use as CPUs is massive, and the only reason anyone would even attempt to take this on is to lock you in to expensive hardware (see CUDA).
For a practical suggestion, have you taken a look at https://arrayfire.com/ ? It can run on both CUDA and OpenCL, and it has C++, Rust and Python bindings.
-
[D] Deep Learning Framework for C++.
Low-overhead — not our goal, but Flashlight is on par with or outperforming most other ML/DL frameworks with its ArrayFire reference tensor implementation, especially on nonstandard setups where framework overhead matters
-
[D] Neural Networks using a generic GPU framework
Looking for frameworks with Julia + OpenCL, I found ArrayFire. It seems quite good, with bonus points for Rust bindings. I will keep looking for more; Julia completely fell off my radar.
moodycamel
-
moodycamel VS lockfree_mpmc_queue - a user suggested alternative
2 projects | 21 Apr 2022
-
Matthias Killat - Lock-free programming for real-time systems - Meeting C++ 2021
Not literature, but an example. This is a lock-free (not wait-free!) multi-producer multi-consumer queue, not a FIFO, but access patterns should be similar, if not the same: https://github.com/cameron314/concurrentqueue
-
Learning Clojure made me return back to C/C++
If I do implement it, the most likely route I'd take is to write a compiler in Clojure/ClojureScript that uses Instaparse (I have a more-or-less-Clojure grammar written that I was tinkering with) and generates C++ code that uses Immer for its data structures and Zug for transducers. What my not-quite-Clojure would support would depend heavily on what the C++ code and libraries I use can do. I'd use Taskflow to implement a core.async-style system (not sure how to implement channels, maybe this, but I'm unsure if it's a good fit, and I also haven't looked). I would ultimately want to be able to interact with C++ code, so having some way to call C++ classes (even templated ones) would be a must. I'm unsure whether I would just copy (and extend as needed) Clojure's host interop functionality. I had toyed with the idea that you can define the native types (including templates) as part of the type annotations, so that the user-level code basically just looks like a normal function. But I haven't taken it very far yet; I haven't had the time. The reason I'd take this approach is that I'm writing a good bit of C++ again and I'd love to do that in this not-quite-Clojure language, if I did make it. A bunch of languages, like Haxe and Nim, compile to C or C++, so I think it's a perfectly reasonable approach, and if interop works well enough, then just as Clojure was able to leverage the Java ecosystem, not-quite-Clojure could be bootstrapped by leveraging the C++ ecosystem. But it's mostly just a vague dream right now.
-
Recommendations for C++ library for shared memory (multiple producers/single consumer)
I would recommend https://github.com/cameron314/concurrentqueue as it's very battle tested and fast.
-
fmtlog: fastest C++ logging library using fmtlib syntax
This was explicitly considered for spdlog (using the moodycamel::ConcurrentQueue) but rejected for the above reason. I'm not involved in the development of spdlog but personally I agree, for me it's important that log output is not all mixed up.
-
Functional programming in C++ (2012)
> So the big win with functional programming is easier testability and fewer hazards when trying to multi-thread your code.
To give you my experience: during my PhD, I developed https://ossia.io in C++. While writing the manuscript, I rewrote all the core algorithms in pure functional OCaml. When I did some tests, performance was slower than -O0 C++ (so it's not even a given that multithreaded OCaml would outperform single-threaded C++), the tests weren't meaningfully simpler to write, and it would be pretty much impossible to have an average comp. sci. student contribute to the code.
My experience multi-threading C++ code is, "slap cpp-taskflow, TBB, RaftLib" or any kind of threaded task system and enjoy arbitrary scaling. Hardly the pain it is made to be unless you have a need to go down to std::thread level, but even then using something like https://github.com/cameron314/concurrentqueue to communicate between threads makes things extremely painless.
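To make the "queue between threads" idea concrete, here is a minimal sketch that uses only the standard library. It is not the moodycamel API and not its algorithm: `SpscRing` and `transfer_sum` are hypothetical names, and the ring only supports one producer and one consumer, whereas concurrentqueue handles multiple of each. The point is just the shape of the pattern: one thread pushes, another pops, and no mutex is involved.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Hypothetical bounded single-producer/single-consumer ring buffer.
// Indices grow monotonically and are masked on access, so Capacity
// must be a power of two.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer thread only. Returns false when the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = value;
        // Release: publish the slot write before the new head is visible.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns false when the ring is empty.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;
        out = slots_[tail & (Capacity - 1)];
        // Release: the slot may be reused only after it has been read.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

private:
    T slots_[Capacity]{};
    std::atomic<std::size_t> head_{0};  // next index to write
    std::atomic<std::size_t> tail_{0};  // next index to read
};

// Sends 1..count from a producer thread to a consumer thread and
// returns the sum the consumer observed.
std::uint64_t transfer_sum(std::uint64_t count) {
    SpscRing<std::uint64_t, 1024> ring;
    std::uint64_t sum = 0;

    std::thread producer([&] {
        for (std::uint64_t i = 1; i <= count; ++i) {
            while (!ring.try_push(i)) std::this_thread::yield();
        }
    });
    std::thread consumer([&] {
        std::uint64_t value = 0;
        for (std::uint64_t received = 0; received < count;) {
            if (ring.try_pop(value)) {
                sum += value;
                ++received;
            } else {
                std::this_thread::yield();
            }
        }
    });

    producer.join();
    consumer.join();
    return sum;  // safe to read: the consumer thread has been joined
}
```

Calling `transfer_sum(1000)` from `main` exercises both threads. With the real library, the analogous calls would be `enqueue` and `try_dequeue` on a `moodycamel::ConcurrentQueue`, which also works with several producers and consumers at once.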
What are some alternatives?
Thrust - [ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
Boost.Compute - A C++ GPU Computing Library for OpenCL
MPMCQueue.h - A bounded multi-producer multi-consumer concurrent queue written in C++11
Taskflow - A General-purpose Parallel and Heterogeneous Task Programming System
VexCL - VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP
readerwriterqueue - A fast single-producer, single-consumer lock-free queue for C++
RaftLib - The RaftLib C++ library, streaming/dataflow concurrency via C++ iostream-like operators
libcds - A C++ library of Concurrent Data Structures
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
CUB - THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.