HPX vs NCCL
Metric | HPX | NCCL |
---|---|---|
Mentions | 15 | 3 |
Stars | 2,417 | 2,796 |
Growth | 2.6% | 3.5% |
Activity | 9.8 | 5.9 |
Last commit | 1 day ago | 2 days ago |
Language | C++ | C++ |
License | Boost Software License 1.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
HPX
- Does anyone know any good open source project to optimize?
- Looking for projects to contribute to
- What are some C++ projects with high quality code that I can read through?
  https://github.com/STEllAR-GROUP/hpx Modern C++ concepts incorporated in a threading library. Lots of useful techniques are used in there and we are trying to keep our code base very tidy. Feel free to chime in on our Libera channel #ste||ar if you have any questions.
- Any C++ open source projects for beginners?
  https://github.com/STEllAR-GROUP/hpx Welcoming community + we have been part of GSoC for 4-5 years now so feel free to apply there when it opens ;)
- Getting started with first HPC project
  You definitely do not want to learn Boost, trust me. The cudatoolkit is fine, HPX is great, and so are Dask and Ray. I do not recommend MPI unless the computers you have use InfiniBand.
- Questions about writing my own CFD code
  I found this interesting library that might fit your goal.
- John "God" Carmack: C++ with a C flavor is still the best (also: Python performance "keeps hitting me in the face")
  I personally like the ideas in the Parallelism v2 TS, which is available in libstdc++ 11 onwards. The reference implementation is a library named Vc (afaik Vc is the most popular SIMD library for C++), and this has also been implemented in recent versions of HPX.
- Is there any good reason not to build an open-source C++ project on Intel's oneTBB?
  I am aware of DAG-based task threading libraries like Taskflow and HPX; however, the benefit they have is not obvious to me, as the following sequential section depends on the parallel part being completed fully. If you want to suggest one, an elaboration on the benefits of this approach would be welcome.
- How to publish a paper about my own C++ software
  Github: https://github.com/STEllAR-GROUP/hpx
- Would anyone be interested in an HPC coroutine library for MPI?
  We're working on something similar, but based on sender/receiver in HPX (a lightweight threading runtime) and DLA-Future (distributed linear algebra currently based on (HPX) futures; based on sender/receiver in the future). With senders-as-awaitables this would also get you coroutine support for asynchronous MPI calls for free. We don't have that yet, but it's planned. In the meantime libunifex should be able to fill in the gaps.
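Several of the comments above refer to futures and task graphs. As a rough illustration of the idea only, the sketch below uses `std::async` and `std::future` from the C++ standard library, not HPX's own API. The function names `partial_sum` and `sum_with_futures` are hypothetical. It builds a small diamond-shaped graph: two independent partial sums that may run concurrently, followed by a join step that waits only on those two results.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sum the half-open range [first, last) of the input.
// Hypothetical helper, not part of HPX or any other library.
long long partial_sum(const std::vector<int>& data,
                      std::size_t first, std::size_t last) {
    const auto begin = data.begin() + static_cast<std::ptrdiff_t>(first);
    const auto end = data.begin() + static_cast<std::ptrdiff_t>(last);
    return std::accumulate(begin, end, 0LL);
}

// Diamond-shaped task graph: two independent tasks, then one join.
// Standard-library futures stand in for a task-based runtime here.
long long sum_with_futures(const std::vector<int>& data) {
    const std::size_t mid = data.size() / 2;

    // The two halves do not depend on each other, so they may overlap.
    std::future<long long> left = std::async(
        std::launch::async, [&data, mid] { return partial_sum(data, 0, mid); });
    std::future<long long> right = std::async(
        std::launch::async,
        [&data, mid] { return partial_sum(data, mid, data.size()); });

    // The join depends only on these two results, not on unrelated work.
    return left.get() + right.get();
}
```

Calling `sum_with_futures` on a vector holding the integers 1 to 1000 returns 500500. A task-based runtime generalizes this pattern to larger graphs, so a dependent step can start as soon as its own inputs are ready.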
NCCL
- MPI jobs to test
  % rm -rf /tmp/nccl ; git clone --recursive https://github.com/NVIDIA/nccl.git ; cd nccl ; git grep MPI
  Cloning into 'nccl'...
  remote: Enumerating objects: 2769, done.
  remote: Counting objects: 100% (336/336), done.
  remote: Compressing objects: 100% (140/140), done.
  remote: Total 2769 (delta 201), reused 287 (delta 196), pack-reused 2433
  Receiving objects: 100% (2769/2769), 3.04 MiB | 3.37 MiB/s, done.
  Resolving deltas: 100% (1820/1820), done.
  README.md:NCCL (pronounced "Nickel") is a stand-alone library of standard communication routines for GPUs, implementing all-reduce, all-gather, reduce, broadcast, reduce-scatter, as well as any send/receive based communication pattern. It has been optimized to achieve high bandwidth on platforms using PCIe, NVLink, NVswitch, as well as networking using InfiniBand Verbs or TCP/IP sockets. NCCL supports an arbitrary number of GPUs installed in a single node or across multiple nodes, and can be used in either single- or multi-process (e.g., MPI) applications.
  src/collectives/broadcast.cc:/* Deprecated original "in place" function, similar to MPI */
- NVLink and Dual 3090s
  If it's rendering, you don't really need SLI, you need to install NCCL so that the GPUs' memory can be pooled: https://github.com/NVIDIA/nccl
- Distributed Training Made Easy with PyTorch-Ignite
  backends from native torch distributed configuration: nccl, gloo, mpi.
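The README excerpt quoted above lists all-reduce among the collectives NCCL implements. To make the semantics of that operation concrete, here is a minimal CPU-only sketch in plain C++. It does not use the NCCL API and says nothing about how NCCL implements the operation. The names `RankBuffers` and `all_reduce_sum` are hypothetical, and the sketch assumes every rank's buffer has the same length.

```cpp
#include <cstddef>
#include <vector>

// Each inner vector stands in for one rank's buffer (for example, one GPU).
// Hypothetical type for illustration; not part of NCCL.
using RankBuffers = std::vector<std::vector<float>>;

// All-reduce with a sum: afterwards every rank holds the element-wise
// sum over all ranks. Assumes all buffers have equal length.
void all_reduce_sum(RankBuffers& ranks) {
    if (ranks.empty()) {
        return;
    }
    const std::size_t n = ranks.front().size();

    // Reduce step: accumulate every rank's contribution.
    std::vector<float> total(n, 0.0f);
    for (const auto& buffer : ranks) {
        for (std::size_t i = 0; i < n; ++i) {
            total[i] += buffer[i];
        }
    }

    // Broadcast step: every rank receives the same reduced result.
    for (auto& buffer : ranks) {
        buffer = total;
    }
}
```

With three ranks holding `{1, 2}`, `{3, 4}` and `{5, 6}`, every rank ends up with `{9, 12}`. This reduce-then-broadcast form shows only what result each rank should see, not how a real library moves the data.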
What are some alternatives?
Taskflow - A General-purpose Parallel and Heterogeneous Task Programming System
gloo - Collective communications library with various primitives for multi-machine training.
Thrust - [ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
C++ Actor Framework - An Open Source Implementation of the Actor Model in C++
RaftLib - The RaftLib C++ library, streaming/dataflow concurrency via C++ iostream-like operators
libcds - A C++ library of Concurrent Data Structures
xla - Enabling PyTorch on XLA Devices (e.g. Google TPU)
Boost.Compute - A C++ GPU Computing Library for OpenCL
Easy Creation of GnuPlot Scripts from C++ - A simple C++17 lib that helps you to quickly plot your data with GnuPlot
ArrayFire - ArrayFire: a general purpose GPU library.