| | OpenCL-Wrapper | OpenCL-examples |
|---|---|---|
| Mentions | 7 | 2 |
| Stars | 263 | 185 |
| Growth | - | - |
| Activity | 5.7 | 0.0 |
| Latest commit | 8 days ago | 11 months ago |
| Language | C++ | Objective-C++ |
| License | GNU General Public License v3.0 or later | - |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
OpenCL-Wrapper
-
What 8x AMD Instinct MI200 GPUs can do with a combined 512GB VRAM: Bell 222 Helicopter in FluidX3D CFD - 10 Billion Cells, 75k Time Steps, 71TB visualized - 6.4 hours compute+rendering with OpenCL
In case you go with OpenCL, start here: https://github.com/ProjectPhysX/OpenCL-Wrapper
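A quick back-of-the-envelope check of the figures in the post title: dividing the combined VRAM by the cell count gives the memory budget per cell, and dividing the total cell updates by wall time gives an average throughput. This sketch assumes "512GB" means 512 GiB and that the 6.4 hours cover all 75k time steps including rendering, so the throughput is a lower bound on pure compute speed, not a benchmark result.

```cpp
#include <cassert>

// Back-of-the-envelope figures for the FluidX3D run described above.
// Assumption: "512GB" is read as 512 GiB (2^30 bytes per GB), 8 GPUs combined.
constexpr double kVramBytes = 512.0 * 1024.0 * 1024.0 * 1024.0;
constexpr double kCells     = 10.0e9;        // 10 billion lattice cells
constexpr double kSteps     = 75.0e3;        // 75k time steps
constexpr double kSeconds   = 6.4 * 3600.0;  // 6.4 hours, compute + rendering

// Average memory available per lattice cell, in bytes.
constexpr double bytes_per_cell() { return kVramBytes / kCells; }

// Average cell updates per second over the whole run. Rendering time is
// included, so the pure compute throughput would be higher.
constexpr double cell_updates_per_second() { return kCells * kSteps / kSeconds; }
```

Under these assumptions each cell has roughly 55 bytes available (about 51 bytes if GB is read as 10^9 bytes), and the run averaged roughly 3.3 × 10^10 cell updates per second.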
-
In the next 5 years, what do you think can push OpenCL adoption?
I've also open-sourced an OpenCL-Wrapper to eliminate all of the boilerplate code that otherwise comes with the OpenCL C++ bindings and to lower the entry barrier. Especially for larger projects, the boilerplate code becomes really off-putting, and I solved that problem entirely.
-
What's your main programming language?
Somewhat unusual these days, but I mainly use OpenCL C. It seems cumbersome and hard to learn at first, but becomes much easier to use with the right tools. Once you master it, it wipes the floor with CPU programming; it's not unusual to see a 100x speedup on a GPU compared to multithreaded CPU code at the same energy consumption. It's just as fast as CUDA (as efficient as the microarchitecture allows), but compatible with literally all GPU/CPU hardware of the last decade. No need to waste time on code porting if the next supercomputer has GPUs from a different vendor; it just runs out-of-the-box. Ideal for scientific compute!
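To illustrate the programming model behind "write it once, run it on any device": an OpenCL C kernel is written for a single work-item, which finds its index with `get_global_id(0)`, and the runtime launches it across the whole index range. The sketch below emulates that model in plain C++ so it runs without an OpenCL installation. `vector_add_kernel` and `run_ndrange` are hypothetical names, and the serial loop only stands in for the parallel launch a real OpenCL command queue would perform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In OpenCL C, the same kernel would look roughly like:
//   kernel void vector_add(global const float* a, global const float* b,
//                          global float* c) {
//       const uint n = get_global_id(0);
//       c[n] = a[n] + b[n];
//   }

// Hypothetical per-work-item kernel: computes exactly one output element.
void vector_add_kernel(std::size_t gid, const std::vector<float>& a,
                       const std::vector<float>& b, std::vector<float>& c) {
    c[gid] = a[gid] + b[gid];
}

// Hypothetical stand-in for an NDRange launch: calls the kernel once per
// global index. A real OpenCL device would run these work-items in parallel.
template <typename Kernel>
void run_ndrange(std::size_t global_size, Kernel kernel) {
    for (std::size_t gid = 0; gid < global_size; ++gid) {
        kernel(gid);
    }
}
```

Because the kernel body never loops over the data itself, the same code maps onto however many parallel lanes the device provides.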
-
How do you allocate more than 4GB of memory for OpenCL in A770 16GB?
I added this to my OpenCL-Wrapper in this commit, so anything built on top of it, such as FluidX3D, works on Arc out-of-the-box. Additionally, I fixed Intel's wrong VRAM capacity reporting on Arc in this patch.
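The limit behind the question is a per-allocation one: a single buffer above 4 GiB is refused even though the card has 16 GB in total. Intel's compute-runtime documentation describes an opt-in for larger allocations (the `CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL` buffer flag together with the `-cl-intel-greater-than-4GB-buffer-required` kernel build option); consult that document for the exact requirements. The sketch below shows only the size test a host program might use to decide whether it needs that path. `needs_large_allocation` is a hypothetical helper, and a real program would read the limit with `clGetDeviceInfo(..., CL_DEVICE_MAX_MEM_ALLOC_SIZE, ...)` instead of hard-coding it.

```cpp
#include <cassert>
#include <cstdint>

// 4 GiB: the default per-allocation ceiling discussed above (assumption;
// query CL_DEVICE_MAX_MEM_ALLOC_SIZE on real hardware).
constexpr std::uint64_t kFourGiB = 4ull * 1024ull * 1024ull * 1024ull;

// Hypothetical helper: true if a buffer of `count` elements of `elem_size`
// bytes exceeds the per-allocation limit and would therefore need the
// vendor-specific large-allocation path.
constexpr bool needs_large_allocation(std::uint64_t count, std::uint64_t elem_size,
                                      std::uint64_t max_alloc = kFourGiB) {
    return count * elem_size > max_alloc;
}
```

For example, 3 billion `float` values (12 GB) fit in the A770's 16 GB of VRAM but fail this check, while 1 billion `float` values (4 GB, just under 4 GiB) pass.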
-
New project - Which framework/libraries to use ?
Try OpenCL. You only need to implement the code once (in a vectorized form) and it works cross-platform on all GPUs and all CPUs, even on FPGAs. Performance is exactly as good as CUDA. There is still no rivaling framework today, although SYCL is starting to become a viable alternative.
- Want to learn OpenCL in C++ without the painful clutter that comes with the C++ bindings? My lightweight OpenCL-Wrapper makes it super simple: automatically select the fastest GPU in 1 line, and create Host+Device Buffers and Kernels in 1 line each. It even automatically tracks Device memory allocation.
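One plausible way to implement "select the fastest GPU" is to rank devices by a rough peak-throughput estimate: compute units × cores per compute unit × clock frequency × 2, counting a fused multiply-add as two operations. The sketch below does this in plain C++; the `DeviceInfo` struct and the `estimated_tflops` and `select_fastest` functions are hypothetical and are not the OpenCL-Wrapper API. In a real program the compute-unit count and clock would come from `clGetDeviceInfo` (`CL_DEVICE_MAX_COMPUTE_UNITS`, and `CL_DEVICE_MAX_CLOCK_FREQUENCY` in MHz), while cores per compute unit is not reported by OpenCL and would have to be looked up per vendor and architecture.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of the properties needed to rank devices.
struct DeviceInfo {
    std::string name;
    unsigned compute_units;  // e.g. from CL_DEVICE_MAX_COMPUTE_UNITS
    unsigned cores_per_cu;   // not reported by OpenCL; vendor-specific assumption
    unsigned clock_mhz;      // e.g. from CL_DEVICE_MAX_CLOCK_FREQUENCY (MHz)
};

// Rough peak FP32 estimate in TFLOPs/s: MFLOPs/s scaled by 1e-6.
double estimated_tflops(const DeviceInfo& d) {
    return 2.0 * d.compute_units * d.cores_per_cu * d.clock_mhz * 1.0e-6;
}

// Returns the index of the device with the highest estimate.
// Precondition: `devices` is non-empty.
std::size_t select_fastest(const std::vector<DeviceInfo>& devices) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < devices.size(); ++i) {
        if (estimated_tflops(devices[i]) > estimated_tflops(devices[best])) {
            best = i;
        }
    }
    return best;
}
```

A FLOPs estimate is only a heuristic: it ignores memory bandwidth, which matters more for bandwidth-bound codes such as lattice Boltzmann solvers.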
-
Most user-friendly way to write OpenCL kernels.
I have found that OpenCL-Wrapper from ProjectPhysX has a great solution to this: https://github.com/ProjectPhysX/OpenCL-Wrapper/
OpenCL-examples
-
Lisa Su Saved AMD. Now She Wants Nvidia's AI Crown
In the link provided, the CUDA example only shows the compute kernel itself and not the boilerplate required to run it. Your OpenCL example, on the other hand, only shows the boilerplate.
This is the OpenCL kernel from the same repo, for a fairer comparison: https://github.com/rsnemmen/OpenCL-examples/blob/master/mand...
This is much more readable. OpenCL C, the language, is fine: what is complicated with OpenCL is how you deploy the program on the cards.
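To make the kernel-versus-boilerplate point concrete: the compute part of a Mandelbrot renderer is just a per-pixel escape-time loop, which reads almost identically in OpenCL C, CUDA, or plain C++. The sketch below is a generic plain C++ version of that loop, not the code from the linked repository; `mandelbrot_iterations` is a hypothetical name, and each call corresponds to what one work-item or CUDA thread would compute for its pixel. Everything else (platform and device selection, context, queue, buffers, program build, kernel launch, read-back) is the host-side boilerplate being discussed.

```cpp
#include <cassert>

// Escape-time iteration count for the point c = cr + i*ci.
// Iterates z <- z^2 + c from z = 0 until |z|^2 > 4 or max_iter is reached.
// In an OpenCL or CUDA kernel, cr and ci would be derived from the pixel's
// global index; here they are plain arguments.
int mandelbrot_iterations(float cr, float ci, int max_iter) {
    float zr = 0.0f, zi = 0.0f;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0f) {
        const float next_zr = zr * zr - zi * zi + cr;  // real part of z^2 + c
        zi = 2.0f * zr * zi + ci;                      // imaginary part of z^2 + c
        zr = next_zr;
        ++n;
    }
    return n;
}
```

Points inside the set, such as c = 0 or c = -1, run for the full `max_iter`, while c = 2 + 2i escapes after a single iteration.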
What are some alternatives?
FluidX3D - The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL.
intel-extension-for-tensorflow - Intel® Extension for TensorFlow*
coremltools - Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
dolfinx - Next generation FEniCS problem solving environment
ROCm - AMD ROCm™ Software - GitHub Home [Moved to: https://github.com/ROCm/ROCm]
VectorVisor - VectorVisor is a vectorizing binary translator for GPUs, designed to make it easy to run many copies of a single-threaded WebAssembly program in parallel using GPUs
kernel_tuner - Kernel Tuner
cccl - CUDA C++ Core Libraries
neanderthal - Fast Clojure Matrix Library
chipStar - chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.