| | OpenCL-Wrapper | intel-extension-for-tensorflow |
|---|---|---|
| Mentions | 7 | 9 |
| Stars | 263 | 303 |
| Growth | - | 0.0% |
| Activity | 5.7 | 9.6 |
| Last commit | 8 days ago | 8 days ago |
| Language | C++ | C++ |
| License | GNU General Public License v3.0 or later | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
OpenCL-Wrapper
-
What 8x AMD Instinct MI200 GPUs can do with a combined 512GB VRAM: Bell 222 Helicopter in FluidX3D CFD - 10 Billion Cells, 75k Time Steps, 71TB visualized - 6.4 hours compute+rendering with OpenCL
In case you go with OpenCL, start here: https://github.com/ProjectPhysX/OpenCL-Wrapper
-
In the next 5 years, what do you think can push OpenCL adoption?
I've also open-sourced an OpenCL-Wrapper to eliminate all of the boilerplate code that otherwise comes with the OpenCL C++ bindings and lower the entry barrier. Especially for larger projects, the boilerplate code becomes really off-putting, and I solved it entirely.
-
What's your main programming language?
Somewhat unusual these days, but I mainly use OpenCL C. It seems cumbersome and hard to learn at first, but becomes much easier to use with the right tools. Once you master it, it wipes the floor with CPU programming; it's not unusual to see 100x speedup on a GPU compared to multithreaded CPU code at the same energy consumption. It's just as fast as CUDA - as efficient as the microarchitecture allows - but compatible with literally all GPU/CPU hardware of the last decade. No need to waste time on code porting if the next supercomputer has GPUs from a different vendor, it just runs out-of-the-box. Ideal for scientific compute!
-
How do you allocate more than 4GB of memory for OpenCL in A770 16GB?
I added this to my OpenCL-Wrapper in this commit, so anything built on top of it, such as FluidX3D, works on Arc out-of-the-box. Additionally, I fixed Intel's wrong VRAM capacity reporting on Arc in this patch.
-
New project - Which framework/libraries to use ?
Try OpenCL. You only need to implement the code once (in a vectorized form) and it works cross-platform on all GPUs and all CPUs, even on FPGAs. Performance is exactly as good as CUDA. There is still no rivaling framework today, although SYCL is starting to become a viable alternative.
- Want to learn OpenCL on C++ without the painful clutter that comes with the C++ bindings? My lightweight OpenCL-Wrapper makes it super simple. Automatically select the fastest GPU in 1 line. Create Host+Device Buffers and Kernels in 1 line. It even automatically tracks Device memory allocation.
-
Most user friendly way to write OpenCL kernels.
I have found that OpenCL-Wrapper from ProjectPhysX has a great solution to this: https://github.com/ProjectPhysX/OpenCL-Wrapper/
intel-extension-for-tensorflow
-
Watch out AMD: Intel Arc A580 could be the next great affordable GPU
Intel already has a working GPGPU stack, using oneAPI/SYCL.
They also have arguably pretty good OpenCL support, as well as downstream support for PyTorch and Tensorflow using their custom extensions https://github.com/intel/intel-extension-for-tensorflow and https://github.com/intel/intel-extension-for-pytorch which are actively developed and just recently brought up-to-date with upstream releases.
-
How do you allocate more than 4GB of memory for OpenCL in A770 16GB?
I tried Intel® Extension for PyTorch* v1.13.10+xpu and intel-extension-for-tensorflow
-
I'm really happy with the card although the Ti version offers much better performance
Yeah, I recently stumbled on it when I was looking into buying a 16GB A770 and wondering what was possible now. GitHub Intel extension for tensorflow
-
Does anyone uses Intel Arc A770 GPU for machine learning? [D]
Intel publishes extensions for PyTorch and TensorFlow. I've been working with PyTorch, so I just needed to follow these instructions to get everything set up.
- Intel Extension for TensorFlow
- Intel Extension for TensorFlow Released
-
SD on intel arc?
Actually, I was just on GitHub submitting issues from my testing of Intel's PyTorch and TensorFlow extensions when I saw this. It seems that someone has already ported SD over to the TensorFlow framework, so you can probably start using Intel's extension for TensorFlow with it immediately; and according to this article, you can use Intel's extension within WSL under Windows as well. Unfortunately, the person whose issue I linked has been facing pretty serious performance problems, with inference on an A770 taking many minutes longer than it should for SD workloads. So you might be better off waiting for version 1.2 or later of Intel's extension for TensorFlow, so that by the time you use it, Intel has already ironed out most of the major bugs :)
What are some alternatives?
FluidX3D - The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL.
stable-diffusion-tensorflow - Stable Diffusion in TensorFlow / Keras
OpenCL-examples - Simple OpenCL examples for exploiting GPU computing
intel-extension-for-pytorch - A Python package for extending the official PyTorch that can easily obtain performance on Intel platforms
dolfinx - Next generation FEniCS problem solving environment
VectorVisor - VectorVisor is a vectorizing binary translator for GPUs, designed to make it easy to run many copies of a single-threaded WebAssembly program in parallel using GPUs
compute-runtime - Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver