ArrayFire
deepdetect
Metric | ArrayFire | deepdetect
---|---|---
Mentions | 6 | 4
Stars | 4,404 | 2,495
Growth | 1.2% | 0.3%
Activity | 7.8 | 6.7
Last commit | 24 days ago | 4 days ago
Language | C++ | C++
License | BSD 3-clause "New" or "Revised" License | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
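As a rough illustration of the Growth figure described above, month-over-month growth can be read as the percentage change in star count between two monthly snapshots. The sketch below assumes that definition. The function name and the sample star counts are hypothetical, and the exact calculation used for the table is not stated here.

```python
def month_over_month_growth(previous_stars: int, current_stars: int) -> float:
    """Percentage change in stars from one monthly snapshot to the next."""
    if previous_stars <= 0:
        raise ValueError("previous_stars must be positive")
    return 100 * (current_stars - previous_stars) / previous_stars


# Hypothetical snapshots, chosen only to make the arithmetic easy to follow.
growth = month_over_month_growth(1000, 1012)
print(f"{growth:.1f}%")
```

A project that gains 12 stars on a base of 1,000 would show growth on the order of the 1.2% listed for ArrayFire.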
ArrayFire
-
Learn WebGPU
Loads of people have stated why easy GPU interfaces are difficult to create, but we solve many difficult things all the time.
Ultimately I think CPUs are just satisfactory for the vast vast majority of workloads. Servers rarely come with any GPUs to speak of. The ecosystem around GPUs is unattractive. CPUs have SIMD instructions that can help. There are so many reasons not to use GPUs. By the time anyone seriously considers using GPUs they're, in my imagination, typically seriously starved for performance, and looking to control as much of the execution details as possible. GPU programmers don't want an automagic solution.
So I think the demand for easy GPU interfaces is just very weak, and therefore no effort has taken off. The amount of work needed to make it as easy to use as CPUs is massive, and the only reason anyone would even attempt to take this on is to lock you in to expensive hardware (see CUDA).
For a practical suggestion, have you taken a look at https://arrayfire.com/ ? It can run on both CUDA and OpenCL, and it has C++, Rust and Python bindings.
-
seeking C++ library for neural net inference, with cross platform GPU support
What about ArrayFire? https://github.com/arrayfire/arrayfire
-
[D] Deep Learning Framework for C++.
Low-overhead — not our goal, but Flashlight is on par with or outperforming most other ML/DL frameworks with its ArrayFire reference tensor implementation, especially on nonstandard setups where framework overhead matters
-
[D] Neural Networks using a generic GPU framework
Looking for frameworks with Julia + OpenCL, I found ArrayFire. It seems quite good, with bonus points for Rust bindings. I will keep looking for more; Julia had completely fallen off my radar.
- Windows 11 will block the workarounds that make it easier to use a browser other than Edge
-
Arrayfire progressive performance decline?
Your problem may be the lazy evaluation; see this issue: https://github.com/arrayfire/arrayfire/issues/1709
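To show why lazy evaluation can look like a progressive slowdown, here is a toy sketch in plain Python. It is not ArrayFire's API: `LazyVector`, `run`, and `eval_every` are hypothetical names invented for the illustration. The idea it assumes is that a lazy library queues operations instead of running them, so a loop that never forces evaluation accumulates an ever longer backlog.

```python
class LazyVector:
    """Toy lazily evaluated vector: operations are recorded, not executed."""

    def __init__(self, values, pending=None):
        self._values = list(values)          # last materialized data
        self._pending = list(pending or [])  # queued element-wise functions

    def map(self, fn):
        # Record the operation; no arithmetic happens here.
        return LazyVector(self._values, self._pending + [fn])

    @property
    def pending_ops(self):
        return len(self._pending)

    def eval(self):
        # Run every queued operation once, then clear the queue.
        for fn in self._pending:
            self._values = [fn(v) for v in self._values]
        self._pending = []
        return self

    def to_list(self):
        return list(self.eval()._values)


def run(steps, eval_every=None):
    """Apply `steps` increments; return the result and the longest queue seen."""
    x = LazyVector([0.0, 1.0, 2.0])
    longest = 0
    for i in range(1, steps + 1):
        x = x.map(lambda v: v + 1.0)
        if eval_every and i % eval_every == 0:
            x.eval()  # force evaluation so the backlog stays bounded
        longest = max(longest, x.pending_ops)
    return x.to_list(), longest


result_lazy, backlog_lazy = run(100)
result_forced, backlog_forced = run(100, eval_every=10)
print(backlog_lazy, backlog_forced)
```

Both runs produce the same values, but the unforced loop carries a backlog that grows with every iteration, while periodic evaluation keeps it short. Whether this matches the behavior in the linked issue depends on the details discussed there.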
deepdetect
-
Exploring Open-Source Alternatives to Landing AI for Robust MLOps
For those seeking a lightweight solution for setting up deep learning REST APIs across platforms without the complexity of Kubernetes, DeepDetect is worth considering.
-
[D] Deep Learning Framework for C++.
But you need to have good reasons to do it. Ours is that we have a multi-backend framework, and that we don't want any step in between dev & run. C++ allows for this since the same code can run on a training server and an edge device as needed. It also allows for building full AI applications with great performance (e.g. real time). We develop and use https://github.com/jolibrain/deepdetect for these purposes and it serves us very well, but it's not for the faint of heart!
-
[P] Real-time AR for jewelry virtual try on that looks real, done with joliGAN, based on a few 2D videos and no 3D model
- Real-time is achieved through our full C++ Open Source backend DeepDetect, https://github.com/jolibrain/deepdetect. We use CUDA along with OpenCV and TensorRT to chain multiple models (ring detection and generator mostly), and we make sure the data remains within CUDA memory at all times. This allows us to reach ~60 FPS on a 1080Ti and 20% more on average on an RTX3090.
-
[P] Benchmarking OpenBLAS on an Apple MacBook M1
Interesting, thanks. I recently benchmarked inference with Vulkan/MoltenVK/NCNN: the M1 GPU is roughly 30% faster than the M1 CPU for single-batch inference (NCNN does not really support batch size > 1). See https://github.com/jolibrain/deepdetect/pull/1105.
What are some alternatives?
Thrust - [ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
ncnn - ncnn is a high-performance neural network inference framework optimized for the mobile platform
Boost.Compute - A C++ GPU Computing Library for OpenCL
netron - Visualizer for neural network, deep learning and machine learning models
VexCL - VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP
tensorflow-wheels - Tensorflow Wheels
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
YoloV7-ncnn-Jetson-Nano - YoloV7 for a Jetson Nano using ncnn.
CUB - THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.
mdspan - Reference implementation of mdspan targeting C++23
Taskflow - A General-purpose Parallel and Heterogeneous Task Programming System
mmaction2 - OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark