ncnn
Halide
Metric | ncnn | Halide
---|---|---
Mentions | 12 | 43
Stars | 19,234 | 5,703
Growth | 2.1% | 1.0%
Activity | 9.4 | 9.5
Last commit | 4 days ago | 5 days ago
Language | C++ | C++
License | GNU General Public License v3.0 or later | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
ncnn
-
AMD Funded a Drop-In CUDA Implementation Built on ROCm: It's Open-Source
ncnn uses Vulkan for GPU acceleration, I've seen it used in a few projects to get AMD hardware support.
https://github.com/Tencent/ncnn
-
[D] Best way to package Pytorch models as a standalone application
They're using NCNN to package the model. Have a look. https://github.com/Tencent/NCNN
-
Realtime object detection android app
Hi. Here is my preferred Android app for realtime object detection: https://github.com/nihui/ncnn-android-nanodet ; https://github.com/Tencent/ncnn contains many Android demo apps for many models.
- ncnn: High-performance neural network inference framework optimized for mobile
-
Esp32 tensorflow lite
ncnn home page: https://github.com/Tencent/ncnn
-
MMDeploy: Deploy All the Algorithms of OpenMMLab
ncnn
-
Draw Things, Stable Diffusion in your pocket, 100% offline and free
Yes, Android devices tend to have more RAM, making running 1024x1024 possible (this is not possible at all on iPhones, which could peak around 5GiB memory with my current implementation; some serious engineering would be required to bring that down on iPhone devices). The problem is I am not sure about speed. I would likely switch to NCNN (https://github.com/Tencent/ncnn) as the backend, which has decent Vulkan compute kernel support. It is definitely a possibility and there is a path to do that.
- What’s New in TensorFlow 2.10?
-
[Technical Article] OCR Upgrade
ncnn is a leading open-source inference framework in China and worldwide. What we like about it is its almost zero-cost cross-platform capability, high inference speed, and minimal deployment size. (Project address: https://github.com/Tencent/ncnn)
-
Is there a functioning neural network or backbone written in pure C language only?
If you’re not planning on training the neural net on an embedded device and just do inference, this might interest you: https://github.com/Tencent/ncnn
Halide
-
Show HN: Flash Attention in ~100 lines of CUDA
If CPU/GPU execution speed is the goal while simultaneously code golfing the source size, https://halide-lang.org/ might have come in handy.
- Halide v17.0.0
-
From slow to SIMD: A Go optimization story
This is a task where Halide https://halide-lang.org/ could really shine! It disconnects logic from scheduling (unrolling, vectorizing, tiling, caching intermediates, etc.), so every step the author describes in the article is a tunable in Halide. Halide doesn't appear to have bindings for Go, so calling C++ from Go might be the only viable option.
-
Implementing Mario's Stack Blur 15 times in C++ (with tests and benchmarks)
Probably would have been much easier to do 15 times in https://halide-lang.org/
The idea behind Halide is that scheduling memory access patterns is critical to performance. But when access patterns are interwoven with the arithmetic, the two are difficult to modify separately.
So, in Halide you specify the arithmetic and the schedule separately so you can rapidly iterate on either.
- Making Hard Things Easy
-
Deepmind Alphadev: Faster sorting algorithms discovered using deep RL
It is not sorting per se that was improved here, but sorting (particularly of short sequences) on modern CPUs, where the real complexity lies in predicting what will run quickly on those CPUs.
Doing an empirical algorithm search to find which algorithms fit well on modern CPUs/memory systems is pretty common, see e.g. FFTW, ATLAS, https://halide-lang.org/
-
Two-tier programming language
Halide https://halide-lang.org/
- Best book on writing an optimizing compiler (inlining, types, abstract interpretation)?
-
Blog Post: Can You Trust a Compiler to Optimize Your Code?
It doesn’t apply in this case, but in general if you really want the best vectorization I would suggest using https://halide-lang.org instead of trying to coerce your compiler.
-
What would make you try a new language?
If we drop the "APL" requirement, wouldn't Halide fit your criteria for the third?
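Several of the Halide comments above describe specifying the arithmetic and the schedule separately. As a rough sketch of that idea only, the plain-Python snippet below defines one "algorithm" (a 3-tap horizontal blur) and runs it under two different loop orders. It does not use the Halide API; the names `blur_x`, `schedule_row_major`, and `schedule_tiled` are hypothetical and exist only for this illustration.

```python
# Illustration only: plain Python, NOT the Halide API.
# The "algorithm" says WHAT each output value is; a "schedule" says in
# which order the values are computed. Changing the schedule must not
# change the result.

def blur_x(image, x, y):
    """Algorithm: 3-tap horizontal box blur at (x, y), clamping at edges."""
    width = len(image[0])
    left = image[y][max(x - 1, 0)]
    right = image[y][min(x + 1, width - 1)]
    return (left + image[y][x] + right) / 3.0


def schedule_row_major(algorithm, image):
    """Schedule 1: visit output pixels row by row."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            out[y][x] = algorithm(image, x, y)
    return out


def schedule_tiled(algorithm, image, tile=2):
    """Schedule 2: visit output pixels tile by tile."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            # Clamp the tile to the image bounds at the right/bottom edges.
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    out[y][x] = algorithm(image, x, y)
    return out


# A small 4x3 test image with distinct values.
image = [[float(x + 4 * y) for x in range(4)] for y in range(3)]
row_major = schedule_row_major(blur_x, image)
tiled = schedule_tiled(blur_x, image, tile=2)
```

Both schedules call the same `blur_x`, so swapping one for the other changes only the traversal order, never the output. In Halide the traversal choices (tiling, vectorizing, unrolling, and so on) are likewise tuned independently of the arithmetic, which is what lets you iterate on either one without touching the other.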
What are some alternatives?
XNNPACK - High-efficiency floating-point neural network inference operators for mobile, server, and Web
taichi - Productive, portable, and performant GPU programming in Python.
rife-ncnn-vulkan - RIFE, Real-Time Intermediate Flow Estimation for Video Frame Interpolation implemented with ncnn library
futhark - A data-parallel functional programming language
deepdetect - Deep Learning API and Server in C++14 support for Caffe, PyTorch, TensorRT, Dlib, NCNN, Tensorflow, XGBoost and TSNE
Image-Convolutaion-OpenCL
netron - Visualizer for neural network, deep learning and machine learning models
TensorOperations.jl - Julia package for tensor contractions and related operations
darknet - Convolutional Neural Networks
triton - Development repository for the Triton language and compiler
RPi_64-bit_Zero-2-image - Raspberry Pi Zero 2 W 64-bit OS image with OpenCV, TensorFlow Lite and ncnn Framework.
ponyc - Pony is an open-source, actor-model, capabilities-secure, high performance programming language