| | executorch | MegEngine |
|---|---|---|
| Mentions | 2 | 5 |
| Stars | 1,172 | 4,722 |
| Growth | 44.6% | 0.1% |
| Activity | 10.0 | 8.9 |
| Last Commit | 3 days ago | 10 days ago |
| Language | C++ | C++ |
| License | GNU General Public License v3.0 or later | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
executorch
- ExecuTorch: Enabling On-Device Inference for embedded devices

  Yes, ExecuTorch is currently targeted at edge devices. The runtime is written in C++ with a 50 KB binary size (without kernels) and should run on most platforms. You are right that we have not integrated an Nvidia backend yet. Have you tried torch.compile() in PyTorch 2.0? It would do the Nvidia optimization for you without TorchScript. If you have a specific binary-size or edge-specific request, feel free to file an issue at https://github.com/pytorch/executorch/issues
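The comment above points at torch.compile() as the TorchScript-free route to Nvidia optimization. A minimal sketch, assuming PyTorch 2.x is installed; the model and input shapes here are illustrative, and backend="eager" is used only to keep the sketch portable (the default "inductor" backend is what generates optimized GPU kernels):

```python
import torch

# Illustrative toy model -- any nn.Module works the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 4),
)

# torch.compile() wraps the module; compilation happens lazily on
# the first call. backend="eager" skips codegen for portability;
# drop it to use the default inductor backend on real hardware.
compiled = torch.compile(model, backend="eager")

x = torch.randn(2, 8)
out = compiled(x)
```

The compiled module is a drop-in replacement: it takes the same inputs and returns the same outputs as the original model.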
MegEngine
- How to speed up a 31*31 conv 10 times
- The Real Performance in MegEngine
- [P] Train a Model 3x as Large with Dynamic Tensor Rematerialization

  In deep learning you can trade space for compute by recomputing activations in the backpropagation phase, a technique known as gradient checkpointing. Classical gradient checkpointing algorithms are great, but they don't work with eager execution. Dynamic Tensor Rematerialization (DTR) is a gradient checkpointing algorithm that does work with eager execution, and it is implemented in MegEngine, a deep learning framework. Read this blog post to learn more!
  - Training a 3x larger model on the same GPU cards
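The space-for-compute trade the post describes can be sketched in plain Python. This is a toy two-function chain, not MegEngine's DTR implementation: standard backprop caches every activation, while the checkpointed variant keeps only the input and recomputes the dropped activation during the backward pass.

```python
import math

def f(x):
    return x * x          # df/dx = 2x

def g(a):
    return math.sin(a)    # dg/da = cos(a)

def forward_stored(x):
    """Standard forward: keep every activation for the backward pass."""
    a = f(x)
    y = g(a)
    return y, {"x": x, "a": a}          # activation `a` held in memory

def backward_stored(cache):
    # dy/dx = cos(a) * 2x, using the stored activation directly.
    return math.cos(cache["a"]) * 2.0 * cache["x"]

def forward_checkpointed(x):
    """Checkpointed forward: keep only the input, drop `a`."""
    y = g(f(x))
    return y, {"x": x}                  # smaller cache, no `a`

def backward_checkpointed(cache):
    # Recompute the dropped activation: one extra call to f()
    # in exchange for the memory it would have occupied.
    a = f(cache["x"])
    return math.cos(a) * 2.0 * cache["x"]
```

Both paths produce identical outputs and gradients; the checkpointed one simply pays an extra forward evaluation instead of the memory. DTR generalizes this idea by deciding dynamically, at runtime, which tensors to evict and rematerialize.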
What are some alternatives?
dfdx - Deep learning in Rust, with shape checked tensors and neural networks
DALI - A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
hyperlearn - 2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old.
candle - Minimalist ML framework for Rust
norse - Deep learning with spiking neural networks (SNNs) in PyTorch.
llama - Inference code for Llama models
PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
Daisykit - Daisykit is an easy AI toolkit with face mask detection, pose detection, background matting, barcode detection, and more. With Daisykit, you don't need AI knowledge to build AI software.
taco - The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
mtensor - a C++/CUDA template library for tensor lazy evaluation