hyperlearn vs MegEngine
| | hyperlearn | MegEngine |
|---|---|---|
| Mentions | 4 | 5 |
| Stars | 1,510 | 4,719 |
| Stars growth (monthly) | 0.0% | 0.8% |
| Activity | 0.0 | 8.9 |
| Latest commit | over 1 year ago | 2 days ago |
| Primary language | Jupyter Notebook | C++ |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
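The exact formula behind the activity number is not published here, but as a hedged illustration, a recency-weighted score of the kind described above could look like the sketch below, where each commit contributes a weight that decays with its age (all names and the 90-day half-life are assumptions for illustration only):

```python
# Hypothetical sketch of a recency-weighted activity score: the exact formula
# used for the table above is not given, so this is an assumption for
# illustration only. Recent commits contribute more than old ones.
from datetime import date, timedelta

def activity_score(commit_dates, today=None, half_life_days=90):
    """Each commit adds a weight that halves every `half_life_days`."""
    today = today or date.today()
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)

recent = [date.today() - timedelta(days=k) for k in (1, 3, 10)]
stale = [date.today() - timedelta(days=k) for k in (400, 500, 600)]
print(activity_score(recent))  # close to 3: all commits are fresh
print(activity_score(stale))   # close to 0: commits are over a year old
```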
hyperlearn
-
80% faster, 50% less memory, 0% accuracy loss Llama finetuning
I agree fully - what do you suggest then? OSS the entire code base and use AGPL3? I tried that with https://github.com/danielhanchen/hyperlearn to no avail - we couldn't monetize it at all, so I just OSSed everything.
I listed all the research articles and methods in Hyperlearn, which in the end were gobbled up by other packages.
We still have to cover living expenses and such, sadly, as a startup.
Do you have any suggestions for how we could go about this? We thought about building an actual training / inference platform and not OSSing any code at all, but we decided against that, so we OSSed some code.
Any suggestions are welcome!
-
80% faster, 50% less memory, 0% loss of accuracy Llama finetuning
Good point - the main issue is that we ran into this exact problem with our old package Hyperlearn (https://github.com/danielhanchen/hyperlearn).
I OSSed all the code to the community - I'm actually an extremely open person and I love contributing to the OSS community.
The issue was that the package got gobbled up by other startups and big tech companies with no credit - I didn't want any cash from it, but it stung hearing other startups and companies claim they were the ones who made things faster, when it was actually my work. It hurt really badly - as an OSS person, I don't want money, just some recognition for the work.
I also used to help everyone with writing their startup's software, but I never got paid or even thanked - sadly, I didn't expect the world to be such a hostile place.
So after a sad awakening, my brother and I decided that instead of OSSing everything, we would first OSS something that is still very good - 5x faster training is already very reasonable.
I'm open to other suggestions on how we should approach this though! There are no evil intentions - in fact, I insisted we OSS EVERYTHING, even the 30x faster algos, but after a level-headed discussion with my brother: we still have to pay living expenses, no?
If you have other ways we can go about this - I'm all ears!! We're literally making stuff up as we go along!
-
[Project] BFLOAT16 on ALL hardware (>= 2009), up to 2000x faster ML algos, 50% less RAM usage for all old/new hardware - Hyperlearn Reborn.
Hello everyone!! It's been a while!! Years back I released Hyperlearn (https://github.com/danielhanchen/hyperlearn). It has 1.2K GitHub stars, and in it I made tonnes of algos faster:
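As a hedged illustration of the headline bf16 claim: on hardware without native bfloat16 support, bf16-style storage is commonly emulated by keeping only the high 16 bits of each float32 and widening back to float32 for compute. The NumPy sketch below shows that general trick (simple truncation rather than round-to-nearest-even); it is not Hyperlearn's actual code:

```python
# Sketch of bfloat16 emulation via float32 truncation (not Hyperlearn's code):
# store only the high 16 bits of each float32, halving memory, and widen back
# to float32 when computing. Uses truncation, not round-to-nearest-even.
import numpy as np

def to_bf16_bits(x: np.ndarray) -> np.ndarray:
    """Pack float32 values into uint16 holding the top 16 bits (sign, exponent, 7 mantissa bits)."""
    x = np.ascontiguousarray(x, dtype=np.float32)
    return (x.view(np.uint32) >> 16).astype(np.uint16)

def from_bf16_bits(b: np.ndarray) -> np.ndarray:
    """Widen stored bf16 bit patterns back to float32 for computation."""
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.random.randn(4).astype(np.float32)
print(x)
print(from_bf16_bits(to_bf16_bits(x)))  # agrees to ~2-3 significant digits, half the storage
```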
MegEngine
-
How to speedup 31*31 conv 10 times
The Real Performance in MegEngine
-
[P] Train Model 3x as large with Dynamic Tensor Rematerialization
In deep learning you can trade extra compute for memory by recomputing activations during the backpropagation phase, a technique known as gradient checkpointing. Classical gradient checkpointing algorithms are great, but they don't work with eager execution. Dynamic Tensor Rematerialization (DTR) is a gradient checkpointing algorithm that works with eager execution and is implemented in MegEngine, a deep learning framework. Read this blog post to learn more! (A minimal checkpointing sketch is included below.)
- Training 3x larger model on the same GPU cards
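As a minimal sketch of the classic (static) checkpointing idea that the post contrasts with DTR, the PyTorch snippet below drops each block's activations after the forward pass and recomputes them during backward; it illustrates the compute-for-memory trade-off only and is not MegEngine's DTR implementation (the model and sizes are made up for the example):

```python
# Minimal sketch of classic gradient checkpointing (not MegEngine's DTR):
# activations inside each checkpointed block are dropped after the forward
# pass and recomputed when backward reaches that block, trading extra
# compute for a much smaller activation memory footprint.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

blocks = nn.ModuleList(
    [nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()) for _ in range(16)]
)

x = torch.randn(32, 1024, requires_grad=True)
h = x
for block in blocks:
    # Only the block's input is saved; its intermediate activations are not.
    h = checkpoint(block, h, use_reentrant=False)

h.sum().backward()  # each block's forward is re-run here to rebuild activations
```

DTR makes the same trade-off, but decides at runtime which tensors to evict and rematerialize, which is what lets it work under eager execution rather than requiring a pre-planned graph.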
What are some alternatives?
gpt-fast - Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
DALI - A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
data-science-notes - Notes of IBM Data Science Professional Certificate Courses on Coursera
executorch - On-device AI across mobile, embedded and edge for PyTorch
notebooks - Implement, demonstrate, reproduce and extend the results of the Risk articles 'Differential Machine Learning' (2020) and 'PCA with a Difference' (2021) by Huge and Savine, and cover implementation details left out from the papers.
norse - Deep learning with spiking neural networks (SNNs) in PyTorch.
ocaml-torch - OCaml bindings for PyTorch
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
DiffSharp - DiffSharp: Differentiable Functional Programming
taco - The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
python-machine-learning-book - The "Python Machine Learning (1st edition)" book code repository and info resource
mtensor - a c++/cuda template library for tensor lazy evaluation