segment-anything-fast vs gpt-fast

| | segment-anything-fast | gpt-fast |
|---|---|---|
| Mentions | 1 | 8 |
| Stars | 1,117 | 5,152 |
| Growth | 1.6% | 3.5% |
| Activity | 6.9 | 8.3 |
| Last commit | about 1 month ago | 2 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License |
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity: a relative measure of how actively a project is being developed; recent commits carry more weight than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects we track.
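Under those definitions, the activity number can be sketched as a recency-weighted commit count mapped to a percentile rank. The exponential decay, the 30-day half-life, and the helper names below are assumptions for illustration; the site does not publish its exact formula:

```python
def commit_weight(commit_ages_days, half_life_days=30.0):
    """Recency-weighted commit count: a commit today counts 1.0, a commit
    one half-life ago counts 0.5. Exponential decay and the 30-day
    half-life are assumed; the tracker's exact weighting is not published."""
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)

def activity_score(weight, all_weights):
    """Percentile rank scaled to 0-10: a score of 9.0 means the project
    out-scores 90% of tracked projects, i.e. it is in the top 10%."""
    below = sum(1 for w in all_weights if w < weight)
    return 10.0 * below / len(all_weights)
```

For instance, `activity_score(9.5, list(range(1, 11)))` returns `9.0`: the project out-scores 9 of 10 tracked projects.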
segment-anything-fast

- 80% faster, 50% less memory, 0% loss of accuracy Llama finetuning
  "How does this compare to the PyTorch Labs optimizations for SAM and Llama 2?"
  https://github.com/pytorch-labs/segment-anything-fast
  https://github.com/pytorch-labs/gpt-fast
gpt-fast

- [D] GPT-Fast performance on larger batch sizes
  "I'm toying around with gpt-fast (https://github.com/pytorch-labs/gpt-fast) and was wondering if anyone has run experiments at batch sizes > 1."
- Optimum-NVIDIA - 28x faster inference in just 1 line of code!?
- GPT-Fast: Simple and efficient GPT inference in <1000 LOC of Python
  "GPT-Fast: a fast and hackable implementation of transformer inference in <1000 lines of native PyTorch, with support for quantization, speculative decoding, tensor parallelism (TP), NVIDIA/AMD GPUs, and more! Check out the code here: https://github.com/pytorch-labs/gpt-fast"
- Fast and hackable PyTorch native transformer inference
- Accelerating Generative AI with PyTorch II: GPT, Fast
  "I'm wondering if gpt-fast has a version that can be run from the Windows Command Prompt or PowerShell?"
  https://github.com/pytorch-labs/gpt-fast/issues/45
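Speculative decoding, one of the techniques the GPT-Fast announcement lists, can be illustrated with a toy sketch. This is not gpt-fast's actual implementation: the `draft`/`target` callables, the greedy proposal step, and the dictionary "distributions" are simplifications chosen to keep the accept/reject logic visible:

```python
import random

def speculative_decode(target, draft, prefix, k=4, new_tokens=8):
    """Toy speculative decoding over integer token sequences.

    `target` and `draft` map a token list to a next-token distribution
    (dict: token -> probability). The cheap draft model proposes k tokens
    autoregressively; the expensive target model then verifies all k (in a
    real system, in one batched forward pass), accepting each proposal
    with probability min(1, p_target / p_draft).
    """
    tokens = list(prefix)
    goal = len(prefix) + new_tokens
    while len(tokens) < goal:
        # 1. Draft k candidate tokens (greedy, for simplicity).
        proposed, ctx = [], list(tokens)
        for _ in range(k):
            dist = draft(ctx)
            tok = max(dist, key=dist.get)
            proposed.append(tok)
            ctx.append(tok)
        # 2. Verify candidates against the target model, left to right.
        for tok in proposed:
            if len(tokens) >= goal:
                break
            p_t = target(tokens).get(tok, 0.0)
            p_d = max(draft(tokens).get(tok, 0.0), 1e-12)
            if random.random() < min(1.0, p_t / p_d):
                tokens.append(tok)  # draft token accepted "for free"
            else:
                # Rejected: fall back to the target model's own choice
                # and restart drafting from the corrected context.
                dist = target(tokens)
                tokens.append(max(dist, key=dist.get))
                break
    return tokens
```

When draft and target agree, every proposal is accepted and the target model is only ever used for verification; for example, with a deterministic "count up" model for both roles, `speculative_decode(count_up, count_up, [0])` where `count_up = lambda ctx: {ctx[-1] + 1: 1.0}` yields `[0, 1, 2, 3, 4, 5, 6, 7, 8]`. The speedup in practice comes from the target model scoring the k drafted positions in one pass instead of k sequential ones.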
What are some alternatives?

- unsloth - finetune Llama 3, Mistral, and Gemma LLMs 2-5x faster with 80% less memory
- TensorRT-LLM - an easy-to-use Python API for defining Large Language Models (LLMs) and building TensorRT engines with state-of-the-art optimizations for efficient inference on NVIDIA GPUs; also includes Python and C++ runtime components that execute those engines
- hyperlearn - 2-2000x faster ML algorithms with 50% less memory usage; works on all hardware, new and old
- stable-fast - an inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs
- optimum-nvidia