FlexGen vs stable-horde-notebook

| | FlexGen | stable-horde-notebook |
|---|---|---|
| Mentions | 19 | 1 |
| Stars | 5,350 | 5 |
| Growth | - | - |
| Activity | 10.0 | 10.0 |
| Last commit | about 1 year ago | over 1 year ago |
| Language | Python | Jupyter Notebook |
| License | Apache License 2.0 | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
FlexGen
- Training LLaMA-65B with Stanford Code
  - #1: Progress Update | 4 comments
  - #2: The default UI on the pinned Google Colab is buggy, so I made my own frontend - YAFFOA. | 18 comments
  - #3: Paper reduces resource requirement of a 175B model down to 16GB GPU | 19 comments
- Replika users fell in love with their AI chatbot companions. Then they lost them
  "It's really just a GPU VRAM limitation: affordable GPUs are rather memory starved. Fortunately, people have started writing implementations for pipelining across multiple GPUs."
  https://github.com/Ying1123/FlexGen
- Same as with Stable Diffusion, new AI models based on LAION are coming up slowly but surely: Paper reduces resource requirement of a 175B model down to 16GB GPU
- And Here..We..Go: Running large language models like ChatGPT on a single GPU. Up to 100x faster than other offloading systems
- When, how and why will this Stable Diffusion spring stop?
  "Actually there's a solution: read this paper https://github.com/Ying1123/FlexGen/blob/main/docs/paper.pdf"
- Exciting new shit.
  "FlexGen - Run big models on your small GPU" https://github.com/Ying1123/FlexGen
- Paper reduces resource requirement of a 175B model down to 16GB GPU
- FlexGen - Run 175B Parameter Models on consumer hardware
- Running large language models like ChatGPT on a single GPU
- FlexGen: Running large language models like ChatGPT/GPT-3/OPT-175B on a single GPU
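The quotes above all point at the same core idea: when a model's weights don't fit in GPU VRAM, keep part of them on the GPU and offload the rest to CPU RAM, streaming layers in as they are needed. As a toy illustration of weight placement only (not FlexGen's actual policy, which also places the KV cache and activations and uses a cost model rather than a greedy rule; the function and numbers here are hypothetical):

```python
# Toy sketch, NOT FlexGen's real planner: greedily keep transformer
# layers' weights on the GPU until a fixed memory budget is exhausted;
# every remaining layer is offloaded to CPU RAM and streamed in
# during inference.

def plan_offload(layer_sizes_gb, gpu_budget_gb):
    """Return a per-layer placement ("gpu" or "cpu") under the budget."""
    placement = []
    used_gb = 0.0
    for size in layer_sizes_gb:
        if used_gb + size <= gpu_budget_gb:
            placement.append("gpu")
            used_gb += size
        else:
            placement.append("cpu")
    return placement

if __name__ == "__main__":
    # Rough illustration: ~96 decoder layers of ~3.4 GB each in fp16
    # (ballpark for a 175B-parameter model), with a 16 GB GPU.
    plan = plan_offload([3.4] * 96, gpu_budget_gb=16.0)
    print(plan.count("gpu"), "layers on GPU,", plan.count("cpu"), "offloaded")
```

Under these assumed sizes, only a handful of layers stay resident and the rest are streamed, which is why throughput-oriented scheduling and I/O-compute overlap (the focus of the FlexGen paper linked above) matter so much.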
stable-horde-notebook
- Running large language models like ChatGPT on a single GPU
  https://github.com/aqualxx/stable-horde-notebook
  "My only problem with Stable Horde is that their anti-CP measure involves checking the prompt for words like 'small', meaning I can't use an NSFW-capable model with certain prompts ('holding a very small bag', etc.). That, and seeing great things in the image ratings and being unable to reproduce them because it doesn't provide the prompt."
What are some alternatives?
- text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), and Llama models.
- CTranslate2 - Fast inference engine for Transformer models
- Open-Assistant - OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
- ggml - Tensor library for machine learning
- accelerate - 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
- rust-bert - Rust-native, ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2, ...)
- stanford_alpaca - Code and documentation to train Stanford's Alpaca models and generate the data.
- bitsandbytes - Accessible large language models via k-bit quantization for PyTorch.
- rwkvstic - Framework-agnostic Python runtime for RWKV models
- FlexGen - Running large language models on a single GPU for throughput-oriented scenarios.
- PaLM-rlhf-pytorch - Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT, but with PaLM