minimal-llama vs peft

minimal-llama

By zphang


peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. (by huggingface)

Adapter diffusion llm parameter-efficient-learning Python Pytorch Transformers Lora

huggingface.co


Project	minimal-llama	peft
Mentions	4	26
Stars	456	13,877
Growth	-	4.1%
Activity	8.5	9.7
Latest Commit	7 months ago	3 days ago
Language	Python	Python
License	-	Apache License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

minimal-llama

Posts with mentions or reviews of minimal-llama. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-03-21.

Show HN: Finetune LLaMA-7B on commodity GPUs using your own text
16 projects | news.ycombinator.com | 21 Mar 2023
Visual ChatGPT
8 projects | news.ycombinator.com | 9 Mar 2023

I can't edit my comment now, but it's 30B that needs 18GB of VRAM.
LLaMA-13B, GPT-3 175B level, only needs 10GB of VRAM with the GPTQ 4bit quantization.
>do you think there's anything left to trim? like weight pruning, or LoRA, or I dunno, some kind of Huffman coding scheme that lets you mix 4-bit, 2-bit and 1-bit quantizations?
Absolutely. The GPTQ paper claims negligible output quality loss with 3-bit quantization. The GPTQ-for-LLaMA repo supports 3-bit quantization and inference. So this extra 25% savings is already possible.
As of right now, GPTQ-for-LLaMA is using a VRAM-hungry attention method. Flash attention will reduce the requirements for 7B to 4GB and possibly fit 30B with a 2048 context window into 16GB, all before stacking 3-bit.
Pruning is a possibility but I'm not aware of anyone working on it yet.
LoRA has already been implemented. See https://github.com/zphang/minimal-llama#peft-fine-tuning-wit...
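
The "extra 25% savings" in the comment above follows from simple arithmetic on bits per weight. The sketch below is only a back-of-the-envelope estimate: the function names are made up for illustration, the parameter counts are the nominal ones from the model names, and it counts weights only. Real VRAM use is higher because of activations, the attention cache, and quantization metadata, which is presumably why the comment quotes 10GB for 13B rather than the raw weight size.

```python
# Rough estimate of weight storage under quantization.
# Assumption: only the weights are counted; activations, the attention
# cache, and quantization metadata (scales, zero points) are ignored.

def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def relative_savings(old_bits: float, new_bits: float) -> float:
    """Fraction of weight memory saved by lowering the bit width."""
    return 1 - new_bits / old_bits


if __name__ == "__main__":
    # Nominal parameter counts taken from the model names (an assumption).
    for name, n_params in [("7B", 7e9), ("13B", 13e9), ("30B", 30e9)]:
        print(name, "4-bit:", weight_gb(n_params, 4), "GB,",
              "3-bit:", weight_gb(n_params, 3), "GB")
    print("savings from 4-bit to 3-bit:", relative_savings(4, 3))
```

Going from 4 bits to 3 bits removes one quarter of the weight storage regardless of model size, which matches the figure quoted in the comment.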

peft

Posts with mentions or reviews of peft. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-05.

LoftQ: LoRA-fine-tuning-aware Quantization
1 project | news.ycombinator.com | 19 Dec 2023
Fine Tuning Mistral 7B on Magic the Gathering Draft
4 projects | news.ycombinator.com | 5 Dec 2023

There is not a lot of great content out there making this clear, but basically all that matters for basic fine tuning is how much VRAM you have -- since the 3090 / 4090 have 24GB VRAM they're both pretty decent fine tuning chips. I think you could probably fine-tune a model up to ~13B parameters on one of them with PEFT (https://github.com/huggingface/peft)
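
One reason parameter-efficient methods fit on a 24GB card is that they train far fewer weights than full fine-tuning. As a minimal sketch, assuming LoRA (one of the methods PEFT implements) with rank 8 applied to two square projection matrices per layer in a model with hidden size 4096 and 32 layers, the trainable parameter count can be worked out as below. The layer layout, the rank, and the function names are illustrative assumptions, not values taken from the post.

```python
# Count trainable parameters for LoRA versus full fine-tuning.
# LoRA replaces the update to a (d_out x d_in) weight matrix with two
# low-rank factors: B (d_out x rank) and A (rank x d_in).

def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors for one adapted matrix."""
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the full weight matrix (bias ignored)."""
    return d_in * d_out


# Illustrative assumptions, not taken from the post:
hidden = 4096            # hidden size
n_layers = 32            # number of transformer layers
adapted_per_layer = 2    # e.g. two attention projections per layer
rank = 8                 # LoRA rank

lora_total = n_layers * adapted_per_layer * lora_trainable_params(hidden, hidden, rank)
full_total = n_layers * adapted_per_layer * full_params(hidden, hidden)

if __name__ == "__main__":
    print("LoRA trainable parameters:", lora_total)
    print("Same matrices, full fine-tuning:", full_total)
    print("Ratio:", lora_total / full_total)
```

The frozen base weights still have to sit in VRAM, so the saving shows up mainly in gradients and optimizer state, which are kept only for the small adapter matrices.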
Whisper prompt tuning
2 projects | /r/learnmachinelearning | 10 Oct 2023

Hi everyone. Recently I've been looking into the PEFT library (https://github.com/huggingface/peft) and I was wondering if it would be possible to do prompt tuning with OpenAI's Whisper model. They have an example notebook for tuning Whisper with LoRA (https://colab.research.google.com/drive/1vhF8yueFqha3Y3CpTHN6q9EVcII9EYzs?usp=sharing) but I'm not sure how to go about changing it to use prompt tuning instead.
Code Llama - The Hugging Face Edition
3 projects | /r/LocalLLaMA | 27 Aug 2023

In the coming days, we'll work on sharing scripts to train models, optimizations for on-device inference, even nicer demos (and for more powerful models), and more. Feel free to like our GitHub repos (transformers, peft, accelerate). Enjoy!
PEFT 0.5 supports fine-tuning GPTQ models
1 project | /r/LocalLLaMA | 24 Aug 2023
Exploding loss when trying to train OpenOrca-Platypus2-13B
1 project | /r/LocalLLaMA | 21 Aug 2023

[D] Is there a difference between p-tuning and prefix tuning ?
1 project | /r/MachineLearning | 3 Jul 2023

I discussed part of this here: https://github.com/huggingface/peft/issues/123
How does using QLoRAs when running Llama on CPU work?
2 projects | /r/LocalLLaMA | 23 Jun 2023

It seems like the merge_and_unload function in this PEFT script might be what they are referring to: https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora.py
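
For readers wondering what merging a LoRA adapter amounts to, the arithmetic is small: the adapter's low-rank product is scaled and added to the base weight, giving a single matrix that behaves like base plus adapter. The sketch below shows that arithmetic in plain Python with nested lists. It is not PEFT's code, and the helper names (`matmul`, `merge_lora`) are hypothetical.

```python
# Sketch of the LoRA merge arithmetic: W_merged = W + (alpha / r) * (B @ A)
# Shapes: W is (d_out x d_in), B is (d_out x r), A is (r x d_in).
# Plain nested lists are used so the example has no dependencies.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def merge_lora(W, A, B, alpha, r):
    """Fold the scaled low-rank update into the base weight matrix."""
    scale = alpha / r
    delta = matmul(B, A)  # (d_out x d_in), same shape as W
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]   # base weight
    B = [[1.0], [2.0]]             # d_out x r, with r = 1
    A = [[3.0, 4.0]]               # r x d_in
    print(merge_lora(W, A, B, alpha=2, r=1))
```

After merging, the adapter matrices are no longer needed at inference time, since the merged matrix already contains their contribution.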
How to merge the two weights into a single weight?
3 projects | /r/LocalLLaMA | 9 Jun 2023

To obtain the original LLaMA model, one may refer to this doc. To merge a LoRA model with a base model, one may refer to PEFT or use the merge script provided by LMFlow.
[D] [LoRA + weight merge every N step] for pre-training?
1 project | /r/MachineLearning | 29 May 2023

You could use a callback, as shown here: https://github.com/huggingface/peft/issues/286, and call code to merge them here.

What are some alternatives?

When comparing minimal-llama and peft you can also consider the following projects:

FlexGen - Running large language models on a single GPU for throughput-oriented scenarios.

lora - Using Low-rank adaptation to quickly fine-tune diffusion models.

visual-chatgpt - Official repo for the paper: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models [Moved to: https://github.com/microsoft/TaskMatrix]

LoRA - Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

whisper.cpp - Port of OpenAI's Whisper model in C/C++

alpaca-lora - Instruct-tune LLaMA on consumer hardware

simple-llm-finetuner - Simple UI for LLM Model Finetuning

dalai - The simplest way to run LLaMA on your local machine

mlc-llm - Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

GPTQ-for-LLaMa - 4 bits quantization of LLaMA using GPTQ

minLoRA - minLoRA: a minimal PyTorch library that allows you to apply LoRA to any PyTorch model.
