LLaMA-LoRA-Tuner vs AlpacaDataCleaned

| | LLaMA-LoRA-Tuner | AlpacaDataCleaned |
|---|---|---|
| Mentions | 6 | 14 |
| Stars | 426 | 1,394 |
| Growth | - | - |
| Activity | 7.9 | 7.6 |
| Latest commit | 12 months ago | about 1 year ago |
| Language | Python | Python |
| License | - | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
LLaMA-LoRA-Tuner
- [P] Uptraining a pretrained model using company data?
- (HELP) Token Issue on Generation
- Help with Random Characters and Words on Output
- Fine-tuning LLaMA for research without Meta license
  I would like to fine-tune LLaMA using this tuner for a research paper, but I am wondering whether it is legal to do so. If it isn't, does anyone have suggestions for alternatives that are as user-friendly as the one above, since I am not a good programmer? Any advice would be greatly appreciated, thank you!
- Why run LLMs locally?
  The bad news is that, as far as I know, it does require a GPU. The good news is that I've gotten training done with a 7b model on both Google Colab and Kaggle with free accounts. Both have just enough VRAM to make it work, as long as you load the model in 8-bit (like --load-in-8bit on the command line with oobabooga). The LoRA Tuner frontend even has a Colab notebook set up to simplify things further. The frontend does cap the LoRA Rank and LoRA Alpha values pretty low, but that cap is just set in the GUI code, in one of the files in its UI directory, so it's easy to hand-edit if you want higher values (see the sketch after this list).
- How can I train my custom dataset on top of Vicuna?
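The 8-bit trick described in the "Why run LLMs locally?" quote above can be sketched with the transformers and peft libraries. This is a minimal illustration, not the tuner's actual code: the checkpoint path is a placeholder, and the rank/alpha values are simply examples above the GUI's capped defaults.

```python
# Minimal sketch of 8-bit loading plus a LoRA adapter with rank/alpha
# above typical GUI defaults. Assumes transformers, peft, and bitsandbytes
# are installed; the model path below is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "path/to/llama-7b-hf"  # placeholder: any LLaMA-style checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_8bit quantizes the frozen base weights so a 7b model fits in
# the VRAM of a free Colab/Kaggle GPU, as the comment above describes.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
)

# Example values above the frontend's capped defaults; tune to taste.
lora_config = LoraConfig(
    r=16,                                  # LoRA rank
    lora_alpha=32,                         # LoRA alpha (scaling factor)
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```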
AlpacaDataCleaned
- While training LoRA I get 'Failed to read file... JSON parse error'
  I tried using the default alpaca_data_cleaned.json training dataset as mentioned here: https://github.com/gururise/AlpacaDataCleaned/blob/main/alpaca_data_cleaned.json. Does anyone know why I could be getting this error? The file must be in the correct format, since it is the default file shown in their example. (A quick format check is sketched after this list.)
- Why run LLMs locally?
  This cleaned alpaca dataset gives a good idea of how data is formatted in the standard alpaca JSON format (a minimal example of the format appears after this list). Personally, I'd make your own dataset by using GPT-4 to format the data. You can do it by hand or use a llama model, but I've found ChatGPT to be the most efficient way to get the highest-quality output. I'm going for quality over quantity.
- New llama LoRA trained on WizardLM dataset
  I created a dataset merge based on the following very high-quality datasets:
- [P] Finetuning a commercially viable open source LLM (Flan-UL2) using Alpaca, Dolly15K and LoRA
- Stability AI Launches the First of Its StableLM Suite of Language Models
  That dataset is licensed under CC BY-NC 4.0, which is not an open license. It also has a bunch of garbage in it; see https://github.com/gururise/AlpacaDataCleaned
- Alpacino-13B
- GPT4-X-Alpaca 30B 4-bit, by MetaIX based on LoRA by chansung
  The alpaca cleaned dataset has integrated the Microsoft GPT-4 dataset and cleaned up many of the issues.
- Alpaca, LLaMa, Vicuna [D]
  13b Alpaca Cleaned (trained on the cleaned dataset) is very impressive and works well as an instruct model without any censorship.
- Is there a good place to post datasets for the community?
  There's already a community-maintained Alpaca with cleaned data: https://github.com/gururise/AlpacaDataCleaned. A huge amount of work has already been done there.
- Dirty data sets and LLaMA/ALPACA...
  This might be what you're looking for: https://github.com/gururise/AlpacaDataCleaned
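On the "JSON parse error" question above: a common culprit is a file that isn't the raw JSON at all (for example, a saved HTML page instead of the raw download). Here is a quick, self-contained check of the standard alpaca layout; it is a sketch of my own, not code from either repo.

```python
# Sanity-check that a dataset file is valid JSON in the standard alpaca
# layout: a top-level array of objects with "instruction", "input", and
# "output" fields. json.load raises JSONDecodeError on a malformed file.
import json

path = "alpaca_data_cleaned.json"
with open(path, encoding="utf-8") as f:
    data = json.load(f)

assert isinstance(data, list), "top level must be a JSON array"
required = {"instruction", "input", "output"}
for i, record in enumerate(data):
    missing = required - record.keys()
    assert not missing, f"record {i} is missing fields: {missing}"

print(f"{len(data)} records look OK")
```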
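And on the "standard alpaca JSON format" mentioned above, here is a minimal sketch of writing your own records in that layout; the example strings are placeholders, not entries from the real dataset.

```python
# Write a tiny dataset in the standard alpaca JSON format: a list of
# {"instruction", "input", "output"} records. "input" may be left empty
# when the instruction stands alone.
import json

records = [
    {
        "instruction": "Summarize the following sentence.",
        "input": "LoRA fine-tunes a frozen model by training small added matrices.",
        "output": "LoRA trains small extra matrices instead of the full model.",
    },
    {
        "instruction": "Name three primary colors.",
        "input": "",
        "output": "Red, yellow, and blue.",
    },
]

with open("my_dataset.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```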
What are some alternatives?
CodeCapybara - Open-source Self-Instruction Tuning Code LLM
StableLM - StableLM: Stability AI Language Models
CodeCapypara - [Moved to: https://github.com/FSoft-AI4Code/CodeCapybara]
safetensors - Simple, safe way to store and distribute tensors
BELLE - BELLE: Be Everyone's Large Language Model Engine (an open-source Chinese conversational large language model)
koboldcpp - A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
lora - Train Large Language Models (LLM) using LoRA
simpleAI - An easy way to host your own AI API and expose alternative models, while being compatible with "open" AI clients.
GPT-4-LLM - Instruction Tuning with GPT-4
simple-llm-finetuner - Simple UI for LLM Model Finetuning
txtinstruct - 📚 Datasets and models for instruction-tuning