AlpacaDataCleaned Alternatives
Similar projects and alternatives to AlpacaDataCleaned
-
dolly
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
-
ue5-llama-lora
A proof-of-concept project that showcases the potential for using small, locally trainable LLMs to create next-generation documentation tools.
-
simpleAI
An easy way to host your own AI API and expose alternative models, while being compatible with "open" AI clients.
-
sparsegpt
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
-
LLaMA-LoRA-Tuner
UI tool for fine-tuning and testing your own LoRA models based on LLaMA, GPT-J and more. One-click run on Google Colab. + A Gradio ChatGPT-like Chat UI to demonstrate your language models.
-
instruct-eval
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
-
GPTeacher
A collection of modular datasets generated by GPT-4: General-Instruct, Roleplay-Instruct, Code-Instruct, and Toolformer
-
geov
The GeoV model is a large language model designed by Georges Harik and uses Rotary Positional Embeddings with Relative distances (RoPER). We have shared a pre-trained 9B parameter model.
-
Open-Instructions
Open-Instructions: A Pavilion of recent Open Source GPT Projects for decentralized AI.
AlpacaDataCleaned discussion
AlpacaDataCleaned reviews and mentions
-
While training LoRA I get 'Failed to read file... JSON parse error'
I tried using the default alpaca_data_cleaned.json training dataset as mentioned here: https://github.com/gururise/AlpacaDataCleaned/blob/main/alpaca_data_cleaned.json. Does anyone know why I could be getting this error? The file must be in correct format since it is the default file they have shown in their example.
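The thread does not say what caused the error, so the following is only a diagnostic sketch. It assumes the file is expected to be a JSON list of records with `instruction`, `input`, and `output` keys, and it also checks for one possible cause: saving the GitHub "blob" page (which is HTML) instead of the raw JSON file. The function name `check_alpaca_json` is hypothetical, not part of any training tool.

```python
import json

# Keys assumed to be required in every record (standard Alpaca layout).
REQUIRED_KEYS = {"instruction", "input", "output"}


def check_alpaca_json(text: str) -> str:
    """Return "ok", or a short description of the first problem found."""
    # A saved web page starts with a tag, which a JSON parser will reject.
    if text.lstrip().startswith("<"):
        return "looks like HTML, not JSON (was the 'blob' page saved instead of the raw file?)"
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        # lineno, colno and msg point at where parsing failed.
        return f"JSON parse error at line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(data, list):
        return "top-level value is not a list of records"
    for i, record in enumerate(data):
        if not isinstance(record, dict):
            return f"record {i} is not an object"
        missing = REQUIRED_KEYS - set(record)
        if missing:
            return f"record {i} is missing keys: {sorted(missing)}"
    return "ok"
```

To check a local copy, read it with `open(path, encoding="utf-8").read()` and pass the text to `check_alpaca_json`. An explicit UTF-8 encoding avoids platform-dependent decoding surprises.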
-
Why run LLMs locally?
This cleaned alpaca dataset gives a good idea of how data is formatted for the standard alpaca json format. Personally, I'd handle making your own datasets by using gpt4 to format the data into a dataset. You can do it by hand or use a llama model, but I've personally just found using chatgpt to be the most efficient way to get the highest possible output. I'm trying to go for quality over quantity.
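As a rough illustration of the format mentioned above: an Alpaca-style dataset is a JSON list of objects with `instruction`, `input`, and `output` fields, where `input` is an empty string when no extra context is needed. The helper `to_alpaca_record` and the sample rows are made up for this sketch and are not taken from the dataset.

```python
import json


def to_alpaca_record(instruction: str, output: str, context: str = "") -> dict:
    """Build one record in the Alpaca layout (hypothetical helper)."""
    return {
        "instruction": instruction.strip(),
        "input": context.strip(),  # stays "" when there is no extra context
        "output": output.strip(),
    }


# Illustrative rows only.
records = [
    to_alpaca_record("Translate to French.", "Bonjour", context="Hello"),
    to_alpaca_record("  Name a primary color. ", "Red"),
]

# ensure_ascii=False keeps non-ASCII text readable in the saved file.
dataset_json = json.dumps(records, indent=2, ensure_ascii=False)
```

Writing `dataset_json` to a `.json` file gives a dataset in the same shape as alpaca_data_cleaned.json, whichever tool (GPT-4, a llama model, or manual editing) produced the rows.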
-
New llama LoRA trained on WizardLM dataset
I created a dataset merge based on the following very high quality datasets:
- [P] Finetuning a commercially viable open source LLM (Flan-UL2) using Alpaca, Dolly15K and LoRA
-
Stability AI Launches the First of Its StableLM Suite of Language Models
That dataset is licensed under CC BY NC 4.0, which is not open. It also has a bunch of garbage in it; see https://github.com/gururise/AlpacaDataCleaned
- Alpacino-13B
-
GPT4-X-Alpaca 30B 4-bit, by MetaIX based on LoRA by chansung
The alpaca cleaned dataset has integrated the Microsoft GPT-4 dataset and cleaned many of the issues.
-
Alpaca, LLaMa, Vicuna [D]
13b Alpaca Cleaned (trained on the cleaned dataset) is very impressive and works well as an instruct model w/o any censorship.
-
Is there a good place to post datasets for the community?
There's already a community maintained Alpaca with cleaned data. https://github.com/gururise/AlpacaDataCleaned And a huge amount of work has already been done.
-
Dirty data sets and LLaMA/ALPACA...
this might be what you're looking for: https://github.com/gururise/AlpacaDataCleaned
Stats
gururise/AlpacaDataCleaned is an open source project licensed under the Apache License 2.0, which is an OSI-approved license.
The primary programming language of AlpacaDataCleaned is Python.
Popular Comparisons
- AlpacaDataCleaned VS StableLM
- AlpacaDataCleaned VS safetensors
- AlpacaDataCleaned VS koboldcpp
- AlpacaDataCleaned VS simpleAI
- AlpacaDataCleaned VS GPT-4-LLM
- AlpacaDataCleaned VS txtinstruct
- AlpacaDataCleaned VS ue5-llama-lora
- AlpacaDataCleaned VS LLaMA-LoRA-Tuner
- AlpacaDataCleaned VS instruct-eval
- AlpacaDataCleaned VS geov