Dataset Creation Tools?

cog_stanford_alpaca

1 0 4.0

Code and documentation to train Stanford's Alpaca models, and generate the data.

In general, for LoRA training I go with the Alpaca format described here. It was the first real training done on LLaMA, and the format is pretty widely supported as a result. There's a good chance anything that can train on data will support that format with little to no extra formatting needed.
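As a rough illustration of the Alpaca format mentioned above: each training example is a record with `instruction`, `input`, and `output` fields, and the trainer renders it into a prompt. The sketch below is a minimal version under assumptions: the template wording is reproduced from memory of the Stanford Alpaca repository and may differ slightly from what a given trainer uses, and `build_prompt` and the sample record are hypothetical names and data made up for illustration.

```python
# Assumed prompt templates, modeled on the Stanford Alpaca repository.
# Check your trainer's own template before relying on the exact wording.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(record: dict) -> str:
    """Render one Alpaca-style record into prompt text (hypothetical helper)."""
    # An empty or missing "input" field selects the shorter template.
    template = PROMPT_WITH_INPUT if record.get("input") else PROMPT_NO_INPUT
    return template.format(**record)


# A made-up record, only to show the three fields.
example = {
    "instruction": "Translate the sentence to French.",
    "input": "Good morning.",
    "output": "Bonjour.",
}

prompt = build_prompt(example)
```

The `output` field is not part of the prompt; it is the target text the model is trained to produce after `### Response:`.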

axolotl

29 5,811 9.8 Python

Go ahead and axolotl questions

You can save that overall set into a JSON file and load it up as training data in whatever you're using. I'm using axolotl for it at the moment, though a GUI-based option is probably best for the first couple of tries until you get a feel for the options.
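Saving the set to a JSON file, as described above, might look something like the following. This is only a sketch using the Python standard library: it assumes Alpaca-style records, and the helper names `save_dataset` and `load_dataset`, the file name `train.json`, and the sample record are all hypothetical. How the file is then pointed to in axolotl or another trainer depends on that tool's own configuration.

```python
import json
import tempfile
from pathlib import Path

# Assumed field set, matching Alpaca-style records.
REQUIRED_KEYS = {"instruction", "input", "output"}


def save_dataset(records: list, path: Path) -> None:
    """Check each record has the expected fields, then write one JSON list."""
    bad = [i for i, r in enumerate(records) if not REQUIRED_KEYS.issubset(r)]
    if bad:
        raise ValueError(f"records missing required fields at indexes: {bad}")
    path.write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )


def load_dataset(path: Path) -> list:
    """Read the JSON list back, e.g. to spot-check it before training."""
    return json.loads(path.read_text(encoding="utf-8"))


# Made-up record for illustration only.
records = [
    {"instruction": "Summarize the text.", "input": "A long passage.", "output": "A summary."},
]

# Write to a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    dataset_path = Path(tmp) / "train.json"  # hypothetical file name
    save_dataset(records, dataset_path)
    loaded = load_dataset(dataset_path)
```

Validating the fields before writing catches malformed records early, rather than partway through a training run.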


NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.


This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA. Post date: 15 Oct 2023
