-
open_llama
OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
-
RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Yes? Their GitHub repo is under Apache, their base model is under Apache, the training data is not theirs, and they provide scripts showing how to convert it for the pretraining step. They have scripts for pretraining and finetuning as well. Basically for everything.
Compare it to OpenLLaMA. Its GitHub repo doesn't have a single script showing how to do anything.
Compare it to RedPajama, which only has scripts for preprocessing.