I haven't done so myself, but don't you sign an agreement when you ask Facebook for a link to download the LLaMA weights? That is currently the only officially supported way of getting them (https://github.com/facebookresearch/llama/tree/main#llama).
Thank you for developing the pipeline and amassing considerable compute for gathering and preprocessing this dataset!
I'm not sure if this is the right place to ask, but could you consider training an LLM with a more advanced, sparse transformer architecture (specifically, "Terraformer" from this paper https://arxiv.org/abs/2111.12763 and this codebase https://github.com/google/trax/blob/master/trax/models/resea... by Google Brain and OpenAI)? I understand the pressure to focus on a straightforward LLaMA replication, but of course you can see that it's a legacy dense architecture that limits inference performance. The new architecture is not just an academic curiosity: it has already been validated at scale by Google, providing a 10x+ inference speedup on the same hardware.
Frankly, the community's compute budget, for both training and inference, isn't infinite, and neither is the public's interest in models that offer no advantage (at least in convenience) over closed-source ones, so we should use both resources as efficiently as possible. Training at least LLaMA-Terraformer-7B and 13B foundation models on the whole dataset would be a big step forward.
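For readers unfamiliar with the paper: the core mechanism behind the claimed inference speedup is a sparse feed-forward layer, where a small controller scores the hidden units per token and only the selected columns of the FFN are actually computed. Here's a minimal numpy sketch of that idea (function and weight names are mine; the real implementation selects one unit per block with a trained, discretized controller rather than a plain top-k over raw scores):

```python
import numpy as np

def sparse_ffn(x, W1, b1, W2, b2, Wc, k):
    """Sparse feed-forward block: a controller scores the hidden units,
    and only the top-k columns of W1 / rows of W2 are computed.
    With k << d_hidden this cuts FFN inference FLOPs by roughly d_hidden / k."""
    scores = x @ Wc                       # (d_hidden,) controller scores
    active = np.argsort(scores)[-k:]      # indices of the k active units
    h = np.maximum(x @ W1[:, active] + b1[active], 0.0)  # ReLU on active units only
    return h @ W2[active, :] + b2         # project back to model dimension

# Tiny demo with random weights (shapes only, nothing is trained)
rng = np.random.default_rng(0)
d_model, d_hidden, k = 8, 64, 4
x  = rng.standard_normal(d_model)
W1 = rng.standard_normal((d_model, d_hidden))
b1 = rng.standard_normal(d_hidden)
W2 = rng.standard_normal((d_hidden, d_model))
b2 = rng.standard_normal(d_model)
Wc = rng.standard_normal((d_model, d_hidden))

y = sparse_ffn(x, W1, b1, W2, b2, Wc, k)
print(y.shape)  # (8,)
```

The output is identical to a dense FFN whose inactive hidden units were zeroed out, which is why the technique preserves the model family while skipping most of the per-token compute at inference.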