I haven't done so myself, but don't you sign an agreement when you ask Facebook for a link to download the LLaMA weights? That is currently the only officially supported way of getting them (https://github.com/facebookresearch/llama/tree/main#llama).
Thank you for developing the pipeline and amassing considerable compute for gathering and preprocessing this dataset!
I'm not sure if this is the right place to ask, but could you consider training an LLM with a more advanced, sparse transformer architecture (specifically, "Terraformer" from this paper https://arxiv.org/abs/2111.12763 and this codebase https://github.com/google/trax/blob/master/trax/models/resea... by Google Brain and OpenAI)? I understand the pressure to focus on a straightforward LLaMA replication, but of course you can see that it's a legacy dense architecture that limits inference performance. The new architecture is not just an academic curiosity: it has already been validated at scale by Google, providing a 10x+ inference speedup on the same hardware.
Frankly, the community's compute budget, for both training and inference, isn't infinite, and neither is the public's interest in models that offer no advantage (at least in convenience) over closed-source ones, so we should use both resources as efficiently as possible. Training at least LLaMA-Terraformer-7B and 13B foundation models on the whole dataset would be a big step forward.
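For readers unfamiliar with the paper: the core mechanism behind the claimed inference speedup is a sparse feed-forward layer, where a small controller scores the hidden units per token and only the selected columns of the FFN are actually computed. Here's a minimal numpy sketch of that idea (function and weight names are mine; the real implementation selects one unit per block with a trained, discretized controller rather than a plain top-k over raw scores):

```python
import numpy as np

def sparse_ffn(x, W1, b1, W2, b2, Wc, k):
    """Sparse feed-forward block: a controller scores the hidden units,
    and only the top-k columns of W1 / rows of W2 are computed.
    With k << d_hidden this cuts FFN inference FLOPs by roughly d_hidden / k."""
    scores = x @ Wc                       # (d_hidden,) controller scores
    active = np.argsort(scores)[-k:]      # indices of the k active units
    h = np.maximum(x @ W1[:, active] + b1[active], 0.0)  # ReLU on active units only
    return h @ W2[active, :] + b2         # project back to model dimension

# Tiny demo with random weights (shapes only, nothing is trained)
rng = np.random.default_rng(0)
d_model, d_hidden, k = 8, 64, 4
x  = rng.standard_normal(d_model)
W1 = rng.standard_normal((d_model, d_hidden))
b1 = rng.standard_normal(d_hidden)
W2 = rng.standard_normal((d_hidden, d_model))
b2 = rng.standard_normal(d_model)
Wc = rng.standard_normal((d_model, d_hidden))

y = sparse_ffn(x, W1, b1, W2, b2, Wc, k)
print(y.shape)  # (8,)
```

The output is identical to a dense FFN whose inactive hidden units were zeroed out, which is why the technique preserves the model family while skipping most of the per-token compute at inference.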