-
TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
Coincidentally, or not? A research team just started a 90 day effort to train a 1.1 billion parameter llama model on 3 trillion tokens.
https://github.com/jzhang38/TinyLlama
You might want to give a read to "Scaling Data-Constrained Language Models" [1]. They basically generalized the Chinchilla scaling law by investigating behavior on multi-epoch runs.
[1] https://arxiv.org/abs/2305.16264
NOTE:
The number of mentions on this list indicates mentions on common posts plus user suggested alternatives.
Hence, a higher number means a more popular project.