TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
> While LLM projects typically require an exorbitant amount of resources, it is important to remind ourselves that research does not need to assemble full-fledged massively expensive systems in order to have impact.
Check out TinyLlama: https://github.com/jzhang38/TinyLlama
Four research students from the Singapore University of Technology and Design are pretraining a 1.1B-parameter Llama model on 3 trillion tokens using a handful of A100s.
They're also providing the source code, training data, and fine-tuned checkpoints for anyone to run.
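To get a feel for why "a handful of A100s" is plausible for this scale, here is a back-of-the-envelope compute estimate. It is a rough sketch under stated assumptions, not TinyLlama's own accounting. It uses the common rule of thumb that training costs about 6 FLOPs per parameter per token (forward plus backward pass). It also assumes an A100 dense BF16 peak of 312 TFLOPS and a hypothetical 50% hardware utilization. The GPU counts in the loop are illustrative only.

```python
# Back-of-the-envelope training-compute estimate (assumptions, not measured figures).

SECONDS_PER_DAY = 86_400


def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute: ~6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens


def gpu_days(total_flops: float, peak_flops_per_gpu: float, utilization: float) -> float:
    """Single-GPU days needed at an assumed fraction of peak throughput."""
    effective_flops_per_s = peak_flops_per_gpu * utilization
    return total_flops / effective_flops_per_s / SECONDS_PER_DAY


N_PARAMS = 1.1e9        # 1.1B parameters (from the post)
N_TOKENS = 3e12         # 3 trillion tokens (from the post)
A100_BF16_PEAK = 312e12  # assumed dense BF16 peak, FLOP/s
UTILIZATION = 0.5        # hypothetical utilization

flops = training_flops(N_PARAMS, N_TOKENS)
print(f"total compute: {flops:.2e} FLOPs")            # total compute: 1.98e+22 FLOPs
print(f"tokens per parameter: {N_TOKENS / N_PARAMS:.0f}")  # tokens per parameter: 2727

days = gpu_days(flops, A100_BF16_PEAK, UTILIZATION)
print(f"single-A100 time: {days:.0f} GPU-days")        # single-A100 time: 1469 GPU-days

# Wall-clock time for a few hypothetical cluster sizes (assumes perfect scaling).
for n_gpus in (8, 16, 32):
    print(f"{n_gpus:>2} GPUs: ~{days / n_gpus:.0f} days")
```

Under these assumptions the run lands in the range of months on a few dozen GPUs or fewer, which is small next to typical LLM pretraining budgets. Changing the utilization figure moves the result proportionally.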
Try this: https://github.com/refuel-ai/autolabel
Then the main challenge just becomes prompt design, which can sometimes be nebulous for NLP annotation.