An interesting outcome of the nanoGPT repo is the struggle to exactly reproduce the Chinchilla findings[0], even after discussing it with the authors.
A larger point is that the scaling laws give you the compute-optimal loss, but pre-training loss only measures how well the model predicts its corpus, which contains text written by people who were wrong or whose prose was lacking. In a real system, what you actually want to optimize for is accuracy, composability, and inventiveness.
[0]: https://github.com/karpathy/nanoGPT/blob/master/scaling_laws...
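For intuition, here's a back-of-the-envelope sketch of the Chinchilla rule of thumb (compute C ≈ 6·N·D FLOPs, with roughly 20 training tokens per parameter at the optimum). The constants are coarse approximations from the paper, not the exact fitted law:

```python
# Back-of-the-envelope Chinchilla rule of thumb (Hoffmann et al. 2022):
# training compute C ~ 6 * N * D FLOPs, and the compute-optimal point
# has roughly 20 training tokens per parameter. Approximations only.

def chinchilla_optimal(compute_flops: float):
    """Given a FLOP budget C, return (params N, tokens D) with D ~ 20 N."""
    # C = 6 * N * D and D = 20 * N  =>  N = sqrt(C / 120)
    n_params = (compute_flops / 120) ** 0.5
    n_tokens = 20 * n_params
    return n_params, n_tokens

# e.g. a budget of 1e21 FLOPs (a modest research-scale run)
n, d = chinchilla_optimal(1e21)
print(f"params ~ {n:.3g}, tokens ~ {d:.3g}")  # ~2.9e9 params, ~5.8e10 tokens
```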
To train small GPT-like models, there's also aitextgen: https://github.com/minimaxir/aitextgen
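Basic usage looks roughly like this per its README (the API may have shifted between versions):

```python
from aitextgen import aitextgen

# With no arguments this loads the default 124M GPT-2;
# pass model="..." to use a different checkpoint.
ai = aitextgen()
ai.generate(n=1, prompt="The secret to training small models is", max_length=64)
```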
A100s are Nvidia GPUs. You can rent them from providers like AWS or Lambda Labs. The README has instructions for downloading the original GPT-2 weights from OpenAI, and you can also train a very simple version on a smaller dataset from your laptop.
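As a minimal sketch, assuming you're running from the nanoGPT repo root (model.py provides a from_pretrained that fetches the weights via HuggingFace transformers):

```python
import torch
from model import GPT  # nanoGPT's model.py

# Downloads and converts the original 124M-parameter GPT-2 checkpoint.
model = GPT.from_pretrained('gpt2')
model.eval()

idx = torch.zeros((1, 1), dtype=torch.long)  # dummy start token
out = model.generate(idx, max_new_tokens=20)  # see sample.py for proper decoding
```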
If you just want to play with a similar but much better model, go to https://chat.openai.com
While doing my PhD some years ago (it wasn't a PhD in AI, but a closely related field), I trained several models with the usual stack back then (PyTorch and TensorFlow). I realized that a lot of this stack could be rewritten in much simpler terms without sacrificing much fidelity or performance.
Submissions like yours, and other projects like whisper.cpp -> https://github.com/ggerganov/whisper.cpp
make it pretty clear to me (and others) that this intuition is correct.
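To make that concrete: the core op of these models, causal self-attention, fits in a dozen lines of plain NumPy. This is an illustrative sketch, not code from any particular library:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """x: (seq, d_model); wq/wk/wv: (d_model, d_head)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (seq, seq)
    mask = np.triu(np.ones_like(scores), k=1) * -1e9  # hide future tokens
    return softmax(scores + mask) @ v                 # (seq, d_head)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
w = [rng.normal(size=(16, 4)) for _ in range(3)]
print(self_attention(x, *w).shape)  # (8, 4)
```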
There are a couple of tools I created back then that could push things further in this direction; unfortunately they're not mature enough to warrant a release, but the ideas they embody are worth a look (IMHO). If there's interest on your side (or from anyone reading this thread), I'd love to talk more about it.
There absolutely are! Check out hivemind (https://github.com/learning-at-home/hivemind), a general library for deep learning over the Internet, or Petals (https://petals.ml/), a system built on hivemind that lets you run BLOOM-176B (or other large language models) distributed over many volunteer PCs. You can join it and host some layers of the model by running literally one command on a Linux machine with Docker and a recent enough GPU.
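Inference then looks roughly like this, following the Petals README at the time of writing (class and model names are the project's, but may have changed since):

```python
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

MODEL_NAME = "bigscience/bloom-petals"
tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
# The model's layers are served by volunteer peers over the Internet;
# only embeddings and a few local layers live on your machine.
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)

inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))
```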
Disclaimer: I work on these projects; both are based on our research over the past three years.