| | swarm-jax | gpt-neo |
|---|---|---|
| Mentions | 2 | 82 |
| Stars | 229 | 6,158 |
| Growth | - | - |
| Activity | 0.0 | 7.3 |
| Last commit | 12 months ago | about 2 years ago |
| Language | Python | Python |
| License | - | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
swarm-jax
-
Ray: A Distributed Framework for Emerging AI Applications
I used Ray to train a massive GPT model by putting each layer on a separate TPU. Ray was able to send all the gradients back and forth as needed.
It scaled fine up to 33 TPUs (i.e. 33 layers).
Ray is impressive as hell.
By the way, I didn't write the code to do any of that. kindiana, aka "the guy that wrote GPT-J", also happened to write this: https://github.com/kingoflolz/swarm-jax/tree/master/swarm_ja...
I just ran it and it worked. Which is extraordinarily unusual for TPUs, historically speaking.
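The layer-per-TPU split described above is pipeline model parallelism. A minimal plain-Python sketch of the idea (no Ray or TPUs required; `LayerWorker` and the toy "layer" are illustrative stand-ins, not swarm-jax's actual API):

```python
# Conceptual sketch of pipeline model parallelism: each worker owns one
# layer's parameters, and activations hop from worker to worker exactly
# once per layer. In the real setup each worker would be a Ray actor
# pinned to its own TPU; here they are plain Python objects.

class LayerWorker:
    def __init__(self, scale):
        self.scale = scale                 # this layer's "parameters"

    def forward(self, activation):
        return activation * self.scale    # toy stand-in for a real layer

def pipeline_forward(workers, activation):
    for worker in workers:                # one network hop per layer
        activation = worker.forward(activation)
    return activation

# 33 layers -> 33 workers, one per TPU, as in the experiment above
workers = [LayerWorker(scale=2.0) for _ in range(33)]
result = pipeline_forward(workers, 1.0)
```

The point of the layout is that model size scales with the number of workers you can string together, since no single device ever has to hold more than one layer.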
-
GPT-J “the open source cousin of GPT-3 everyone can use”
Believe it or not, it's completely free.
It's thanks to TFRC. It's the most world-changing program I know of. It's why I go door to door like the proverbial religious fanatic, singing TFRC's praises, whether people want to listen or not.
Because for the first time in history, any capable ML hacker now has the resources they need to do something like this.
Imagine it. This is a legit OpenAI-style model inference API. It's now survived two HN front page floods.
(I saw it go down about an hour ago, so I was like "Nooo! Prove you're production grade! I believe in you!" and I think my anime-style energy must've brought it back up, since the API works fine now. Yep, it was all me. Keyboard goes clackclackclack, world changes, what can I say? Just another day at the ML office oh god this joke has gone on for like centuries too long.)
And it's all thanks to TFRC. I'm intentionally not linking anything about TFRC, because in typical Google fashion, every single thing you can find online is the most corporate, soulless-looking "We try to help you do research at scale" generic boilerplate imaginable.
So I decided to write something about TFRC that wasn't: https://blog.gpt4.org/jaxtpu
(It was pretty hard to write a medieval fantasy-style TPU fanfic, but someone had to. Well, maybe no one had to. But I just couldn't let such a wonderful project go unnoticed, so I had to try as much stupid shit as possible to get the entire world to notice how goddamn cool it is.)
To put things into perspective, a TPU v2-8 is the "worst possible TPU you could get access to."
They give you access to 100.
On day one.
This is what originally hooked me in. My face, that first day in 2019 when TFRC's email showed up saying "You can use 100 v2-8's in us-central1-f!": https://i.imgur.com/EznLvlb.png
The idea of using 100 theoretically high-performance nodes of anything, in creative ways, greatly appealed to my gamedev background.
It wasn't till later that I discovered, to my delight, that these weren't "nodes of anything."
These are 96-CPU, 330 GB RAM Ubuntu servers.
That blog post I just linked to is running off of a TPU right now. Because it's literally just an Ubuntu server.
This is like the world's best kept secret. It's so fucking incredible that I have no idea why people aren't beating down the doors, using every TPU that they can get their hands on, for as many harebrained ideas as possible.
God, I can't even list how much cool shit there is to discover. You'll find out that you get 100 Gbit/s between two separate TPUs. In fact, I'm pretty sure it's even higher than that. That means you don't even need a TPU pod anymore.
At least, theoretically. I tried getting Tensorflow to do this, for over a year.
kindiana (Ben Wang), the guy who wrote this GPT-J codebase we're all talking about, casually proved that this was not merely theoretical: https://twitter.com/theshawwn/status/1406171487988498433
He tried to show me https://github.com/kingoflolz/swarm-jax/ once, long ago. I didn't understand at the time what I was looking at, or why it was such a big deal. But basically, when you put each GPT layer on a separate TPU, it means you can string together as many TPUs as you want, to make however large of a model you want.
You should be immediately skeptical of that claim. It shouldn't be obvious that the bandwidth is high enough to train a GPT-3-sized model in any reasonable time frame. It's still not obvious to me. But at this point, I've been amazed by so many things related to TPUs, JAX, and TFRC that I feel like I'm dancing around in Willy Wonka's factory while the door's wide open. The Oompa-Loompas are singing about "that's just what the world will do, oompa-loompa they'll ignore you" while I keep trying to get everybody to stop what they're doing and step into the factory.
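A rough back-of-envelope check of that bandwidth skepticism (all numbers here are assumptions, not measurements: GPT-3's published hidden size, a 2048-token microbatch, fp16 activations, and the 100 Gbit/s figure mentioned above):

```python
# How long does one pipeline hop take at 100 Gbit/s? Between stages you
# only ship the activations, not the weights, so the payload is small
# relative to the model itself.

hidden_size     = 12288   # GPT-3 model dimension
tokens          = 2048    # tokens per microbatch (assumed)
bytes_per_value = 2       # fp16 activations (assumed)
link_bits_per_s = 100e9   # the 100 Gbit/s figure quoted above

payload_bits = hidden_size * tokens * bytes_per_value * 8
hop_seconds  = payload_bits / link_bits_per_s
print(f"{hop_seconds * 1e3:.2f} ms per hop")   # roughly 4 ms
```

A few milliseconds per microbatch hop is why per-layer pipelining isn't obviously doomed; real training would additionally overlap communication with compute and handle the backward pass, so this is only a lower bound on plausibility.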
The more people using TPUs, the more Google is going to build TPUs. They can fill three small countries entirely with buildings devoted to TPUs. The more people want these things, the more we'll all have.
Because I think Google's gonna utterly annihilate Facebook in ML mindshare wars: https://blog.gpt4.org/mlmind
TPU VMs just launched a month ago. No one realizes yet that JAX is the React of ML.
Facebook left themselves wide open by betting on GPUs. GPUs fucking suck at large-scale ML training. Why the hell would you pay $1M when you can get the same thing for orders of magnitude less?
And no one's noticed that TPUs don't suck anymore. Forget everything you've ever heard about them. JAX on TPU VMs changes everything. In five years, you'll all look like you've been writing websites in assembly.
But hey, I'm just a fanatic TPU zealot. It's better to just write me off and keep betting on that reliable GPU pipeline. After all, everyone has millions of VC dollars to pour into the cloud furnace, right?
TFRC changed my life. I tried to do some "research" https://www.docdroid.net/faDq8Bu/swarm-training-v01a-pdf back when Tensorflow's horrible problems were your only option on TPUs.
Nowadays, you can think of JAX as "approximately every single thing you could possibly hope for."
GPT-J is proof. What more can I say? No TFRC, no GPT-J.
The world is nuts for not noticing how impactful TFRC has been. Especially TFRC support. Jonathan (from the support team) is just ... such a wonderful person. I was blown away at how much he cares about taking care of new TFRC members.
gpt-neo
-
How Open is Generative AI? Part 2
By December 2020, EleutherAI had introduced The Pile, a comprehensive text dataset designed for training language models. Subsequently, tech giants such as Microsoft, Meta, and Google used this dataset to train their own models. In March 2021, EleutherAI revealed GPT-Neo, an open-source model under the Apache 2.0 license, which was unmatched in size among open-source models at its launch. EleutherAI's later projects include the release of GPT-J, a 6 billion parameter model, and GPT-NeoX, a 20 billion parameter model, unveiled in February 2022. Their work demonstrates the viability of high-quality open-source AI models.
-
Creating an open source chat bot like ChatGPT for my own dataset without GPU?
Yeah, if that is your requirement you should definitely ignore ChatterBot, as it's older and probably not what your teacher wants. I'm looking at the gpt-neo docs right now: https://github.com/EleutherAI/gpt-neo
-
Any real competitor to GPT-3 which is open source and downloadable?
3.) EleutherAI's GPT-Neo and GPT-NeoX: EleutherAI is an independent research organization that aims to promote open research in artificial intelligence. They have released GPT-Neo, an open-source language model based on the GPT architecture, and are developing GPT-NeoX, a highly-scalable GPT-like model. You can find more information on their GitHub repositories: GPT-Neo: https://github.com/EleutherAI/gpt-neo GPT-NeoX: https://github.com/EleutherAI/gpt-neox
-
⚡ Neural - AI Code Generation for Vim
This is one of the first comprehensive plugins that have been rewritten to support multiple AI backends such as OpenAI GPT-3+, with other custom sources planned for the future, such as ChatGPT, GPT-J, GPT-Neo, and more.
-
Looks like some Taliban fighters are getting burnt out working the 9-5 grind
GPT-Neo is newer than GPT-2 on the open source side of things. In my experience, it tends to give longer and more creative responses than GPT-2 but not on the level of GPT-3. I've not tried GPT-J or GPT-NeoX, but they're also open source and reportedly better than GPT-Neo (albeit less accessible).
- H3 - a new generative language model that outperforms GPT-Neo-2.7B with only *2* attention layers! In H3, the researchers replace attention with a new layer based on state space models (SSMs). With the right modifications, they find that it can outperform transformers.
- First Open Source Alternative to ChatGPT Has Arrived
-
Where is the line for AI and where does ChatGPT stand?
Finally, yes: it is trained via causal language modeling (next-token prediction). The approach has been fairly standard for years; the big difference with the GPT* models is the number of parameters and the volume of text. We still haven't reached a ceiling with LLM parameters; they appear to keep improving with size. This training allows the model to learn a strong representation of language. Their training approach is published, and open-source GPT* versions have already been made and released (https://github.com/EleutherAI/gpt-neo). However, the models are huge and can't be run locally by hobbyists. This gets at larger issues in the democratization of ML.
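The next-token-prediction objective that comment describes can be illustrated with a deliberately tiny stand-in: a counting-based bigram "model". The corpus and helper names here are made up for illustration; GPT-style models learn the same objective with a transformer instead of counts.

```python
# Toy illustration of next-token prediction: count which token follows
# which, then "predict" the most frequent continuation. The training
# signal for GPT-style models is the same idea, learned at vastly
# larger scale with a neural network instead of a count table.

from collections import Counter, defaultdict

def train_bigram(tokens):
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict(counts, prev):
    # The model's "prediction" is the most frequent continuation seen.
    return counts[prev].most_common(1)[0][0]

corpus = "the cat sat on the mat".split()
model = train_bigram(corpus)
print(predict(model, "cat"))   # "sat" is the only continuation seen
```

Scaling up the context from one previous token to thousands, and the count table to billions of learned parameters, is what separates this toy from an LLM, but the objective itself is unchanged.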
- Using the GPT-3 AI Writer inside Obsidian(This is COOL)
-
Teaser trailer for "The Diary of Sisyphus" (2023), the world's first feature film written by an artificial intelligence (GPT-Neo) and produced by Briefcase Films, my indie film studio based in Northern Italy
- GPT-Neo 2.7B, released Mar/2021, and unmaintained/unsupported as of Aug/2021? or;
What are some alternatives?
mesh-transformer-jax - Model parallel transformers in JAX and Haiku
gpt-neox - An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.