swarm-jax

Swarm training framework using Haiku + JAX + Ray for layer parallel transformer language models on unreliable, heterogeneous nodes (by kingoflolz)


swarm-jax reviews and mentions

Posts with mentions or reviews of swarm-jax. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-07-04.
  • Ray: A Distributed Framework for Emerging AI Applications
    3 projects | news.ycombinator.com | 4 Jul 2021
    I used Ray to train a massive GPT model by putting each layer on a separate TPU. Ray was able to send all the gradients back and forth as needed.

    It scaled fine up to 33 TPUs (i.e. 33 layers).

    Ray is impressive as hell.

    By the way, I didn't write the code to do any of that. kindiana, aka "the guy that wrote GPT-J", also happened to write this: https://github.com/kingoflolz/swarm-jax/tree/master/swarm_ja...

    I just ran it and it worked. Which is extraordinarily unusual for TPUs, historically speaking.
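
Not swarm-jax's actual code, but a minimal, hypothetical sketch of the layer-parallel pattern described in the post above, assuming one Ray actor per layer: each actor owns its own parameters, activations hop forward from actor to actor, and gradients hop back. The actor class, toy dense "layers", and hyperparameters below are illustrative stand-ins, not the swarm_jax API.

```python
import jax
import jax.numpy as jnp
import ray

ray.init()


@ray.remote
class LayerActor:
    """One pipeline stage: holds a single layer's parameters and applies local updates."""

    def __init__(self, seed, d_model):
        # Toy "layer": a single dense projection standing in for a transformer block.
        self.w = jax.random.normal(jax.random.PRNGKey(seed), (d_model, d_model)) * 0.02
        self.lr = 1e-3
        self.x = None  # cached input for the backward pass

    def forward(self, x):
        self.x = jnp.asarray(x)           # cache activations for backprop
        return jnp.tanh(self.x @ self.w)

    def backward(self, grad_out):
        # Local VJP: gradients w.r.t. this layer's weights and its input.
        def f(w, x):
            return jnp.tanh(x @ w)

        _, vjp = jax.vjp(f, self.w, self.x)
        grad_w, grad_x = vjp(jnp.asarray(grad_out))
        self.w = self.w - self.lr * grad_w   # naive local SGD step
        return grad_x                        # handed back to the previous actor


d_model, n_layers = 128, 4
layers = [LayerActor.remote(i, d_model) for i in range(n_layers)]

x = jnp.ones((8, d_model))      # a fake micro-batch of activations
h = x
for layer in layers:            # forward pass: activations hop node to node
    h = ray.get(layer.forward.remote(h))

grad = jnp.asarray(h) / h.size  # pretend dLoss/dOutput
for layer in reversed(layers):  # backward pass: gradients hop back in reverse
    grad = ray.get(layer.backward.remote(grad))
```

Only the activations and their gradients cross actor (node) boundaries here; the weights never move, which is what makes stringing one layer per TPU attractive in the first place.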

  • GPT-J “the open source cousin of GPT-3 everyone can use”
    9 projects | news.ycombinator.com | 3 Jul 2021
    Believe it or not, it's completely free.

    It's thanks to TFRC. It's the most world-changing program I know of. It's why I go door to door like the proverbial religious fanatic, singing TFRC's praises, whether people want to listen or not.

    Because for the first time in history, any capable ML hacker now has the resources they need to do something like this.

    Imagine it. This is a legit OpenAI-style model inference API. It's now survived two HN front page floods.

    (I saw it go down about an hour ago, so I was like "Nooo! Prove you're production grade! I believe in you!" and I think my anime-style energy must've brought it back up, since the API works fine now. Yep, it was all me. Keyboard goes clackclackclack, world changes, what can I say? Just another day at the ML office oh god this joke has gone on for like centuries too long.)

    And it's all thanks to TFRC. I'm intentionally not linking anything about TFRC, because in typical google fashion, every single thing you can find online is the most corporate, soulless-looking "We try to help you do research at scale" generic boilerplate imaginable.

    So I decided to write something about TFRC that wasn't: https://blog.gpt4.org/jaxtpu

    (It was pretty hard to write a medieval fantasy-style TPU fanfic, but someone had to. Well, maybe no one had to. But I just couldn't let such a wonderful project go unnoticed, so I had to try as much stupid shit as possible to get the entire world to notice how goddamn cool it is.)

    To put things into perspective, a TPU v2-8 is the "worst possible TPU you could get access to."

    They give you access to 100.

    On day one.

    This is what originally hooked me in. My face, that first day in 2019 when TFRC's email showed up saying "You can use 100 v2-8's in us-central1-f!": https://i.imgur.com/EznLvlb.png

    The idea of using 100 theoretically high-performance nodes of anything, in creative ways, greatly appealed to my gamedev background.

    It wasn't till later that I discovered, to my delight, that these weren't "nodes of anything."

    These are 96-CPU, 330 GB RAM Ubuntu servers.

    That blog post I just linked to is running off of a TPU right now. Because it's literally just an Ubuntu server.

    This is like the world's best kept secret. It's so fucking incredible that I have no idea why people aren't beating down the doors, using every TPU that they can get their hands on, for as many harebrained ideas as possible.

    God, I can't even list how much cool shit there is to discover. You'll find out that you get 100 Gbit/s between two separate TPUs. In fact, I'm pretty sure it's even higher than this. That means you don't even need a TPU pod anymore.

    At least, theoretically. I tried getting Tensorflow to do this, for over a year.

    kindiana (Ben Wang), the guy who wrote this GPT-J codebase we're all talking about, casually proved that this was not merely theoretical: https://twitter.com/theshawwn/status/1406171487988498433

    He tried to show me https://github.com/kingoflolz/swarm-jax/ once, long ago. I didn't understand at the time what I was looking at, or why it was such a big deal. But basically, when you put each GPT layer on a separate TPU, it means you can string together as many TPUs as you want, to make however large of a model you want.

    You should be immediately skeptical of that claim. It shouldn't be obvious that the bandwidth is high enough to train a GPT-3 sized model in any reasonable time frame. It's still not obvious to me. But at this point, I've been amazed by so many things related to TPUs, JAX, and TFRC, that I feel like I'm dancing around in willy wonka's factory while the door's wide open. The oompa loompas are singing about "that's just what the world will do, oompa-loompa they'll ignore you" while I keep trying to get everybody to stop what they're doing and step into the factory.

    The more people using TPUs, the more google is going to build TPUs. They can fill three small countries entirely with buildings devoted to TPUs. The more people want these things, the more we'll all have.

    Because I think Google's gonna utterly annihilate Facebook in ML mindshare wars: https://blog.gpt4.org/mlmind

    TPU VMs just launched a month ago. No one realizes yet that JAX is the React of ML.

    Facebook left themselves wide open by betting on GPUs. GPUs fucking suck at large-scale ML training. Why the hell would you pay $1M when you can get the same thing for orders of magnitude less?

    And no one's noticed that TPUs don't suck anymore. Forget everything you've ever heard about them. JAX on TPU VMs changes everything. In five years, you'll all look like you've been writing websites in assembly.

    But hey, I'm just a fanatic TPU zealot. It's better to just write me off and keep betting on that reliable GPU pipeline. After all, everyone has millions of VC dollars to pour into the cloud furnace, right?

    TFRC changed my life. I tried to do some "research" https://www.docdroid.net/faDq8Bu/swarm-training-v01a-pdf back when Tensorflow's horrible problems were your only option on TPUs.

    Nowadays, you can think of JAX as "approximately every single thing you could possibly hope for."

    GPT-J is proof. What more can I say? No TFRC, no GPT-J.

    The world is nuts for not noticing how impactful TFRC has been. Especially TFRC support. Jonathan (from the support team) is just ... such a wonderful person. I was blown away at how much he cares about taking care of new TFRC members.
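
On the bandwidth skepticism in the post above: here is a rough back-of-envelope check, not a measurement, assuming a GPT-3-like hidden size of 12,288, fp16 activations, and the roughly 100 Gbit/s inter-TPU link the commenter mentions. With one layer per node, only activations (forward) and activation gradients (backward) cross each link, never the weights.

```python
# Back-of-envelope pipeline-bandwidth check (all numbers are assumptions).
d_model       = 12_288             # GPT-3-like hidden size
bytes_per_elt = 2                  # fp16 activations
link_gbit_s   = 100                # quoted inter-TPU bandwidth, Gbit/s

bytes_per_token = d_model * bytes_per_elt          # ~24.6 KB per token per hop
link_bytes_s    = link_gbit_s * 1e9 / 8            # ~12.5 GB/s

# Divide by 2 because each token crosses the link twice: forward and backward.
tokens_per_s_per_link = link_bytes_s / (2 * bytes_per_token)
print(f"~{tokens_per_s_per_link:,.0f} tokens/s of activation traffic per link")
# => roughly 250,000 tokens/s, so under these assumptions the per-layer links
#    are unlikely to be the first bottleneck; compute and pipeline latency are
#    more likely concerns.
```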

Stats

Basic swarm-jax repo stats
Mentions: 2
Stars: 229
Activity: 0.0
Last commit: 12 months ago
