GPT-J “the open source cousin of GPT-3 everyone can use”

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • mesh-transformer-jax

    Model parallel transformers in JAX and Haiku

  • I hadn't noticed this format before, but wouldn't one want to include a sha or git tag in such a citation?

          howpublished = {\url{https://github.com/kingoflolz/mesh-transformer-jax}},

  • website

    The code that runs my blog: https://blog.gpt4.org/ (by shawwn)

  • Your view here is entirely reasonable. It was my view before I ever heard about TFRC. I was every bit as skeptical.

    That view is wrong. From https://github.com/shawwn/website/blob/master/jaxtpu.md :

    > So we're talking about a group of people who are the polar opposite of any Google support experience you may have had.

    > Ever struggle with GCP support? They took two weeks to resolve my problem. During the whole process, I vividly remember feeling like, "They don't quite seem to understand what I'm saying... I'm not sure whether to be worried."

    > Ever experience TFRC support? I've been a member for almost two years. I just counted how many times they failed to come through for me: zero times. And as far as I can remember, it took less than 48 hours to resolve whatever issue I was facing.

    > For a Google project, this was somewhere between "space aliens" and "narnia" on the Scale of Surprising Things.

    [...]

    > My goal here is to finally put to rest this feeling that everyone has. There's some kind of reluctance to apply to TFRC. People always end up asking stuff like this:

    > "I'm just a university student, not an established researcher. Should I apply?"

    > Yes!

    > "I'm just here to play around a bit with TPUs. I don't have any idea what I'm doing, but I'll poke around a bit and see what's up. Should I apply?"

    > Heck yeah!

    > "I have a Serious Research Project in mind. I'd like to evaluate whether the Cloud TPU VM platform is sufficient for our team's research goals. Should I apply?"

    > Absolutely. But whoever you are, you've probably applied by now. Because everyone is realizing that TFRC is how you accomplish your research goals.

    I expect that if you apply, you'll get your activation email within a few hours. Of course, you better get in quick. My goal here was to cause a stampede. Right now, in my experience, you'll be up and running by tomorrow. But if ten thousand people show up from HN, I don't know if that will remain true. :)

    I feel a bit bad to be talking at such length about TFRC. But then I remembered that none of this is off-topic in the slightest. GPT-J was proof of everything above. No TFRC, no GPT-J. The whole reason the world can enjoy GPT-J now is that anyone can show up and start doing as much effective work as they can possibly learn to do.

    It was all thanks to TFRC, the Cloud TPU team, the JAX team, the XLA compiler team -- hundreds of people, who have all managed to gift us this amazing opportunity. Yes, they want to win the ML mindshare war. But they know the way to win it is to care deeply about helping you achieve every one of your research goals.

    Think of it like a side hobby. Best part is, it's free. (Just watch out for the egress bandwidth, ha. Otherwise you'll be talking with GCP support for your $500 refund -- and yes, that's an unpleasant experience.)

  • gpt-neo

    (Discontinued) An implementation of model-parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.

  • They super earned it. From day one, everyone showed up with a level of drive and determination I haven't seen elsewhere.

    My name is on The Pile paper https://arxiv.org/abs/2101.00027 but I didn't do anything except make the books3 dataset. Stella, Leo, and everyone else did the hard work. You know, the work that's "actually useful to the scientific community." I didn't even help them hunt for typos, even though Stella asked me to. I was just like, sorry, no time, I have to focus on my own research.

    Imagine saying "nah" to helping shape one of the most important open source AI research projects of the coming years. Training data quality is becoming more and more of a focus.

    Lemme tell you a quick story.

    When https://venturebeat.com/2021/06/09/eleutherai-claims-new-nlp... came out, this quote caught my eye:

    > But EleutherAI claims to have performed “extensive bias analysis” on The Pile and made “tough editorial decisions” to exclude datasets they felt were “unacceptably negatively biased” toward certain groups or views.

    When I read this, I felt astonished that Eleuther was yet again trying to pose as the cool super-progressive AI lab. To my knowledge, no such thing ever happened. And I was involved with The Pile back when it was just me and Leo memeing in Discord DMs about how the world needed some quality training data once and for all.

    I went to Stella in DMs (you should follow her too! https://twitter.com/BlancheMinerva/status/139408950872390042...) and was like, what the hell? I don't understand how this could possibly be true. What are these supposed "tough editorial decisions"?

    Stella calmly explained to me that the US Congressional Record had been considered and rejected for inclusion in The Pile. I thought "Big deal, who the hell cares?" while saying "Okay, but I don't know what that is."

    It’s a written record of all statements made in the US legislature. It was also somewhere between 1GB and 15GB, which would have been a significant portion of The Pile's total size.

    I'm going to quote from her private DMs with me, which I haven't asked for permission to do. So this is technically another bad move by me. But she put it so perfectly, I was stunned:

    > For half the history of the US, black people were slaves. For something like 75% of it, black people didn’t have the right to vote. If a modern reader didn’t think there was a high proportion of extremely racist content, that would primarily be an indictment of modern people lol.

    > The reason we first looked at it was that we included a similar document for the EU Parliament

    It took me a few minutes to come to my senses, but I finally realized:

    (a) this dataset likely contained a huge proportion of content that, politics aside, would be a Very Bad Idea to include in your ML models by default;

    (b) Eleuther had just been trying to do good work this whole time.

    So you know, when you're in that situation, you can choose to either keep believing your own false ideas, or you can pay attention to empirical evidence and change your behavior. And empirically, I had been a massive asshole to everyone since pretty much the beginning. The only thing I helped with was books3 and arranging The Eye to get them some reliable hosting. (Shoutout to The Eye, by the way. Help 'em out if you can: https://the-eye.eu/public/AI/)

    And there's my name, right there on the paper.

    It's even worse than I described. I put the paper in jeopardy, because they were submitting it to a conference with strict anonymity rules. I had no idea about it (no one told me). I ended up so happy to see my name on a real arxiv paper that I tweeted out some self-congratulatory bullshit, and quote-tweeted something linking to The Pile. It was a few days into the anonymity period, but nonetheless, it was a violation of the anonymity rules. A lot of people saw that tweet, and the whole point of the rules is to ensure that people don't get unfair advantages by advertising on social media.

    When they came to me in DMs apologizing profusely for not talking with me about it, and asking me to delete the tweet, I basically told them to go shove a spoon up their.... because I didn't agree to any rules, and the idea that The Pile should go radio silent for five months on social media struck me as completely crazy.

    In hindsight, I was... just awful. So I mean, me posting this is like, the absolute minimum I can do. They've been the ones working for like a year to make all of this happen. I ended up feeling like a fraud, since everyone thinks highly of my ML work, and here I'd been nothing but problematic for a group of people who are just trying to ship good scientific work.

    Fast forward to today, and the results are clear. Go help Eleuther: https://www.eleuther.ai/ They're cool, and you'll get a shot at changing the world. I'm not sure you even have to be particularly skilled; some of the most valuable work was done by people who just showed up and started doing things, e.g. making the website look a little nicer, or making a cool logo.

  • helpmecode

    Augmented Intelligence Programming

    I made a non-GPT variant of this idea for fun after watching some breathless Microsoft video about AI in VS Code, which amounted to autocomplete sorted by popularity.

    It is just a wrapper around howdoi with a sprinkle of extra logic to make it a bit more useful.

    https://github.com/irthomasthomas/helpmecode

    Rather than creating fake code that looks real, this gives you the Stack Overflow code answers verbatim. So you know it is probably out of date or wrong. At least you can't delude yourself (if you don't know how to write a piece of code, then how can you judge the code written by GPT?).

    My tool is great for getting a quick reminder of syntax without needing to switch to a browser. I could improve it by adding metadata, like the date of the answer and maybe the comments.
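
    For the curious, the whole idea fits in a few lines. Here's a minimal sketch of that kind of wrapper, assuming the howdoi package is installed; it's illustrative, not the actual helpmecode source:

      # sketch: shell out to the howdoi CLI and print the Stack Overflow answer verbatim
      # (hypothetical example, not the real helpmecode code)
      import subprocess
      import sys

      def help_me_code(query: str) -> str:
          result = subprocess.run(
              ["howdoi", query],          # howdoi does the Stack Overflow lookup
              capture_output=True, text=True, check=True,
          )
          return result.stdout.strip()    # the answer, unmodified and possibly out of date

      if __name__ == "__main__":
          print(help_me_code(" ".join(sys.argv[1:]) or "reverse a list in python"))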

  • swarm-jax

    Swarm training framework using Haiku + JAX + Ray for layer parallel transformer language models on unreliable, heterogeneous nodes

  • Believe it or not, it's completely free.

    It's thanks to TFRC. It's the most world-changing program I know of. It's why I go door to door like the proverbial religious fanatic, singing TFRC's praises, whether people want to listen or not.

    Because for the first time in history, any capable ML hacker now has the resources they need to do something like this.

    Imagine it. This is a legit OpenAI-style model inference API. It's now survived two HN front page floods.

    (I saw it go down about an hour ago, so I was like "Nooo! Prove you're production grade! I believe in you!" and I think my anime-style energy must've brought it back up, since the API works fine now. Yep, it was all me. Keyboard goes clackclackclack, world changes, what can I say? Just another day at the ML office oh god this joke has gone on for like centuries too long.)

    And it's all thanks to TFRC. I'm intentionally not linking anything about TFRC, because in typical google fashion, every single thing you can find online is the most corporate, soulless-looking "We try to help you do research at scale" generic boilerplate imaginable.

    So I decided to write something about TFRC that wasn't: https://blog.gpt4.org/jaxtpu

    (It was pretty hard to write a medieval fantasy-style TPU fanfic, but someone had to. Well, maybe no one had to. But I just couldn't let such a wonderful project go unnoticed, so I had to try as much stupid shit as possible to get the entire world to notice how goddamn cool it is.)

    To put things into perspective, a TPU v2-8 is the "worst possible TPU you could get access to."

    They give you access to 100.

    On day one.

    This is what originally hooked me in. My face, that first day in 2019 when TFRC's email showed up saying "You can use 100 v2-8's in us-central1-f!": https://i.imgur.com/EznLvlb.png

    The idea of using 100 theoretically high-performance nodes of anything, in creative ways, greatly appealed to my gamedev background.

    It wasn't till later that I discovered, to my delight, that these weren't "nodes of anything."

    These are 96-CPU, 330 GB RAM Ubuntu servers.

    That blog post I just linked to is running off of a TPU right now. Because it's literally just an Ubuntu server.
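
    If you want to sanity-check that for yourself, it takes two lines once you've SSH'd in (assuming you've pip-installed jax with TPU support on the VM):

      # quick sanity check from inside a TPU VM (assumes jax[tpu] is installed)
      import jax

      print(jax.devices())        # on a v2-8, expect eight TpuDevice entries
      print(jax.device_count())   # 8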

    This is like the world's best kept secret. It's so fucking incredible that I have no idea why people aren't beating down the doors, using every TPU that they can get their hands on, for as many harebrained ideas as possible.

    God, I can't even list how much cool shit there is to discover. You'll find out that you get 100Gbit/s between two separate TPUs. In fact, I'm pretty sure it's even higher than this. That means you don't even need a TPU pod anymore.

    At least, theoretically. I tried getting Tensorflow to do this, for over a year.

    kindiana (Ben Wang), the guy who wrote this GPT-J codebase we're all talking about, casually proved that this was not merely theoretical: https://twitter.com/theshawwn/status/1406171487988498433

    He tried to show me https://github.com/kingoflolz/swarm-jax/ once, long ago. I didn't understand at the time what I was looking at, or why it was such a big deal. But basically, when you put each GPT layer on a separate TPU, it means you can string together as many TPUs as you want, to make however large of a model you want.

    You should be immediately skeptical of that claim. It shouldn't be obvious that the bandwidth is high enough to train a GPT-3 sized model in any reasonable time frame. It's still not obvious to me. But at this point, I've been amazed by so many things related to TPUs, JAX, and TFRC, that I feel like I'm dancing around in willy wonka's factory while the door's wide open. The oompa loompas are singing about "that's just what the world will do, oompa-loompa they'll ignore you" while I keep trying to get everybody to stop what they're doing and step into the factory.
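
    To make the layer-per-device idea concrete, here's a toy sketch in plain jax -- nothing like the real swarm-jax code, just the shape of the trick: keep each layer's weights resident on its own device, and ship only the activations between devices.

      # toy layer-per-device pipeline (illustrative sketch, not swarm-jax)
      import jax
      import jax.numpy as jnp

      devices = jax.devices()
      keys = jax.random.split(jax.random.PRNGKey(0), len(devices))

      # one toy "layer" (a 512x512 weight matrix) pinned to each device
      layers = [jax.device_put(0.02 * jax.random.normal(k, (512, 512)), d)
                for k, d in zip(keys, devices)]

      def forward(x):
          for w, d in zip(layers, devices):
              x = jax.device_put(x, d)    # only the activations cross the wire
              x = jnp.tanh(x @ w)         # the matmul runs where the weights live
          return x

      out = forward(jnp.ones((8, 512)))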

    The more people using TPUs, the more google is going to build TPUs. They can fill three small countries entirely with buildings devoted to TPUs. The more people want these things, the more we'll all have.

    Because I think Google's gonna utterly annihilate Facebook in ML mindshare wars: https://blog.gpt4.org/mlmind

    TPU VMs just launched a month ago. No one realizes yet that JAX is the React of ML.

    Facebook left themselves wide open by betting on GPUs. GPUs fucking suck at large-scale ML training. Why the hell would you pay $1M when you can get the same thing for orders of magnitude less?

    And no one's noticed that TPUs don't suck anymore. Forget everything you've ever heard about them. JAX on TPU VMs changes everything. In five years, you'll all look like you've been writing websites in assembly.

    But hey, I'm just a fanatic TPU zealot. It's better to just write me off and keep betting on that reliable GPU pipeline. After all, everyone has millions of VC dollars to pour into the cloud furnace, right?

    TFRC changed my life. I tried to do some "research" (https://www.docdroid.net/faDq8Bu/swarm-training-v01a-pdf) back when Tensorflow's horrible problems were your only option on TPUs.

    Nowadays, you can think of JAX as "approximately every single thing you could possibly hope for."
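
    If you've never touched it, here is roughly what that feels like in practice -- a hand-wavy example rather than a benchmark. You write plain numpy-ish code, and jax hands you a compiled gradient function:

      # the jax pitch in miniature (illustrative only)
      import jax
      import jax.numpy as jnp

      def loss(w, x, y):
          return jnp.mean((x @ w - y) ** 2)

      grad_loss = jax.jit(jax.grad(loss))   # differentiated and XLA-compiled in one line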

    GPT-J is proof. What more can I say? No TFRC, no GPT-J.

    The world is nuts for not noticing how impactful TFRC has been. Especially TFRC support. Jonathan (from the support team) is just ... such a wonderful person. I was blown away at how much he cares about taking care of new TFRC members.

  • tensorflow

    An Open Source Machine Learning Framework for Everyone

  • Good ol' x86_64.

    Now, here's the crazy part. Until one month ago, it was impossible for us to SSH into TPUs, let alone use the accelerators for anything. That means nobody yet has had any time to integrate TPU accelerators into their products.

    What I mean is -- you're absolutely correct, my blog is merely running on "an Ubuntu server," whereas I was claiming that it's being "powered by a TPU." It's not using any of the TPU accelerators for anything at all (at least, not for the blog).

    But it's easy to imagine a future where, once people realize how effortless it is to use jax to do some heavy lifting, people are going to start adding jax acceleration all over the place.
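
    Concretely, "adding jax acceleration" can be as small as this -- a hypothetical hot path in an ordinary service, jitted so the big matmul lands on the MXU cores when a TPU is attached (and quietly runs on CPU otherwise):

      # hypothetical hot path in a web service, accelerated with jax
      import jax
      import jax.numpy as jnp

      @jax.jit
      def rank_documents(doc_embeddings, query_embedding):
          scores = doc_embeddings @ query_embedding   # one big matmul on the accelerator
          return jnp.argsort(-scores)                 # best matches first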

    It feels like a matter of time till one day, you'll run `sudo apt-get install npm` on your TPU, and then it'll turn out that the latest nodejs is being accelerated by the MXU cores. Because that's a thing you can do now. One of the big value-adds here is "libtpu" -- it's a C library that gives you low-level access to the MXU cores that are attached directly to your gigantic Ubuntu server (aka "your TPU").

    Here, check this out: https://github.com/tensorflow/tensorflow/blob/master/tensorf...

    Wanna see a magic trick? That's a single, self-contained C file. I was shocked that the instructions to run this were in the comments at the top:

      // To compile: gcc -o libtpu_client libtpu_client.c -ldl


Related posts

  • Side Quest Devblog #1: These Fakes are getting Deep

    3 projects | dev.to | 29 Apr 2024
  • Google lays off its Python team

    3 projects | news.ycombinator.com | 27 Apr 2024
  • My Favorite DevTools to Build AI/ML Applications!

    9 projects | dev.to | 23 Apr 2024
  • TensorFlow-metal on Apple Mac is junk for training

    1 project | news.ycombinator.com | 16 Jan 2024
  • Show HN: Designing Bridges with PyTorch

    4 projects | news.ycombinator.com | 11 Jan 2024