website vs mesh-transformer-jax

| | website | mesh-transformer-jax |
|---|---|---|
| Mentions | 3 | 52 |
| Stars | 7 | 6,213 |
| Growth | - | - |
| Activity | 0.0 | 0.0 |
| Latest commit | over 2 years ago | over 1 year ago |
| Language | CSS | Python |
| License | - | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
website
- How do I get started with Jax on TPU VMs
- GPT-J “the open source cousin of GPT-3 everyone can use”
Your view here is entirely reasonable. It was my view before I ever heard about TFRC. I was every bit as skeptical.
That view is wrong. From https://github.com/shawwn/website/blob/master/jaxtpu.md :
> So we're talking about a group of people who are the polar opposite of any Google support experience you may have had.
> Ever struggle with GCP support? They took two weeks to resolve my problem. During the whole process, I vividly remember feeling like, "They don't quite seem to understand what I'm saying... I'm not sure whether to be worried."
> Ever experience TFRC support? I've been a member for almost two years. I just counted how many times they failed to come through for me: zero times. And as far as I can remember, it took less than 48 hours to resolve whatever issue I was facing.
> For a Google project, this was somewhere between "space aliens" and "narnia" on the Scale of Surprising Things.
[...]
> My goal here is to finally put to rest this feeling that everyone has. There's some kind of reluctance to apply to TFRC. People always end up asking stuff like this:
> "I'm just a university student, not an established researcher. Should I apply?"
> Yes!
> "I'm just here to play around a bit with TPUs. I don't have any idea what I'm doing, but I'll poke around a bit and see what's up. Should I apply?"
> Heck yeah!
> "I have a Serious Research Project in mind. I'd like to evaluate whether the Cloud TPU VM platform is sufficient for our team's research goals. Should I apply?"
> Absolutely. But whoever you are, you've probably applied by now. Because everyone is realizing that TFRC is how you accomplish your research goals.
I expect that if you apply, you'll get your activation email within a few hours. Of course, you'd better get in quick. My goal here was to cause a stampede. Right now, in my experience, you'll be up and running by tomorrow. But if ten thousand people show up from HN, I don't know if that will remain true. :)
I feel a bit bad talking at such length about TFRC. But then I remembered that none of this is off-topic in the slightest. GPT-J was proof of everything above. No TFRC, no GPT-J. The whole reason the world can enjoy GPT-J now is that anyone can show up and start doing as many effective things as they can possibly learn.
It was all thanks to TFRC, the Cloud TPU team, the JAX team, the XLA compiler team -- hundreds of people, who have all managed to gift us this amazing opportunity. Yes, they want to win the ML mindshare war. But they know the way to win it is to care deeply about helping you achieve every one of your research goals.
Think of it like a side hobby. Best part is, it's free. (Just watch out for the egress bandwidth, ha. Otherwise you'll be talking with GCP support for your $500 refund -- and yes, that's an unpleasant experience.)
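The first post above asks how to get started with JAX on TPU VMs. As a minimal sketch (my own, not from the linked write-up): once TFRC grants you a TPU VM, JAX should pick up the TPU cores with no extra configuration, and the same script falls back to CPU or GPU anywhere else.

```python
import jax
import jax.numpy as jnp

# On a Cloud TPU VM, jax.devices() should list the TPU cores directly;
# elsewhere it falls back to CPU/GPU, so this sketch runs anywhere.
print(jax.devices())  # e.g. [TpuDevice(id=0), ...] on a TPU VM

@jax.jit
def step(x):
    # Arbitrary compiled work, just to confirm XLA runs on the device.
    return jnp.tanh(x @ x.T)

x = jnp.ones((1024, 1024))
print(step(x).sum())  # first call compiles; subsequent calls are fast
```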
mesh-transformer-jax
- Large Language Models: Comparing Gen2/Gen3 Models (GPT-3, GPT-J, MT5 and More)
GPT-J is an LLM case study with two goals: training an LLM on a data source containing unique material, and using the training framework Mesh Transformer JAX to achieve high training efficiency through parallelization. There is no research paper about GPT-J, but its GitHub page provides the model, several checkpoints, and the complete source code for training.
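Mesh Transformer JAX shards the model with JAX's named-axis parallelism APIs; as a loose illustration of the underlying idea (not the repo's actual code), here is a Megatron-style tensor-parallel feed-forward layer using `jax.pmap`, where each device holds one slice of the weights and an all-reduce combines the partial outputs:

```python
from functools import partial

import jax
import jax.numpy as jnp
import numpy as np

n_dev = jax.local_device_count()
d_model, d_ff = 64, 256
assert d_ff % n_dev == 0  # the hidden dimension must split evenly

rng = np.random.default_rng(0)
# Leading axis = device axis: each device gets one shard of each matrix.
w_in = rng.normal(size=(n_dev, d_model, d_ff // n_dev)).astype(np.float32)
w_out = rng.normal(size=(n_dev, d_ff // n_dev, d_model)).astype(np.float32)
# Activations are replicated on every device.
x = np.tile(rng.normal(size=(8, d_model)).astype(np.float32), (n_dev, 1, 1))

@partial(jax.pmap, axis_name="model")
def ffn(x, w_in, w_out):
    h = jax.nn.gelu(x @ w_in)                # each device: its hidden slice
    return jax.lax.psum(h @ w_out, "model")  # all-reduce the partial outputs

print(ffn(x, w_in, w_out).shape)  # (n_dev, 8, d_model), identical per device
```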
- [R] Parallel Attention and Feed-Forward Net Design for Pre-training and Inference on Transformers
This idea has already been proposed in ViT-22B and GPT-J-6B.
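For reference, a minimal sketch of the two block layouts (with placeholder `attn`/`ffn` sublayers; nothing here is taken from either codebase):

```python
import jax.numpy as jnp

def ln(x, eps=1e-5):
    # Bare layer norm, no learned scale/offset.
    mu = x.mean(-1, keepdims=True)
    return (x - mu) / jnp.sqrt(x.var(-1, keepdims=True) + eps)

def sequential_block(x, attn, ffn):
    # GPT-2 style: the FFN consumes the attention output, so the two
    # sublayers must run one after the other.
    x = x + attn(ln(x))
    return x + ffn(ln(x))

def parallel_block(x, attn, ffn):
    # GPT-J / ViT-22B style: both sublayers read the same normalized
    # input, so they can be computed in parallel and their matmuls fused.
    h = ln(x)
    return x + attn(h) + ffn(h)
```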
- Show HN: Finetune LLaMA-7B on commodity GPUs using your own text
- [D] An Instruct Version Of GPT-J Using Stanford Alpaca's Dataset
Sure. Here's the repo I used for the fine-tuning: https://github.com/kingoflolz/mesh-transformer-jax. I used 5 epochs, and apart from that I kept the default parameters in the repo.
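The repo's own training setup is TPU-oriented and config-file driven; as a rough, hedged equivalent of one fine-tuning step through the HuggingFace port of GPT-J (a different route than the commenter used, with a toy batch standing in for the Alpaca data):

```python
import torch
from transformers import AutoTokenizer, GPTJForCausalLM

# Sketch only: loading the full fp32 model needs roughly 24 GB of memory.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Toy stand-in for one Alpaca-style instruction example; loop over the
# real dataset for 5 epochs to match the comment above.
batch = tokenizer("Instruction: say hi.\nResponse: hi.", return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss  # causal LM loss
loss.backward()
optimizer.step()
```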
- Boss wants me to use ChatGPT for work, but I refuse to input my personal phone number. Any advice?
- Let's build GPT: from scratch, in code, spelled out by Andrej Karpathy
You can skip to step 4 using something like GPT-J, as far as I understand: https://github.com/kingoflolz/mesh-transformer-jax#links
The pretrained model is already available.
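A minimal sketch of picking up that pretrained model via the HuggingFace port (assuming the `EleutherAI/gpt-j-6b` hub id) rather than the raw mesh-transformer-jax checkpoints:

```python
from transformers import AutoTokenizer, GPTJForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")

# Sample a short continuation from the pretrained weights.
inputs = tokenizer("JAX is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(out[0]))
```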
- Best coding model?
The GitHub repo suggests it's possible to change the number of checkpoints to make it run on a GPU.
- Ask HN: What language models can I fine-tune at home?
- selfhosted/open-source ChatGPT alternative?
GPT-J, which uses mesh-transformer-jax: https://github.com/kingoflolz/mesh-transformer-jax
- GPT-J, an open-source alternative to GPT-3
They hinted at it in the screenshot, but the goods are linked from the https://6b.eleuther.ai page: https://github.com/kingoflolz/mesh-transformer-jax#gpt-j-6b (Apache 2)
What are some alternatives?
gpt-neo - An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
helpmecode - Augmented Intelligence Programming
tensorflow - An Open Source Machine Learning Framework for Everyone
swarm-jax - Swarm training framework using Haiku + JAX + Ray for layer parallel transformer language models on unreliable, heterogeneous nodes
jax - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
KoboldAI-Client
alpaca-lora - Instruct-tune LLaMA on consumer hardware
Finetune_LLMs - Repo for fine-tuning Causal LLMs
gpt-2 - Code for the paper "Language Models are Unsupervised Multitask Learners"
cedille-ai - ✒️ Cedille is a large French language model (6B), released under an open-source license