Maxtext: A simple, performant and scalable Jax LLM

maxtext

4 1,266 9.7 Python

A simple, performant and scalable Jax LLM!

This might be a tangent, but why does JAX only support the saving / serialization of AOT compilation executables for TPU [1]? It would be great to have the ability to save compiled functions and not have to JIT compile something every time you restart a session.
(Julia used to have this problem too, but they've made great progress on caching JIT compiled functions to reduce latency.)
[1]: https://github.com/google/maxtext?tab=readme-ov-file#ahead-o...
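For context on the question above, JAX's ahead-of-time API separates lowering and compilation from execution. The following is a minimal sketch, assuming a toy function `scaled_sum` (a hypothetical name used only for illustration). It compiles within a single session and does not show saving the executable to disk, which is the part the comment is asking about.

```python
import jax
import jax.numpy as jnp


def scaled_sum(x):
    # Toy computation standing in for a real model step (illustrative only).
    return jnp.sum(x * 2.0)


x = jnp.arange(4.0)  # [0., 1., 2., 3.]

# Stage 1: trace and lower the function for this specific input shape/dtype.
lowered = jax.jit(scaled_sum).lower(x)

# Stage 2: compile ahead of time; the result is specialized to that shape/dtype.
compiled = lowered.compile()

# Stage 3: run the compiled executable without triggering JIT tracing again.
result = compiled(x)
```

The compiled object only accepts inputs matching the shape and dtype it was lowered for, so a restarted session would still need to repeat stages 1 and 2 unless the executable can be serialized and reloaded.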

EasyLM

8 2,241 7.7 Python

Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax.
levanter

1 445 9.4 Python

Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
t5x

7 2,503 8.5 Python

[3]: https://github.com/google-research/t5x
Asking because I have worked extensively on training a large model on a TPU cluster, and started with Levanter, then tried MaxText, and finally ended up on EasyLM. My thoughts are:
- Levanter is well intentioned but unproven and lacking in features. For instance, its sharding is odd in that it requires the embedding dimension to be a multiple of the number of devices, so I can't test a model with embedding dimension 768 on a 512-device pod. I lost confidence in Levanter after finding some glaring correctness bugs (and helping get them fixed). Also, while I'm a huge fan of Equinox's approach, it's sadly underdeveloped (for instance, there's no way to specify non-default weight initialization strategies without manually doing model surgery to set weights).
- MaxText was just very difficult to work with. We felt like we were fighting against it every time we needed to change something, because we would be digging through numerous needless layers of abstraction. My favorite: after one long day of debugging, I found a function whose only purpose was to pass its arguments untouched to another function; that function's only purpose was to pass them untouched to a third function, which then slightly changed them and passed them to a fourth function that did the work.
- EasyLM is, as the name says, easy. But on a deeper dive, the sharding functionality seems underdeveloped. What they call "FSDP" is not necessarily true FSDP; it's literally just a JAX mesh axis along which some data axes and some model weight axes happen to be sharded.
I'm still searching for a "perfect" JAX LLM codebase - any pointers?
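The divisibility constraint described for Levanter above comes down to simple arithmetic. The following is a minimal sketch, assuming the embedding axis is split into equal shards across all devices. The helper `can_shard_evenly` is a hypothetical name for illustration and is not part of Levanter or any other library.

```python
def can_shard_evenly(dim_size: int, num_devices: int) -> bool:
    """Return True if an axis of length dim_size splits into equal per-device shards."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    return dim_size % num_devices == 0


embed_dim = 768      # embedding dimension from the comment
pod_devices = 512    # device count from the comment

# 768 is not a multiple of 512, so an even split across the pod is impossible.
fits_on_pod = can_shard_evenly(embed_dim, pod_devices)

# 768 == 3 * 256, so the same model would split evenly across 256 devices.
fits_on_256 = can_shard_evenly(embed_dim, 256)
```

Under this assumption, a small test model would need its embedding dimension rounded up to a multiple of the device count (for example 1024 on 512 devices) before it could run on the pod.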

flax

10 5,538 9.7 Python

Flax is a neural network library for JAX that is designed for flexibility.

Is t5x an encoder/decoder architecture?
Some more general options:
The Flax ecosystem (https://github.com/google/flax?tab=readme-ov-file) and dm-haiku (https://github.com/google-deepmind/dm-haiku) were some of the best-developed communities in the JAX AI field.
Perhaps the "trax" repo? https://github.com/google/trax
Some HF examples: https://github.com/huggingface/transformers/tree/main/exampl...
Sadly, it seems much of the work is proprietary these days, but one example could be Grok-1, if you customize the details: https://github.com/xai-org/grok-1/blob/main/run.py

dm-haiku

10 2,806 7.8 Python

JAX-based neural network library

trax

7 7,957 4.7 Python

Trax — Deep Learning with Clear Code and Speed

transformers

176 125,369 10.0 Python

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

grok-1

8 48,188 5.9 Python

Grok open release

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Jax Deep Learning Machine Learning Transformer Natural Language Processing
Post date: 23 Apr 2024
