Swarm training framework using Haiku + JAX + Ray for layer-parallel transformer language models on unreliable, heterogeneous nodes
Why do you think that https://github.com/kingoflolz/mesh-transformer-jax is a good alternative to swarm-jax?