mixture-of-experts

Open-source projects categorized as mixture-of-experts

Top 11 mixture-of-experts Open-Source Projects

  • DeepSpeed

    DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

  • Project mention: Can we discuss MLOps, Deployment, Optimizations, and Speed? | /r/LocalLLaMA | 2023-12-06

    DeepSpeed can handle parallelism concerns, and even offload data/model to RAM or even NVMe (!). I'm surprised I don't see this project used more.
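
    The offloading mentioned above is configured through DeepSpeed's ZeRO settings. A minimal sketch, assuming a toy model and illustrative values (the config keys follow the DeepSpeed documentation; the NVMe path and batch sizes are placeholders):

      import deepspeed
      import torch.nn as nn

      # Toy model standing in for a real network.
      model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

      # ZeRO stage 3: shard states, push optimizer state to NVMe and parameters to CPU RAM.
      ds_config = {
          "train_batch_size": 8,
          "fp16": {"enabled": True},
          "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
          "zero_optimization": {
              "stage": 3,
              "offload_optimizer": {"device": "nvme", "nvme_path": "/local_nvme"},
              "offload_param": {"device": "cpu"},
          },
      }

      engine, optimizer, _, _ = deepspeed.initialize(
          model=model, model_parameters=model.parameters(), config=ds_config
      )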

  • LLaMA-Factory

    Unify Efficient Fine-Tuning of 100+ LLMs

  • Project mention: Show HN: GPU Prices on eBay | news.ycombinator.com | 2024-02-23

    Depends what model you want to train, and how well you want your computer to keep working while you're doing it.

    If you're interested in large language models, there's a table of VRAM requirements for fine-tuning at [1], which says you could do the most basic type of fine-tuning on a 7B parameter model with 8GB VRAM.

    You'll find that training takes quite a long time, and as a lot of the GPU power is going on training, your computer's responsiveness will suffer - even basic things like scrolling in your web browser or changing tabs use the GPU, after all.

    Spend a bit more and you'll probably have a better time.

    [1] https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#...
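
    For context, the low-VRAM figures in that table come from combining 4-bit quantization with LoRA adapters (QLoRA): the frozen base weights are stored in 4-bit while only small adapter matrices are trained. A minimal sketch using the Hugging Face transformers/peft/bitsandbytes stack rather than LLaMA-Factory's own CLI, with an illustrative model id:

      import torch
      from transformers import AutoModelForCausalLM, BitsAndBytesConfig
      from peft import LoraConfig, get_peft_model

      model_id = "meta-llama/Llama-2-7b-hf"  # any ~7B causal LM

      # Load the base model in 4-bit so its weights take roughly 4 GB of VRAM.
      bnb_config = BitsAndBytesConfig(
          load_in_4bit=True,
          bnb_4bit_quant_type="nf4",
          bnb_4bit_compute_dtype=torch.bfloat16,
      )
      model = AutoModelForCausalLM.from_pretrained(
          model_id, quantization_config=bnb_config, device_map="auto"
      )

      # Train only small LoRA adapters on top of the frozen quantized weights.
      lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                        target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
      model = get_peft_model(model, lora)
      model.print_trainable_parameters()  # a fraction of a percent of the full model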

  • mixtral-offloading

    Run Mixtral-8x7B models in Colab or on consumer desktops

  • Project mention: DBRX: A New Open LLM | news.ycombinator.com | 2024-03-27

    Waiting for Mixed Quantization with HQQ and MoE Offloading [1]. With that I was able to run Mixtral 8x7B on my 10 GB VRAM RTX 3080... This should work for DBRX and should shave off a ton of the VRAM requirement.

    1. https://github.com/dvmazur/mixtral-offloading?tab=readme-ov-...
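
    The reason this fits into ~10 GB is that an MoE model only activates a couple of experts per token, so inactive expert weights can sit (quantized) in CPU RAM and be copied to the GPU on demand. A toy sketch of that placement strategy; this is not the mixtral-offloading API, which additionally uses HQQ quantization and an LRU cache of recently used experts:

      import torch
      import torch.nn as nn

      class OffloadedMoE(nn.Module):
          """Keep experts on the CPU; move only the routed ones to the GPU per token."""

          def __init__(self, dim=512, n_experts=8, k=2):
              super().__init__()
              self.router = nn.Linear(dim, n_experts)   # stays on the CPU
              self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
              self.k = k

          def forward(self, x):                         # x: (dim,) on the GPU
              gates = self.router(x.to("cpu"))          # routing is cheap, do it on the CPU
              top = gates.topk(self.k)
              probs = top.values.softmax(dim=-1)
              out = torch.zeros_like(x)
              for w, i in zip(probs.tolist(), top.indices.tolist()):
                  expert = self.experts[i].to(x.device) # page the chosen expert in
                  out = out + w * expert(x)
                  self.experts[i].to("cpu")             # and evict it again
              return out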

  • hivemind

    Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.

  • Project mention: You can now train a 70B language model at home | news.ycombinator.com | 2024-03-07

    https://github.com/learning-at-home/hivemind is also relevant

  • mixture-of-experts

    PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538

  • Project mention: [Rumor] Potential GPT-4 architecture description | /r/LocalLLaMA | 2023-06-20
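
    The core idea behind the paper, and behind most projects on this page, is a layer whose gate routes each token to only a few expert sub-networks, so parameter count grows without a matching growth in per-token compute. A minimal top-k gating sketch (the paper's noisy gating and load-balancing loss are omitted):

      import torch
      import torch.nn as nn

      class SparseMoE(nn.Module):
          """Minimal top-k sparsely-gated MoE layer (no noise, no load balancing)."""

          def __init__(self, dim=256, n_experts=8, k=2):
              super().__init__()
              self.gate = nn.Linear(dim, n_experts, bias=False)
              self.experts = nn.ModuleList(
                  nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
                  for _ in range(n_experts)
              )
              self.k = k

          def forward(self, x):                       # x: (batch, dim)
              logits = self.gate(x)                   # (batch, n_experts)
              top = logits.topk(self.k, dim=-1)
              probs = top.values.softmax(dim=-1)      # renormalize over the chosen experts
              out = torch.zeros_like(x)
              for slot in range(self.k):
                  idx = top.indices[:, slot]          # expert chosen by each row in this slot
                  for e in idx.unique().tolist():
                      rows = (idx == e).nonzero(as_tuple=True)[0]
                      out[rows] += probs[rows, slot, None] * self.experts[e](x[rows])
              return out

      y = SparseMoE()(torch.randn(4, 256))            # each token only runs 2 of the 8 experts
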
  • tutel

    Tutel MoE: An Optimized Mixture-of-Experts Implementation

  • smt

    Surrogate Modeling Toolbox

  • Project mention: botorch VS SMT - a user suggested alternative | libhunt.com/r/botorch | 2023-12-06

    For unconstrained Bayesian Optimization
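
    SMT is on this list because, beyond standard surrogates, it ships an MOE application that clusters the input space and blends local surrogate models. A minimal sketch of the basic surrogate workflow on a toy 1-D function (Kriging here; SMT's MOE application exposes the same set_training_values / train / predict_values pattern):

      import numpy as np
      from smt.surrogate_models import KRG

      # Toy 1-D training data.
      xt = np.linspace(0.0, 4.0, 10).reshape(-1, 1)
      yt = np.sin(xt)

      sm = KRG(theta0=[1e-2])          # Kriging surrogate with an initial hyperparameter guess
      sm.set_training_values(xt, yt)
      sm.train()

      xnew = np.array([[1.5], [2.5]])
      print(sm.predict_values(xnew))   # surrogate predictions at unseen points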

  • mixture-of-experts

    A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models (by lucidrains)

  • mergoo

    A library for easily merging multiple LLM experts and efficiently training the merged LLM.

  • Project mention: A Library to build MoE from HF models | news.ycombinator.com | 2024-04-08

    https://github.com/Leeroo-AI/mergoo

  • st-moe-pytorch

    Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch

  • Project mention: will the point meet in 2024? | /r/LocalLLaMA | 2023-12-05
  • awesome-adaptive-computation

    A curated reading list of research in Adaptive Computation, Dynamic Compute & Mixture of Experts (MoE).

  • Project mention: The Frontier of Adaptive Computation in Machine Learning | news.ycombinator.com | 2023-08-22
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

mixture-of-experts related posts

  • Would anyone be interested in contributing to some group projects?

    4 projects | /r/learnmachinelearning | 24 Aug 2023
  • [Rumor] Potential GPT-4 architecture description

    2 projects | /r/LocalLLaMA | 20 Jun 2023
  • Hivemind: Train deep learning models on thousands of volunteers across the world

    1 project | news.ycombinator.com | 20 Jun 2023
  • Could a model not be trained by a decentralized network? Like SETI@home or kinda-sorta like Bitcoin. Petals accomplishes this somewhat, but if raw compute power is the only barrier to open source, I'd be happy to try organizing decentralized computing efforts

    2 projects | /r/LocalLLaMA | 17 Jun 2023
  • Orca (built on llama13b) looks like the new sheriff in town

    2 projects | /r/LocalLLaMA | 6 Jun 2023
  • Do you think that AI research will slow down to a halt because of regulation?

    1 project | /r/singularity | 21 May 2023
  • [D] Google "We Have No Moat, And Neither Does OpenAI": Leaked Internal Google Document Claims Open Source AI Will Outcompete Google and OpenAI

    1 project | /r/MachineLearning | 4 May 2023

Index

What are some of the best open-source mixture-of-experts projects? This list will help you:

# Project Stars
1 DeepSpeed 32,834
2 LLaMA-Factory 20,971
3 mixtral-offloading 2,235
4 hivemind 1,840
5 mixture-of-experts 835
6 tutel 658
7 smt 624
8 mixture-of-experts 525
9 mergoo 248
10 st-moe-pytorch 223
11 awesome-adaptive-computation 98
