Top 9 Python mixture-of-experts Projects
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
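As a rough illustration of the workflow, here is a minimal training-loop sketch, assuming a CUDA-capable machine and a script launched with the deepspeed launcher; the model, data, and config values are placeholders rather than anything canonical:

```python
import torch
import torch.nn.functional as F
import deepspeed

# Placeholder model and data; a real run would use an actual network and dataset.
model = torch.nn.Linear(1024, 1024)
data = [(torch.randn(8, 1024), torch.randn(8, 1024)) for _ in range(10)]

ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},   # ZeRO-2: shard optimizer state and gradients
}

# deepspeed.initialize wraps the model in an engine that handles the distributed
# plumbing: data parallelism, ZeRO sharding, mixed precision, checkpointing.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

for inputs, targets in data:
    inputs = inputs.to(model_engine.device)
    targets = targets.to(model_engine.device)
    loss = F.mse_loss(model_engine(inputs), targets)
    model_engine.backward(loss)   # engine handles scaling and gradient accumulation
    model_engine.step()           # optimizer step plus gradient zeroing
```

In practice the script would be started with something like `deepspeed train.py`, which sets up the process group across GPUs and nodes.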
hivemind
Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
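For flavor, a minimal sketch of how a peer might join a collaborative run, loosely modeled on hivemind's documented quickstart; the run id, batch sizes, and model are made-up placeholders, and parameter names may differ between versions:

```python
import torch
import hivemind

model = torch.nn.Linear(512, 512)                    # stand-in for a real model
base_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Join (or start) a DHT that peers use to find each other; pass
# initial_peers=[...] to connect to an existing swarm instead of starting one.
dht = hivemind.DHT(start=True)

# Wrap the local optimizer so gradient averaging happens across volunteers.
opt = hivemind.Optimizer(
    dht=dht,
    run_id="demo_run",            # all peers in the same training run share this id
    optimizer=base_opt,
    batch_size_per_step=32,       # samples this peer contributes per local step
    target_batch_size=4096,       # global batch size after which peers average
)
```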
mixture-of-experts
PyTorch re-implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. (https://arxiv.org/abs/1701.06538)
mixture-of-experts
A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models (by lucidrains).
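Both repositories implement the same core idea from the Shazeer et al. paper. As a rough illustration of that idea (not either project's actual API), a minimal top-k gated MoE layer in PyTorch might look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sparsely-gated MoE layer: a learned router picks the top-k
    experts per token and mixes their outputs with renormalized gate weights."""
    def __init__(self, d_model, d_ff, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                            # x: (batch, seq, d_model)
        logits = self.router(x)                      # (batch, seq, num_experts)
        topk_val, topk_idx = logits.topk(self.k, dim=-1)
        gates = F.softmax(topk_val, dim=-1)          # renormalize over the chosen experts
        out = torch.zeros_like(x)
        # Dense loop over experts for readability; real implementations dispatch
        # each token only to its selected experts.
        for e, expert in enumerate(self.experts):
            mask = (topk_idx == e)                   # (batch, seq, k) bool
            if mask.any():
                weight = (gates * mask).sum(dim=-1, keepdim=True)  # (batch, seq, 1)
                out = out + weight * expert(x)
        return out

moe = TopKMoE(d_model=512, d_ff=2048, num_experts=8, k=2)
y = moe(torch.randn(4, 16, 512))                     # -> (4, 16, 512)
```

Production implementations add a load-balancing auxiliary loss and route tokens sparsely instead of running every expert on every token.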
st-moe-pytorch
Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch.
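One of ST-MoE's signature pieces is the router z-loss, which penalizes large router logits to keep the gating numerically stable. A hedged sketch of that auxiliary term (the shapes and the loss weight are assumptions for illustration):

```python
import torch

def router_z_loss(router_logits: torch.Tensor) -> torch.Tensor:
    """Router z-loss in the spirit of the ST-MoE paper: the squared
    log-sum-exp of the router logits, averaged over tokens."""
    # router_logits: (num_tokens, num_experts)
    z = torch.logsumexp(router_logits, dim=-1)       # (num_tokens,)
    return (z ** 2).mean()

aux = router_z_loss(torch.randn(128, 8))             # scalar, typically added
                                                      # to the total loss with a
                                                      # small coefficient such as 1e-3
```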
Project mention: Can we discuss MLOps, Deployment, Optimizations, and Speed? | /r/LocalLLaMA | 2023-12-06

DeepSpeed can handle parallelism concerns, and even offload data/model to RAM, or even NVMe (!?). I'm surprised I don't see this project used more.
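The RAM/NVMe offload the commenter mentions is configured through DeepSpeed's ZeRO settings. A sketch of what such a config might look like (the values are illustrative and the NVMe path is hypothetical):

```python
# Hedged sketch of a ZeRO-3 offload config: optimizer state is kept in CPU RAM,
# parameters are staged on an NVMe drive instead of living in GPU memory.
ds_offload_config = {
    "train_batch_size": 16,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},
    },
}
# Passed as config=ds_offload_config to deepspeed.initialize, as in the sketch above.
```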
It depends on what model you want to train, and how well you want your computer to keep working while you're doing it.
If you're interested in large language models, there's a table of VRAM requirements for fine-tuning at [1], which says you could do the most basic type of fine-tuning on a 7B-parameter model with 8 GB of VRAM.
You'll find that training takes quite a long time, and since most of the GPU's capacity goes to training, your computer's responsiveness will suffer; even basic things like scrolling in your web browser or changing tabs use the GPU, after all.
Spend a bit more and you'll probably have a better time.
[1] https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#...
Waiting for Mixed Quantization with HQQ and MoE Offloading [1]. With that I was able to run Mixtral 8x7B on my 10 GB RTX 3080... This should work for DBRX and should shave off a ton of the VRAM requirement.
1. https://github.com/dvmazur/mixtral-offloading?tab=readme-ov-...
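The trick behind that kind of offloading is to keep the (quantized) expert weights in system RAM and copy only the experts the router actually selects into VRAM for each step. A conceptual PyTorch sketch of the pattern, not mixtral-offloading's actual API:

```python
import torch
import torch.nn as nn

class OffloadedExperts(nn.Module):
    """Conceptual MoE offloading: expert weights live in CPU RAM and only the
    expert selected for the current batch is copied to the GPU, then moved back."""
    def __init__(self, num_experts=8, d_model=512, d_ff=2048):
        super().__init__()
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])  # stays on the CPU by default

    def forward(self, x: torch.Tensor, expert_idx: int) -> torch.Tensor:
        expert = self.experts[expert_idx].to(self.device)   # copy weights into VRAM
        y = expert(x.to(self.device))
        expert.to("cpu")                                     # free VRAM for the next expert
        return y

layer = OffloadedExperts()
out = layer(torch.randn(4, 512), expert_idx=3)
```

The project itself goes further, quantizing the experts and caching recently used ones on the GPU to hide the transfer cost.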
https://github.com/learning-at-home/hivemind is also relevant
https://github.com/Leeroo-AI/mergoo
Python mixture-of-experts related posts
- Would anyone be interested in contributing to some group projects?
- [Rumor] Potential GPT-4 architecture description
- Hive mind: Train deep learning models on thousands of volunteers across the world
- Could a model not be trained by a decentralized network? Like SETI@home, or kinda-sorta like Bitcoin. Petals accomplishes this somewhat, but if raw compute power is the only barrier to open source, I'd be happy to try organizing decentralized computing efforts
- Orca (built on llama13b) looks like the new sheriff in town
- Do you think that AI research will slow down to a halt because of regulation?
- [D] Google "We Have No Moat, And Neither Does OpenAI": Leaked Internal Google Document Claims Open Source AI Will Outcompete Google and OpenAI
Index
What are some of the best open-source mixture-of-experts projects in Python? This list will help you:
| # | Project | Stars |
|---|---------|-------|
| 1 | DeepSpeed | 32,550 |
| 2 | LLaMA-Factory | 17,050 |
| 3 | mixtral-offloading | 2,230 |
| 4 | hivemind | 1,833 |
| 5 | mixture-of-experts | 818 |
| 6 | tutel | 653 |
| 7 | mixture-of-experts | 517 |
| 8 | st-moe-pytorch | 218 |
| 9 | mergoo | 199 |