mixture-of-experts vs ModuleFormer

| | mixture-of-experts | ModuleFormer |
|---|---|---|
| Mentions | 2 | 1 |
| Stars | 835 | 216 |
| Growth | - | 4.6% |
| Activity | 5.3 | 5.7 |
| Last commit | 16 days ago | 24 days ago |
| Language | Python | Python |
| License | GNU General Public License v3.0 only | Apache License 2.0 |
- Stars - the number of stars that a project has on GitHub.
- Growth - month-over-month growth in stars.
- Activity - a relative number indicating how actively a project is being developed; recent commits have higher weight than older ones. For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.
Posts mentioning mixture-of-experts
- [Rumor] Potential GPT-4 architecture description
- Local and Global loss
I have a requirement for a training pipeline similar to Mixture of Experts (https://github.com/davidmrau/mixture-of-experts/blob/master/moe.py), but I want to train the experts on a local loss for one epoch before predicting outputs from them (those outputs would then be concatenated for the global loss of the MoE). Can anyone suggest the best way to set up this training pipeline?
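One way to structure such a two-phase pipeline is sketched below in plain PyTorch. This is a minimal sketch under assumptions, not the API of the linked repo: the expert MLPs, the dense softmax gate, and all dimensions are hypothetical stand-ins. Phase 1 trains each expert independently on its own local loss for one epoch; phase 2 trains the gated combination on the global loss.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup (regression): each expert is a small MLP.
input_dim, output_dim, num_experts = 16, 4, 3
experts = nn.ModuleList(
    nn.Sequential(nn.Linear(input_dim, 32), nn.ReLU(), nn.Linear(32, output_dim))
    for _ in range(num_experts)
)
gate = nn.Linear(input_dim, num_experts)  # simple dense softmax gating
loss_fn = nn.MSELoss()

def local_phase(loader):
    """Phase 1: train every expert independently on its local loss for one epoch."""
    for expert in experts:
        opt = torch.optim.Adam(expert.parameters(), lr=1e-3)
        for x, y in loader:
            opt.zero_grad()
            loss_fn(expert(x), y).backward()  # local loss touches this expert only
            opt.step()

def global_phase(loader, epochs=10):
    """Phase 2: combine expert outputs through the gate, train on the global MoE loss."""
    params = list(gate.parameters()) + [p for e in experts for p in e.parameters()]
    opt = torch.optim.Adam(params, lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:
            weights = torch.softmax(gate(x), dim=-1)            # (batch, num_experts)
            outs = torch.stack([e(x) for e in experts], dim=1)  # (batch, num_experts, output_dim)
            y_hat = (weights.unsqueeze(-1) * outs).sum(dim=1)   # gated combination
            opt.zero_grad()
            loss_fn(y_hat, y).backward()  # global loss over the combined prediction
            opt.step()

# Example usage with random data standing in for a real dataset.
xs, ys = torch.randn(256, input_dim), torch.randn(256, output_dim)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(xs, ys), batch_size=32, shuffle=True
)
local_phase(loader)
global_phase(loader)
```

Note the question says the expert outputs are concatenated; the sketch uses a gate-weighted sum, which is the more common MoE combination. If true concatenation is wanted, replace the weighted sum with torch.cat over the expert outputs and add a small head mapping the concatenated vector to the target.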
Posts mentioning ModuleFormer
What are some alternatives?
pytorch-tutorial - PyTorch Tutorial for Deep Learning Researchers
LMOps - General technology for enabling AI capabilities w/ LLMs and MLLMs
transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
StableLM - StableLM: Stability AI Language Models
mmdetection - OpenMMLab Detection Toolbox and Benchmark
lingvo - Lingvo
hivemind - Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
lm-scorer - 📃Language Model based sentences scoring library
yolov5 - YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
tutel - Tutel MoE: An Optimized Mixture-of-Experts Implementation
Real-Time-Voice-Cloning - Clone a voice in 5 seconds to generate arbitrary speech in real-time