Megatron-LM
DeepLearningExamples
| Megatron-LM | DeepLearningExamples |
---|---|---|
Mentions | 15 | 7 |
Stars | 7,029 | 11,773 |
Growth | 5.6% | 1.4% |
Activity | 0.0 | 0.0 |
Last commit | 6 days ago | 27 days ago |
Language | Python | Jupyter Notebook |
License | GNU General Public License v3.0 or later | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
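As a rough illustration of the Growth figure, the sketch below computes a month-over-month percentage change in stars. The function name `month_over_month_growth` and the exact formula are assumptions made for illustration; the page does not state how the figure is actually calculated.

```python
def month_over_month_growth(stars_now: int, stars_last_month: int) -> float:
    """Return the percentage change in stars relative to last month.

    Assumed formula (not confirmed by the page):
        (stars_now - stars_last_month) / stars_last_month * 100
    """
    if stars_last_month <= 0:
        raise ValueError("stars_last_month must be positive")
    # Multiply before dividing to keep simple cases exact in floating point.
    return (stars_now - stars_last_month) * 100.0 / stars_last_month


if __name__ == "__main__":
    # Hypothetical star counts, chosen only to show the call.
    print(f"{month_over_month_growth(1100, 1000):.1f}%")
```

Under this assumed definition, a negative result would indicate a month in which the star count dropped.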
Megatron-LM
- Why async gradient update doesn't get popular in LLM community?
- Why Did Google Brain Exist?
  GPU cluster scaling has come a long way. Just check out the scaling plot here: https://github.com/NVIDIA/Megatron-LM
- I asked ChatGPT to rate the intelligence level of current AI systems out there.
  Google's PaLM, Facebook's LLaMA, and Nvidia's Megatron are the big ones; I am surely missing some, and Apple probably has something cooking as well. None of them are publicly available, but the research papers are reputable. All of the ones mentioned should beat GPT-3, although GPT-3.5 (ChatGPT) should be a bit better, and the ability to search (Bing) should level the playing field even further; Google's PaLM with search functionality should be clearly ahead. This is why people are excited about GPT-4: GPT-3 was way ahead of everyone else when it came out, but others have since caught up. We'll see if GPT-4 will be another big jump among LLMs.
- Nvidia Fiscal Q3 2022 Financial Result
  Described a collaboration involving NVIDIA Megatron-LM and Microsoft DeepSpeed to create an efficient, scalable, 3D parallel system capable of combining data, pipeline and tensor-slicing-based parallelism.
- Microsoft and NVIDIA AI Introduces MT-NLG: The Largest and Most Powerful Monolithic Transformer Language NLP Model
  Microsoft and NVIDIA present the Megatron-Turing Natural Language Generation model (MT-NLG), powered by DeepSpeed and Megatron, the largest and most robust monolithic transformer language model, trained with 530 billion parameters.
- [R] Data Movement Is All You Need: A Case Study on Optimizing Transformers
  Nvidia's own implementation of Transformers, i.e., Megatron, on NVIDIA's Selene supercomputer (where GPT-3 is possible too): https://github.com/NVIDIA/Megatron-LM
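One of the excerpts above mentions a 3D parallel system that combines data, pipeline and tensor-slicing parallelism. A minimal sketch of the underlying idea, assuming a simple row-major layout, is to factor each worker's global rank into three coordinates. The names `ParallelCoords` and `rank_to_coords`, and the choice of tensor parallelism as the fastest-varying dimension, are illustrative assumptions and not the actual rank layout used by Megatron-LM or DeepSpeed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelCoords:
    """Position of one worker in a hypothetical 3D parallel grid."""
    data: int      # which model replica the worker belongs to
    pipeline: int  # which pipeline stage (group of layers) it holds
    tensor: int    # which slice of each layer's tensors it holds


def rank_to_coords(rank: int, tensor_size: int, pipeline_size: int,
                   data_size: int) -> ParallelCoords:
    """Map a global rank to (data, pipeline, tensor) coordinates.

    Assumed ordering: tensor varies fastest, then pipeline, then data.
    """
    world_size = tensor_size * pipeline_size * data_size
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside world size {world_size}")
    tensor = rank % tensor_size
    pipeline = (rank // tensor_size) % pipeline_size
    data = rank // (tensor_size * pipeline_size)
    return ParallelCoords(data=data, pipeline=pipeline, tensor=tensor)


if __name__ == "__main__":
    # Hypothetical 8-worker grid: 2-way tensor, 2-way pipeline, 2-way data.
    for r in range(8):
        print(r, rank_to_coords(r, tensor_size=2, pipeline_size=2, data_size=2))
```

The total number of workers is the product of the three degrees, which is why the three forms of parallelism can be scaled independently of one another.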
DeepLearningExamples
- [R] Data Movement Is All You Need: A Case Study on Optimizing Transformers
  Nvidia's implementation of BERT has a long way to go (I don't know about the implementation of input-independent gradient computations in their backprop). But there are scaled benchmarks on DGX A100s: https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT
What are some alternatives?
DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
ColossalAI - Making large AI models cheaper, faster and more accessible
TensorRT - NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.
server - The Triton Inference Server provides an optimized cloud and edge inferencing solution.
lidar-harmonization - Code release for Intensity Harmonization for Airborne LiDAR
llm-search - Querying local documents, powered by LLM
pix2seq - Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
alpaca_eval - An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
ontogpt - LLM-based ontological extraction tools, including SPIRES
ChatGPT-Siri - Shortcuts for Siri using the ChatGPT API (gpt-3.5-turbo & gpt-4 models); supports continuous conversations, configuring the API key and system prompt, and saving chat records.
AutoCog - Automaton & Cognition
notebooks - Notebooks illustrating the use of Norse, a library for deep-learning with spiking neural networks.