|2 days ago||5 days ago|
|Apache License 2.0||GNU General Public License v3.0 or later|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Using --deepspeed requires lots of manual tweaking
3 projects | reddit.com/r/Oobabooga | 11 May 2023
Filed a discussion item on the deepspeed project: https://github.com/microsoft/DeepSpeed/discussions/3531
3 projects | reddit.com/r/Oobabooga | 11 May 2023
Solution: I don't know; this is where I am stuck. https://github.com/microsoft/DeepSpeed/issues/1037 suggests that I just need to 'apt install libaio-dev', but I've done that and it doesn't help.
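When an apt install appears to have no effect, one quick sanity check is whether the libaio shared library can be found and loaded from the Python environment at all. The following is a minimal diagnostic sketch using only the standard library. The helper name `libaio_visible` is hypothetical, and this is not the check DeepSpeed itself performs when building its async I/O op; it says nothing about whether the development headers are found at compile time.

```python
import ctypes
import ctypes.util


def libaio_visible() -> bool:
    """Return True if a shared library named 'aio' can be located and loaded.

    Hypothetical helper for illustration only; it checks the runtime
    library, not the headers shipped by libaio-dev.
    """
    # find_library returns a library name/path string, or None if not found.
    name = ctypes.util.find_library("aio")
    if name is None:
        return False
    try:
        # Attempt to load the library to confirm it is usable.
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("libaio loadable:", libaio_visible())
```

If this prints `False` inside the environment that runs the web UI (for example a conda environment or a container), the library may have been installed somewhere that environment cannot see.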
Whether the ML computation engineering expertise will be valuable, is the question.
2 projects | reddit.com/r/LanguageTechnology | 21 Apr 2023
There could be some spectrum of this expertise. For instance, https://github.com/NVIDIA/FasterTransformer, https://github.com/microsoft/DeepSpeed
FLiPN-FLaNK Stack Weekly for 17 April 2023
12 projects | dev.to | 17 Apr 2023
DeepSpeed Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-Like Models
2 projects | news.ycombinator.com | 12 Apr 2023
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-Like Models
2 projects | news.ycombinator.com | 12 Apr 2023
12-Apr-2023 AI Summary
2 projects | reddit.com/r/u_sann540 | 11 Apr 2023
DeepSpeed Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales (https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-chat)
2 projects | news.ycombinator.com | 11 Apr 2023
Apple: Transformer architecture optimized for Apple Silicon
2 projects | reddit.com/r/apple | 23 Mar 2023
I'm following this closely, together with other efforts like GPTQ Quantization and Microsoft's DeepSpeed, all of which are bringing down the hardware requirements of these advanced AI models.
Facebook LLAMA is being openly distributed via torrents
15 projects | news.ycombinator.com | 3 Mar 2023
Anything that could bring this to a 10GB 3080 or 24GB 3090 without 60s/it per token?
Why Did Google Brain Exist?
2 projects | news.ycombinator.com | 26 Apr 2023
GPU cluster scaling has come a long way. Just check out the scaling plot here: https://github.com/NVIDIA/Megatron-LM
I asked ChatGPT to rate the intelligence level of current AI systems out there.
2 projects | reddit.com/r/ChatGPT | 11 Mar 2023
Google's PaLM, Facebook's LLaMA, Nvidia's Megatron; I am surely missing some, and Apple surely has something cooking as well, but these are the big ones. Of course none of them are publicly available, but the research papers are reputable. All of the ones mentioned should beat GPT-3, although GPT-3.5 (ChatGPT) should be a bit better, and the ability to search (Bing) should level the playing field even further, but Google's PaLM with search functionality should be clearly ahead. This is why people are excited about GPT-4: GPT-3 was way ahead of anyone else when it came out, but others have been able to catch up since. We'll see if GPT-4 will be another big jump among LLMs.
Nvidia Fiscal Q3 2022 Financial Result
4 projects | reddit.com/r/nvidia | 17 Nov 2021
Described a collaboration involving NVIDIA Megatron-LM and Microsoft DeepSpeed to create an efficient, scalable, 3D parallel system capable of combining data, pipeline and tensor-slicing-based parallelism.
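The idea of combining data, pipeline and tensor-slicing parallelism can be pictured as arranging all GPUs in a 3D grid, where each global rank gets one coordinate per axis. The sketch below illustrates that decomposition under stated assumptions: the axis ordering (tensor fastest, then pipeline, then data) and the names `Coord`, `rank_to_coord` and `tensor_group` are hypothetical choices for illustration, not the layout or API used by Megatron-LM or DeepSpeed.

```python
from typing import List, NamedTuple


class Coord(NamedTuple):
    data: int      # index of the data-parallel replica
    pipeline: int  # index of the pipeline stage
    tensor: int    # index of the tensor-slicing shard


def rank_to_coord(rank: int, dp: int, pp: int, tp: int) -> Coord:
    """Map a global rank to 3D grid coordinates.

    Assumed ordering (illustrative only): tensor index varies fastest,
    then pipeline stage, then data-parallel replica.
    """
    world_size = dp * pp * tp
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} outside world size {world_size}")
    tensor = rank % tp
    pipeline = (rank // tp) % pp
    data = rank // (tp * pp)
    return Coord(data, pipeline, tensor)


def tensor_group(rank: int, dp: int, pp: int, tp: int) -> List[int]:
    """Ranks that share this rank's replica and stage, differing only in shard."""
    me = rank_to_coord(rank, dp, pp, tp)
    return [
        r
        for r in range(dp * pp * tp)
        if rank_to_coord(r, dp, pp, tp)[:2] == me[:2]
    ]


if __name__ == "__main__":
    # 8 GPUs split as 2 data replicas x 2 pipeline stages x 2 tensor shards.
    for r in range(8):
        print(r, rank_to_coord(r, dp=2, pp=2, tp=2))
```

With this ordering, ranks in the same tensor group are adjacent, which is a common motivation for placing the most communication-heavy axis on GPUs within one node; real systems may order the axes differently.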
Microsoft and NVIDIA AI Introduces MT-NLG: The Largest and Most Powerful Monolithic Transformer Language NLP Model
2 projects | reddit.com/r/LanguageTechnology | 13 Oct 2021
Microsoft and NVIDIA present the Megatron-Turing Natural Language Generation model (MT-NLG), powered by DeepSpeed and Megatron, the largest and most powerful monolithic transformer language model, trained with 530 billion parameters.
[R] Data Movement Is All You Need: A Case Study on Optimizing Transformers
2 projects | reddit.com/r/MachineLearning | 19 Jan 2021
Nvidia's own implementation of Transformers, i.e., Megatron on NVIDIA's Selene supercomputer (where GPT-3 is possible too): https://github.com/NVIDIA/Megatron-LM
What are some alternatives?
ColossalAI - Making large AI models cheaper, faster and more accessible
fairscale - PyTorch extensions for high performance and large scale training.
TensorRT - NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.
fairseq - Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
mesh-transformer-jax - Model parallel transformers in JAX and Haiku
llama - Inference code for LLaMA models
gpt-neox - An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
server - The Triton Inference Server provides an optimized cloud and edge inferencing solution.
text-generation-webui - A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA.
Finetune_LLMs - Repo for fine-tuning GPTJ and other GPT models
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration