[P] Yandex open-sources 100B large language model weights (YaLM)

YaLM-100B

35 mentions, 3,721 stars, activity 0.0, Python

Pretrained language model with 100B parameters

GitHub: https://github.com/yandex/YaLM-100B

NeMo

29 mentions, 10,021 stars, activity 9.8, Python

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

It seems that teams able to train large models are converging on a Megatron/DeepSpeed combination for the near future. The HF BigScience project has its own version, Yandex builds on the DeepSpeed/Megatron example code, and NeMo has started to build model extensions on top of Megatron. This is where my understanding breaks down, but I do question how much incremental progress can be made while relying on Megatron, especially in other modalities or with custom attention mechanisms. Maybe it doesn't matter at these scales.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
