It appears the indexing for the model parts is deliberately not contiguous; the 03-82 range represents the main 80 transformer layers. https://github.com/yandex/YaLM-100B/blob/main/megatron_lm/me...
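A quick sanity check on that range (the zero-padded two-digit shard names here are an assumption based on the 03-82 description):

```python
# Shard indices 03..82 inclusive cover the 80 transformer layers;
# the non-contiguous numbering leaves room for other parts
# (e.g. embeddings) at the ends of the range.
layer_shards = [f"{i:02d}" for i in range(3, 83)]  # hypothetical naming scheme

print(len(layer_shards))                  # 80
print(layer_shards[0], layer_shards[-1])  # 03 82
```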
-
That's pretty much what SLIDE [0] does. The original motivation was achieving performance parity with GPUs for CPU training, but presumably the same approach could apply to running inference on models too large to fit in consumer GPU memory.
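For intuition, here is a heavily simplified sketch of the SLIDE idea (not the actual implementation): use random-hyperplane LSH to select a small subset of neurons whose weight vectors are likely to have a high dot product with the input, and compute only those. All names and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_neurons, n_planes = 64, 1024, 6

W = rng.standard_normal((n_neurons, d))      # layer weight vectors
planes = rng.standard_normal((n_planes, d))  # random LSH hyperplanes

def lsh_code(v):
    # Sign pattern of v against each hyperplane -> integer bucket id.
    bits = (planes @ v > 0).astype(int)
    return int("".join(map(str, bits)), 2)

# Index every neuron into a bucket by the hash of its weight vector.
buckets = {}
for i, w in enumerate(W):
    buckets.setdefault(lsh_code(w), []).append(i)

x = rng.standard_normal(d)
active = buckets.get(lsh_code(x), [])  # neurons hashed to x's bucket
out = np.zeros(n_neurons)
out[active] = W[active] @ x            # compute only the active neurons
```

The point is that the cost per input scales with the bucket size rather than the full layer width, which is what lets a CPU keep up despite lacking the GPU's raw matmul throughput.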
-
-
It doesn't seem the code is there; only the pretrained models are. https://github.com/kingoflolz/mesh-transformer-jax/#gpt-j-6b
https://huggingface.co/EleutherAI/gpt-j-6B
Isn't that so?
-
gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
-
I downloaded the weights and made a .torrent file (also a magnet link, see the raw README.md). Can somebody else who downloaded the files double-check the checksums?
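If anyone wants to verify, something like the following works; it streams each file so checkpoint shards of this size don't need to fit in RAM. The directory name is a placeholder, and SHA-256 is just a common choice, not necessarily the hash used for the published list.

```python
import hashlib
from pathlib import Path

def sha256sum(path, chunk=1 << 20):
    # Hash the file in 1 MiB chunks instead of reading it whole.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Hash every downloaded shard; compare the output against the checksum list.
for p in sorted(Path("yalm100b_checkpoint").glob("*")):  # hypothetical dir
    print(p.name, sha256sum(p))
```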
-