I thought this was a good overview of the idea that Triton can circumvent the CUDA moat: https://www.semianalysis.com/p/nvidiaopenaitritonpytorch
It also looks like they added an MLIR backend to Triton, though I wonder whether Mojo has an advantage here, since it was built on MLIR from the start? https://github.com/openai/triton/pull/1004
When you click on the Stripe link to preorder the tinybox, it is advertised as a box that runs LLaMA 65B in FP16 for $15,000.
I can run LLaMA 65B quantized to 4 bits with GPTQ on my $2,300 PC (used parts, dual RTX 3090), and according to the GPTQ paper(§) model quality suffers very little from the quantization.
(§) https://arxiv.org/abs/2210.17323
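
The back-of-envelope arithmetic behind that comparison (weights only; the KV cache, activations, and GPTQ's per-group scale factors add some overhead on top):

```python
# Rough VRAM estimate for LLaMA 65B weights at different precisions.
# Weights only -- ignores KV cache, activations, and quantization metadata.
PARAMS = 65e9  # 65 billion parameters

def weight_gb(bits_per_param: float) -> float:
    """Gigabytes needed to store the weights at the given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

fp16 = weight_gb(16)  # ~130 GB -> far beyond any consumer GPU pair
int4 = weight_gb(4)   # ~32.5 GB -> fits in 2x RTX 3090 (2 x 24 = 48 GB)
print(f"FP16: {fp16:.1f} GB, GPTQ 4-bit: {int4:.1f} GB")
```

So the 4-bit model fits comfortably in 48 GB of combined VRAM, while the FP16 model needs hardware in the tinybox class.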