Our great sponsors
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
In transformers, they tried really hard to have a single function or method to deal with both self and cross attention mechanisms, masking, positional and relative encodings, interpolation etc. While it allows a user to use the same function/method for any model, it has led to severe parameter bloat. Just compare the original implementation of llama by FAIR with the implementation by HF to get an idea.
In transformers, they tried really hard to have a single function or method to deal with both self and cross attention mechanisms, masking, positional and relative encodings, interpolation etc. While it allows a user to use the same function/method for any model, it has led to severe parameter bloat. Just compare the original implementation of llama by FAIR with the implementation by HF to get an idea.
what do you think about tinygrad? I think its a good example of growing and well written, (partially) well documented library with many close to reference implementations