There's no such thing as "base models have only the temperature setting". Models do not have settings (temperature, repetition penalty, etc.); the sampling code does, and you can obviously use that code with any model.
For example, here's a function from llama.cpp that applies repetition penalty: https://github.com/ggerganov/llama.cpp/blob/master/llama.cpp...
Here's the one from transformers:
https://github.com/huggingface/transformers/blob/0a55d9f7376...
To summarize how they work: you keep some number of previously generated tokens, and once you have the logits you want to sample a new token from, you look up the logits of those previously generated tokens and scale them by a penalty (positive logits are divided by it, negative logits are multiplied by it), thus lowering the probability of the corresponding tokens.
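As a rough illustration of that summary, here's a minimal pure-Python sketch. It is not the llama.cpp or transformers code; the names `apply_repetition_penalty` and `softmax` and the default penalty of 1.3 are made up for this example. It assumes the logits are a plain list indexed by token id and that the penalty is greater than 1.

```python
import math


def apply_repetition_penalty(logits, previous_tokens, penalty=1.3):
    """Return a copy of `logits` with already-seen tokens made less likely.

    Hypothetical helper for illustration only. `logits` is a list of floats
    indexed by token id, `previous_tokens` is the window of recently
    generated token ids, and `penalty` is assumed to be > 1.
    """
    penalized = list(logits)
    # Use a set so a token that appeared several times is penalized once.
    for token_id in set(previous_tokens):
        value = penalized[token_id]
        if value > 0:
            # Positive logit: dividing moves it toward zero.
            penalized[token_id] = value / penalty
        else:
            # Negative (or zero) logit: multiplying pushes it further down.
            penalized[token_id] = value * penalty
    return penalized


def softmax(logits):
    """Convert logits to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, -1.0, 0.5, 3.0]
previous_tokens = [0, 1, 0]  # tokens 0 and 1 were generated recently

penalized = apply_repetition_penalty(logits, previous_tokens, penalty=2.0)
before = softmax(logits)
after = softmax(penalized)
```

Note that nothing here touches the model: the function only sees a list of numbers, which is why it works the same way for a base model or a fine-tuned one.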
I'm pretty fatigued from constantly providing references and sources in this thread, but here's an example of what they've made publicly available:
https://github.com/snap-research/EfficientFormer