Suggest an alternative to lm-evaluation-harness

A framework for few-shot evaluation of language models.

Why do you think that https://github.com/stanford-crfm/helm is a good alternative to lm-evaluation-harness?