I haven't benchmarked it yet; I always use -ins or -r, so I don't see the tokens/s. I plan to check that tonight and can share the results. I recently started running the version with dfyz's AVX-512 performance improvements: https://github.com/ggerganov/llama.cpp/pull/933. It's very slow, sure, but I like the output quality of 65B so much that I don't want to settle for 33B, even though it's twice as fast and not that much dumber.
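For reference, llama.cpp prints a timing summary (including eval tokens per second) when a plain one-shot generation finishes, which interactive mode (-ins/-r) hides from you until exit. A minimal sketch; the model path, thread count, and prompt are placeholders for your own setup:

```shell
# Non-interactive run: after generation, llama.cpp prints timings
# with prompt eval and eval speed in tokens per second.
# -m model path, -t threads, -n tokens to generate are illustrative values.
./main -m ./models/65B/ggml-model-q4_0.bin \
       -t 16 \
       -n 128 \
       -p "Building a website can be done in 10 simple steps:"
```

Using a fixed prompt and a fixed -n also makes runs comparable across builds, e.g. before and after the AVX-512 patch.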
After that, if I'm not dead yet, I want to go after it with BIG-bench. I almost made a feature request for Oobabooga, but I don't think many other people share this dream.
BTW, if you're looking to benchmark against other setups, I'd recommend using lm-eval instead. You have a lot of benchmarks to pick from and can easily compare against what other people have run. Fabrice Bellard (yeah, that one) has benchmarked a huge number of open LLMs, including at different sizes and quantizations: https://bellard.org/ts_server/
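For anyone who hasn't used lm-eval (EleutherAI's lm-evaluation-harness) before, a run looks roughly like this; the model args and task list here are just examples, not a recommendation:

```shell
# Rough sketch of an lm-evaluation-harness run.
# pretrained=... and the task names are placeholder examples.
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
python main.py \
  --model hf-causal \
  --model_args pretrained=facebook/opt-1.3b \
  --tasks hellaswag,arc_easy \
  --batch_size 8
```

The harness prints per-task accuracy tables at the end, which is what makes cross-setup comparisons easy.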