LLMZoo
⚡LLM Zoo is a project that provides data, models, and an evaluation benchmark for large language models.⚡
Does anyone know of any up-to-date benchmarks for LLMs? I only know of one, and it hasn't been updated: https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYpb63e1ZR3aePczz3zlbJW-Y4/edit#gid=741531996. I think this spreadsheet may have been made with this tool, https://github.com/EleutherAI/lm-evaluation-harness, and the language-task datasets available there. It would be nice to have benchmarks for recently released LLMs, but the spreadsheet is view-only and does not allow community edits. Would such benchmarks be helpful to you? What is your favorite open-source LLM so far, and for which task?
Missing Vicuna, Dolly, BELLE, Phoenix, MOSS, and the ones used by Open Assistant.