LLM Colosseum: Make LLMs fight in SFIII

llm-colosseum

4 914 9.4 Jupyter Notebook

Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM

Hello guys,
Tired of boring LLM benchmarks? I'm sharing a fun project built during the Mistral AI SF hackathon.
Using an RL framework, we made LLMs fight each other in real time in Street Fighter III. You can find the repo here: https://github.com/OpenGenerativeAI/llm-colosseum.
Aside from the fact that it's very funny to see Mistral and others performing Hadouken, we found that it is a great way to benchmark language models: they need to quickly understand their environment and act accordingly.
After more than 400 fights, you can check out the ELO ranking on the HF space here: https://huggingface.co/spaces/junior-labs/llm-colosseum
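
For readers unfamiliar with how a ranking like this can be derived from fight results, here is a minimal sketch of the standard Elo update. It is not taken from the llm-colosseum repo: the K-factor of 32, the starting rating of 1500, and the names `expected_score`, `update_elo`, `rank_models`, `model_a`, `model_b`, and `model_c` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> Tuple[float, float]:
    """Return the new ratings after one fight.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a draw.
    k = 32 is an assumed K-factor, not a value from the project.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


def rank_models(
    fights: List[Tuple[str, str]], start: float = 1500.0
) -> List[Tuple[str, float]]:
    """Process (winner, loser) pairs in order and return a sorted leaderboard."""
    ratings: Dict[str, float] = {}
    for winner, loser in fights:
        r_w = ratings.get(winner, start)
        r_l = ratings.get(loser, start)
        ratings[winner], ratings[loser] = update_elo(r_w, r_l, 1.0)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical fight log: (winner, loser)
    log = [
        ("model_a", "model_b"),
        ("model_a", "model_c"),
        ("model_b", "model_c"),
    ]
    for name, rating in rank_models(log):
        print(f"{name}: {rating:.1f}")
```

Because each update depends on the ratings at the time of the fight, the order of the fight log matters; a real leaderboard might also use a different K-factor or starting rating.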

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com. Post date: 27 Mar 2024
