Evalplus Alternatives

Similar projects and alternatives to evalplus

ggml

69 mentions · 9,642 stars · activity 9.8 · C

Tensor library for machine learning

WizardLM

38 mentions · 7,531 stars · activity 9.4 · Python

Discontinued. A family of instruction-following LLMs powered by Evol-Instruct: WizardLM, WizardCoder and WizardMath.

llm-humaneval-benchmarks

10 mentions · 83 stars · activity 4.9 · Jupyter Notebook

zero-shot-replication

4 mentions · 69 stars · activity 8.9 · Python

human-eval

6 mentions · 1,981 stars · activity 0.0 · Python

Code for the paper "Evaluating Large Language Models Trained on Code"

gpt_academic

2 mentions · 55,872 stars · activity 9.8 · Python

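Provides a practical interactive interface for GPT, GLM, and other large language models, with special optimizations for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins, analysis and self-explanation of Python, C++, and other projects, PDF/LaTeX paper translation and summarization, parallel queries to multiple LLMs, and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, moss, and others.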

llm_oracle

2 mentions · 13 stars · activity 6.4 · Python

LLM Oracle is a GPT-4 powered tool for predicting future events. It's like a Magic 8 Ball that is able to perform basic research, calculations, and reasoning.

Baichuan-13B

2 mentions · 2,956 stars · activity 7.3 · Python

A 13B large language model developed by Baichuan Intelligent Technology

chatgpt_academic

1 mention · 31,116 stars · activity 10.0 · Python

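Discontinued. A graphical interactive interface for GPT/GLM, specially optimized for reading and polishing papers. Modular design with support for custom shortcut buttons and function plugins, code block and table rendering, dual display of TeX formulas, newly added Python and C++ project analysis and self-explanation, PDF/LaTeX paper translation and summarization, parallel queries to multiple LLMs, and local models such as Tsinghua's chatglm. [Moved to: https://github.com/binary-husky/gpt_academic]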

NOTE: The mention count for each project is the number of mentions in common posts plus user-suggested alternatives. A higher number therefore indicates a better evalplus alternative or a closer similarity.


evalplus reviews and mentions

Posts with mentions or reviews of evalplus. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-08-25.

The AI Reproducibility Crisis in GPT-3.5/GPT-4 Research
4 projects | news.ycombinator.com | 25 Aug 2023

*Further Reading*:
- [GPT-4's decline over time (HackerNews)](https://news.ycombinator.com/item?id=36786407)
- [GPT-4 downgrade discussions (OpenAI Forums)](https://community.openai.com/t/gpt-4-has-been-severely-downg...)
- [Behavioral changes in ChatGPT (arXiv)](https://arxiv.org/abs/2307.09009)
- [Zero-Shot Replication Effort (GitHub)](https://github.com/emrgnt-cmplxty/zero-shot-replication)
- [Inconsistencies in GPT-4 HumanEval (GitHub)](https://github.com/evalplus/evalplus/issues/15)
- [Early experiments with GPT-4 (arXiv)](https://arxiv.org/abs/2303.12712)
- [GPT-4 Technical Report (arXiv)](https://arxiv.org/abs/2303.08774)

Official WizardCoder-15B-V1.0 Released! Can Achieve 59.8% Pass@1 on HumanEval!
5 projects | /r/LocalLLaMA | 15 Jun 2023

❗Note: In this study, we copy the scores for HumanEval and HumanEval+ from the LLM-Humaneval-Benchmarks. Notably, all the mentioned models generate code solutions for each problem utilizing a single attempt, and the resulting pass rate percentage is reported. Our WizardCoder generates answers using greedy decoding and tests with the same code.
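The note describes a single-attempt protocol: one greedy-decoded solution per problem, with the pass rate reported as a percentage. In that setting pass@1 is simply the fraction of problems whose one solution passes all tests. The unbiased pass@k estimator described in "Evaluating Large Language Models Trained on Code", 1 - C(n-c, k) / C(n, k), reduces to the same thing when n = k = 1. The following is a minimal Python sketch, assuming per-problem pass/fail results are already available. The function names are hypothetical and are not the evalplus or human-eval APIs.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of samples generated, c: samples that passed the tests,
    k: sampling budget being evaluated.
    """
    # If fewer than k samples failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def greedy_pass_at_1(results: list[bool]) -> float:
    """pass@1 with one greedy sample per problem: the fraction of problems solved.

    Equivalent to averaging pass_at_k(1, int(passed), 1) over all problems.
    """
    return sum(results) / len(results)
```

As an illustration, assuming the 164-problem HumanEval set, a model whose greedy solution passes on 98 problems scores `round(100 * greedy_pass_at_1([True] * 98 + [False] * 66), 1)`, which is 59.8. With several sampled solutions per problem, one would instead average `pass_at_k(n, c, 1)` over the problems.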