Human-eval Alternatives

Similar projects and alternatives to human-eval

llama

Mentions: 184 | Stars: 53,371 | Activity: 8.1 | Language: Python

Inference code for Llama models
ggml

Mentions: 69 | Stars: 9,802 | Activity: 9.8 | Language: C

Tensor library for machine learning
chat-with-gpt

Mentions: 39 | Stars: 2,267 | Activity: 5.3 | Language: TypeScript

An open-source ChatGPT app with a voice
WizardLM

Mentions: 38 | Stars: 7,531 | Activity: 9.4 | Language: Python

Discontinued. Family of instruction-following LLMs powered by Evol-Instruct: WizardLM, WizardCoder and WizardMath
llm-humaneval-benchmarks

Mentions: 10 | Stars: 83 | Activity: 4.9 | Language: Jupyter Notebook
evalplus

Mentions: 3 | Stars: 902 | Activity: 9.3 | Language: Python

EvalPlus for rigorous evaluation of LLM-synthesized code

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a better human-eval alternative or higher similarity.

human-eval reviews and mentions

Posts with mentions or reviews of human-eval. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-22.

Nathan Lambert review of LLAMA 2: Open-Source LLM from Meta
2 projects | /r/CompSocial | 22 Jul 2023

Code / math / reasoning: Not much discussion of code data in the paper and RLHF process. For instance, StarCoder at 15 billion parameters beats the best model at 40.8 for HumanEval and 49.5 MBPP (Python).
New ChatGPT rival, Claude 2, launches for open beta testing
1 project | /r/thenottheonion | 12 Jul 2023

In terms of coding capabilities, Claude 2 demonstrated a reported increase in proficiency. Its score on the Codex HumanEval, a Python programming test, rose from 56 percent to 71.2 percent. Similarly, on GSM8k, a test comprising grade-school math problems, it improved from 85.2 to 88 percent.
Hot Take: ChatGPT is not getting dumber, you are.
2 projects | /r/ChatGPT | 17 Jun 2023

there are for code https://github.com/openai/human-eval
Official WizardCoder-15B-V1.0 Released! Can Achieve 59.8% Pass@1 on HumanEval!
5 projects | /r/LocalLLaMA | 15 Jun 2023

❗Note: The above table conducts a comprehensive comparison of our WizardCoder with other models on the HumanEval and MBPP benchmarks. We adhere to the approach outlined in previous studies by generating 20 samples for each problem to estimate the pass@1 score and evaluate it with the same code. The scores of GPT4 and GPT3.5 reported by OpenAI are 67.0 and 48.1 (maybe these are the early version of GPT4&3.5).
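The note above mentions estimating pass@1 from 20 generated samples per problem. A minimal sketch of how such an estimate is commonly computed follows, assuming the unbiased pass@k estimator 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number that pass the unit tests. The function names and the sample counts are hypothetical and chosen only for illustration; this is not the WizardCoder evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: samples generated for the problem
    c: samples that passed all unit tests
    k: the k in pass@k
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Probability that a random subset of k samples contains no passing one
    # is C(n - c, k) / C(n, k); pass@k is the complement of that.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts: two problems, 20 samples each, 5 and 15 passing.
illustrative_results = [(20, 5), (20, 15)]
score = mean_pass_at_k(illustrative_results, k=1)
```

With k = 1 the estimator reduces to c / n per problem, so generating 20 samples mainly lowers the variance of the reported pass@1 compared with a single sample.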
A new way to predict when software jobs will become automated
1 project | news.ycombinator.com | 24 Jul 2022
OpenAI Codex - The Model behind GitHub Copilot
1 project | dev.to | 7 Jul 2021
Stats

Basic human-eval repo stats

Mentions

Stars: 2,014 | Activity: 0.0 | Last Commit: 3 months ago

openai/human-eval is an open source project licensed under the MIT License, which is an OSI-approved license.

The primary programming language of human-eval is Python.
