Official WizardCoder-15B-V1.0 Released! Can Achieve 59.8% Pass@1 on HumanEval!

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • WizardLM

    (Discontinued) Family of instruction-following LLMs powered by Evol-Instruct: WizardLM, WizardCoder, and WizardMath

  • The project repo: WizardCoder

  • llm-humaneval-benchmarks

  • ❗Note: The HumanEval and HumanEval+ scores in this study are copied from LLM-Humaneval-Benchmarks. All listed models generate a single code solution per problem, and the resulting pass rate is reported as a percentage. WizardCoder generates its answers with greedy decoding and is tested with the same evaluation code.

  • evalplus

    EvalPlus for rigorous evaluation of LLM-synthesized code

  • human-eval

    Code for the paper "Evaluating Large Language Models Trained on Code"

  • ❗Note: The table above gives a comprehensive comparison of WizardCoder with other models on the HumanEval and MBPP benchmarks. Following the approach of previous studies, 20 samples are generated per problem to estimate the pass@1 score, evaluated with the same code. OpenAI reports scores of 67.0 for GPT-4 and 48.1 for GPT-3.5 (possibly measured on earlier versions of those models).

  • ggml

    Tensor library for machine learning

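The notes above reference two pass@1 protocols: a single greedy sample per problem, and the unbiased pass@k estimator from the human-eval paper, `1 - C(n-c, k) / C(n, k)`, averaged over all problems (where `n` samples are generated per problem and `c` of them are correct). A minimal sketch of that estimator follows; the function names are illustrative, not taken from the referenced repos:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased per-problem pass@k: the probability that at least one of
    k samples, drawn without replacement from n generated solutions of
    which c are correct, passes the tests."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: a pass is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_pass_at_k(correct_counts, n: int, k: int) -> float:
    """Average pass@k over all benchmark problems, as a percentage."""
    scores = [pass_at_k(n, c, k) for c in correct_counts]
    return 100.0 * sum(scores) / len(scores)

# With one greedy sample per problem (n = k = 1), pass@1 reduces to the
# raw fraction of problems solved, e.g. 98 of HumanEval's 164 problems:
print(round(benchmark_pass_at_k([1] * 98 + [0] * 66, n=1, k=1), 1))  # 59.8
```

Generating more samples per problem (e.g. the 20 used for the HumanEval/MBPP table) only reduces the variance of the estimate; the expected value is the same.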
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • The AI Reproducibility Crisis in GPT-3.5/GPT-4 Research

    4 projects | news.ycombinator.com | 25 Aug 2023
  • GPT 4 new limits only 40 messages in 3 days

    2 projects | /r/ChatGPT | 10 Dec 2023
  • ChatGPT needs its own desktop application

    1 project | /r/ChatGPT | 10 Dec 2023
  • GPT Message limit is lying?

    1 project | /r/OpenAI | 10 Dec 2023
  • Enhance Speed of AnkiBrain Addon

    1 project | /r/ankibrain | 6 Dec 2023