-
WizardLM
Discontinued. Family of instruction-following LLMs powered by Evol-Instruct: WizardLM, WizardCoder, and WizardMath
Additionally, I found an unnecessary extra whitespace character in both my Alpaca and Vicuna prompts and removed it to better match the recommended prompts. Third, I tested a much broader set of prompt configurations. For each model in the chart above, I included only the best prompt configuration (marked after the model name). You can find the corresponding prompts here: https://github.com/my-other-github-account/llm-humaneval-benchmarks/blob/main/templates.py
Starcoder/Codegen: As you all expected, the coding models do quite well at code! Of the OSS models, these perform the best. With my prompt, settings, and parser, I still fall a few percent short of the advertised HumanEval+ results that some of these models report in their papers. It is important to note, though, that I am simply counting the pass rate of single attempts for each of these models. That is not directly comparable to the pass@1 metric as defined in the Codex paper (for reasons they discuss in said paper): my N is 1, their N is 200. So if you see anyone report pass@1 in a peer-reviewed paper, those results will be more reliable than mine, and mine are expected to have higher variance.
Also, in the case of Starcoder, I am using an IFT variation of their model, so it is slightly different from the version in their paper, as it is more dialogue tuned. I expected Starcoderplus to outperform Starcoder, but it looks like it is actually expected to perform worse at Python, as it is a generalist model, and better at everything else instead. There is a great benchmark in development here that covers multiple languages (and, unlike HumanEval, is not developed by OpenAI, which is a huge plus in my book), so it will be interesting to keep an eye on, especially for models like Starcoderplus: https://github.com/the-crypt-keeper/can-ai-code
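To make the single-attempt versus pass@1 distinction above concrete, here is a minimal sketch of the unbiased pass@k estimator described in the Codex paper, 1 - C(n-c, k) / C(n, k), next to a plain single-attempt pass rate. The function names and the example counts are illustrative assumptions, not taken from the benchmark repository linked above.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated for the problem, c: samples that passed the tests,
    k: attempt budget being estimated. Formula: 1 - C(n - c, k) / C(n, k).
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def single_attempt_pass_rate(passed: Sequence[bool]) -> float:
    """Fraction of problems solved when each problem gets exactly one attempt."""
    return sum(passed) / len(passed)


# Hypothetical problem: 50 of 200 samples pass, so pass@1 is 50/200.
print(pass_at_k(200, 50, 1))

# With n = 1 the per-problem estimate can only be 0.0 or 1.0.
print(pass_at_k(1, 0, 1), pass_at_k(1, 1, 1))

# Hypothetical single-attempt run over four problems.
print(single_attempt_pass_rate([True, False, True, True]))
```

For k = 1 the estimator reduces to c / n, so both approaches target the same quantity. With n = 1 each problem contributes only a 0 or a 1, which is why a single-attempt score is noisier than one averaged over 200 samples per problem.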
I just saw this WizardCoder: https://github.com/nlpxucan/WizardLM/blob/main/WizardCoder/README.md