GPT-4 API general availability

Our great sponsors

WorkOS - The modern identity platform for B2B SaaS

InfluxDB - Power Real-Time Data Analytics at Scale

SaaSHub - Software Alternatives and Reviews

Our great sponsors

llama.cpp

769 55,846 10.0 C++

LLM inference in C/C++

at 158ms per token, if we guess a word is 2.5 tokens, then that's 151 words per minute, much faster than most people can type. On a $250 laptop. Isn't the future neat?
the code I was running: https://github.com/ggerganov/llama.cpp
and the model: https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML
There are other models that may perform better, I'm going to be doing a lot of screwing around with OpenLLaMA this weekend.

WizardLM

38 7,531 9.4 Python

Discontinued Family of instruction-following LLMs powered by Evol-Instruct: WizardLM, WizardCoder and WizardMath

In terms of speed, we're talking about 140t/s for 7B models, and 40t/s for 33B models on a 3090/4090 now.[1] (1 token ~= 0.75 word) It's quite zippy. llama.cpp performs close on Nvidia GPUs now (but they don't have a handy chart) and you can get decent performance on 13B models on M1/M2 Macs.
You can take a look at a list of evals here: https://llm-tracker.info/books/evals/page/list-of-evals - for general usage, I think home-rolled evals like llm-jeopardy [2] and local-llm-comparison [3] by hobbyists are more useful than most of the benchmark rankings.
That being said, personally I mostly use GPT-4 for code assistance to that's what I'm most interested in, and the latest code assistants are scoring quite well: https://github.com/abacaj/code-eval - a recent replit-3b fine tune the human-eval results for open models (as a point of reference, GPT-3.5 gets 60.4 on pass@1 and 68.9 on pass@10 [4]) - I've only just started playing around with it since replit model tooling is not as good as llamas (doc here: https://llm-tracker.info/books/howto-guides/page/replit-mode...).
I'm interested in potentially applying reflexion or some of the other techniques that have been tried to even further increase coding abilities. (InterCode in particular has caught my eye https://intercode-benchmark.github.io/)
[1] https://github.com/turboderp/exllama#results-so-far
[2] https://github.com/aigoopy/llm-jeopardy
[3] https://github.com/Troyanovsky/Local-LLM-comparison/tree/mai...
[4] https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder

WorkOS

workos.com sponsored

The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
openai-cookbook

214 55,805 9.5 MDX

Examples and guides for using the OpenAI API

(I'm an engineer at OpenAI)
Very sorry to hear about these issues, particularly the timeouts. Latency is top of mind for us and something we are continuing to push on. Does streaming work for your use case?
https://github.com/openai/openai-cookbook/blob/main/examples...
We definitely want to investigate these and the billing issues further. Would you consider emailing me your org ID and any request IDs (if you have them) at [email protected]?
Thank you for using the API, and really appreciate the honest feedback.

azure-search-openai-demo

11 5,286 9.5 Python

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.

You can see region availability here for Azure OpenAI:
https://learn.microsoft.com/en-us/azure/cognitive-services/o...
It's definitely limited, but there's currently more than one region available.
(I happen to be working at the moment on a location-related fix to our most popular Azure OpenAI sample, https://github.com/Azure-Samples/azure-search-openai-demo )

open_llama

52 7,193 5.3

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset

OpenLLaMA is though. https://github.com/openlm-research/open_llama
All of these are surmountable problems.
We can beat OpenAI.
We can drain their moat.

guidance

89 12,248 9.5 Jupyter Notebook

Discontinued A guidance language for controlling large language models. [Moved to: https://github.com/guidance-ai/guidance] (by microsoft)
code-eval

5 345 8.0 Python

Run evaluation on LLMs using human-eval benchmark

In terms of speed, we're talking about 140t/s for 7B models, and 40t/s for 33B models on a 3090/4090 now.[1] (1 token ~= 0.75 word) It's quite zippy. llama.cpp performs close on Nvidia GPUs now (but they don't have a handy chart) and you can get decent performance on 13B models on M1/M2 Macs.
You can take a look at a list of evals here: https://llm-tracker.info/books/evals/page/list-of-evals - for general usage, I think home-rolled evals like llm-jeopardy [2] and local-llm-comparison [3] by hobbyists are more useful than most of the benchmark rankings.
That being said, personally I mostly use GPT-4 for code assistance to that's what I'm most interested in, and the latest code assistants are scoring quite well: https://github.com/abacaj/code-eval - a recent replit-3b fine tune the human-eval results for open models (as a point of reference, GPT-3.5 gets 60.4 on pass@1 and 68.9 on pass@10 [4]) - I've only just started playing around with it since replit model tooling is not as good as llamas (doc here: https://llm-tracker.info/books/howto-guides/page/replit-mode...).
I'm interested in potentially applying reflexion or some of the other techniques that have been tried to even further increase coding abilities. (InterCode in particular has caught my eye https://intercode-benchmark.github.io/)
[1] https://github.com/turboderp/exllama#results-so-far
[2] https://github.com/aigoopy/llm-jeopardy
[3] https://github.com/Troyanovsky/Local-LLM-comparison/tree/mai...
[4] https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder

InfluxDB

www.influxdata.com sponsored

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
exllama

64 2,582 9.0 Python

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

In terms of speed, we're talking about 140t/s for 7B models, and 40t/s for 33B models on a 3090/4090 now.[1] (1 token ~= 0.75 word) It's quite zippy. llama.cpp performs close on Nvidia GPUs now (but they don't have a handy chart) and you can get decent performance on 13B models on M1/M2 Macs.
You can take a look at a list of evals here: https://llm-tracker.info/books/evals/page/list-of-evals - for general usage, I think home-rolled evals like llm-jeopardy [2] and local-llm-comparison [3] by hobbyists are more useful than most of the benchmark rankings.
That being said, personally I mostly use GPT-4 for code assistance to that's what I'm most interested in, and the latest code assistants are scoring quite well: https://github.com/abacaj/code-eval - a recent replit-3b fine tune the human-eval results for open models (as a point of reference, GPT-3.5 gets 60.4 on pass@1 and 68.9 on pass@10 [4]) - I've only just started playing around with it since replit model tooling is not as good as llamas (doc here: https://llm-tracker.info/books/howto-guides/page/replit-mode...).
I'm interested in potentially applying reflexion or some of the other techniques that have been tried to even further increase coding abilities. (InterCode in particular has caught my eye https://intercode-benchmark.github.io/)
[1] https://github.com/turboderp/exllama#results-so-far
[2] https://github.com/aigoopy/llm-jeopardy
[3] https://github.com/Troyanovsky/Local-LLM-comparison/tree/mai...
[4] https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder

llm-jeopardy

12 107 7.8 JavaScript

Automated prompting and scoring framework to evaluate LLMs using updated human knowledge prompts

In terms of speed, we're talking about 140t/s for 7B models, and 40t/s for 33B models on a 3090/4090 now.[1] (1 token ~= 0.75 word) It's quite zippy. llama.cpp performs close on Nvidia GPUs now (but they don't have a handy chart) and you can get decent performance on 13B models on M1/M2 Macs.
You can take a look at a list of evals here: https://llm-tracker.info/books/evals/page/list-of-evals - for general usage, I think home-rolled evals like llm-jeopardy [2] and local-llm-comparison [3] by hobbyists are more useful than most of the benchmark rankings.
That being said, personally I mostly use GPT-4 for code assistance to that's what I'm most interested in, and the latest code assistants are scoring quite well: https://github.com/abacaj/code-eval - a recent replit-3b fine tune the human-eval results for open models (as a point of reference, GPT-3.5 gets 60.4 on pass@1 and 68.9 on pass@10 [4]) - I've only just started playing around with it since replit model tooling is not as good as llamas (doc here: https://llm-tracker.info/books/howto-guides/page/replit-mode...).
I'm interested in potentially applying reflexion or some of the other techniques that have been tried to even further increase coding abilities. (InterCode in particular has caught my eye https://intercode-benchmark.github.io/)
[1] https://github.com/turboderp/exllama#results-so-far
[2] https://github.com/aigoopy/llm-jeopardy
[3] https://github.com/Troyanovsky/Local-LLM-comparison/tree/mai...
[4] https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder

Local-LLM-Comparison-Colab-UI

20 868 9.1 Jupyter Notebook

Compare the performance of different LLM that can be deployed locally on consumer hardware. Run yourself with Colab WebUI.

In terms of speed, we're talking about 140t/s for 7B models, and 40t/s for 33B models on a 3090/4090 now.[1] (1 token ~= 0.75 word) It's quite zippy. llama.cpp performs close on Nvidia GPUs now (but they don't have a handy chart) and you can get decent performance on 13B models on M1/M2 Macs.
You can take a look at a list of evals here: https://llm-tracker.info/books/evals/page/list-of-evals - for general usage, I think home-rolled evals like llm-jeopardy [2] and local-llm-comparison [3] by hobbyists are more useful than most of the benchmark rankings.
That being said, personally I mostly use GPT-4 for code assistance to that's what I'm most interested in, and the latest code assistants are scoring quite well: https://github.com/abacaj/code-eval - a recent replit-3b fine tune the human-eval results for open models (as a point of reference, GPT-3.5 gets 60.4 on pass@1 and 68.9 on pass@10 [4]) - I've only just started playing around with it since replit model tooling is not as good as llamas (doc here: https://llm-tracker.info/books/howto-guides/page/replit-mode...).
I'm interested in potentially applying reflexion or some of the other techniques that have been tried to even further increase coding abilities. (InterCode in particular has caught my eye https://intercode-benchmark.github.io/)
[1] https://github.com/turboderp/exllama#results-so-far
[2] https://github.com/aigoopy/llm-jeopardy
[3] https://github.com/Troyanovsky/Local-LLM-comparison/tree/mai...
[4] https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder

private-gpt

131 51,732 9.2 Python

Interact with your documents using the power of GPT, 100% privately, no data leaks

https://gpt4all.io/index.html is a good place to start, you can literally download one of the many recommended models.
https://github.com/imartinez/privateGPT is great if you want do it with code.

gpt4all

139 64,046 9.8 C++

gpt4all: run open-source LLMs anywhere

I've found https://gpt4all.io/ to be the fastest way to get started. I've also started moving my notes to https://llm-tracker.info/ which should help make it easier for people getting started: https://llm-tracker.info/books/howto-guides/page/getting-sta...

open-llms

22 10,116 7.7

📋 A list of open LLMs available for commercial use.

This is the most well-maintained list of commercially usable open LLMs: https://github.com/eugeneyan/open-llms
MPT, OpenLLaMA, and Falcon are probably the most generally useful.
For code, Replit Code (specifically replit-code-instruct-glaive) and StarCoder (WizardCoder-15B) are the current top open models and both can be used commercially.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Help with my Frontend-Code for AZURE GPT - Will Tip
1 project | /r/learnpython | 13 Sep 2023
Pricing question
1 project | /r/AZURE | 5 Jul 2023
New to Azure, deployed a MS project from github. How can I edit the .py files in azure?
1 project | /r/AZURE | 3 Jul 2023
How to understand somebody else's code? Any tools that can help visualize would be a life saver!
1 project | /r/learnpython | 30 May 2023
How to understand somebody else's code? Any tools that can help visualise would be a life saver!
1 project | /r/learnprogramming | 30 May 2023

GPT-4 API general availability

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
openai chatgpt Azure humaneval azurecognitivesearch
Post date: 6 Jul 2023

llama.cpp

WizardLM

WorkOS

openai-cookbook

azure-search-openai-demo

open_llama

guidance

code-eval

InfluxDB

exllama

llm-jeopardy

Local-LLM-Comparison-Colab-UI

private-gpt

gpt4all

open-llms

Related posts

GPT-4 API general availability

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com openai chatgpt Azure humaneval azurecognitivesearch Post date: 6 Jul 2023

Related posts

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
openai chatgpt Azure humaneval azurecognitivesearch
Post date: 6 Jul 2023