Python llm-inference

Open-source Python projects categorized as llm-inference

Top 17 Python llm-inference Projects

  • dbrx

    Code examples and resources for DBRX, a large language model developed by Databricks

  • Project mention: Hello OLMo: An Open LLM | news.ycombinator.com | 2024-04-08

    One thing I wanted to add and call attention to is the importance of licensing in open models. This is often overlooked when we blindly accept the vague branding of models as “open”, but I am noticing that many open weight models are actually using encumbered proprietary licenses rather than standard open source licenses that are OSI approved (https://opensource.org/licenses). As an example, Databricks’s DBRX model has a proprietary license that forces adherence to their highly restrictive Acceptable Use Policy by referencing a live website hosting their AUP (https://github.com/databricks/dbrx/blob/main/LICENSE), which means as they change their AUP, you may be further restricted in the future. Meta’s Llama is similar (https://github.com/meta-llama/llama/blob/main/LICENSE). I’m not sure who can depend on these models given this flaw.

  • lmdeploy

    LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

  • Project mention: FLaNK-AIM Weekly 06 May 2024 | dev.to | 2024-05-06
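
    Below is a minimal sketch of offline batched inference with lmdeploy's high-level pipeline API; the model ID is only an example, and the exact response fields are an assumption, so check the project docs.

      # Minimal lmdeploy sketch: load a model and run batched generation.
      # The model ID is illustrative; any lmdeploy-supported model works.
      from lmdeploy import pipeline

      pipe = pipeline("internlm/internlm2-chat-7b")
      responses = pipe(["Hi, introduce yourself.",
                        "Explain KV-cache quantization in one sentence."])
      for r in responses:
          print(r.text)  # assumed field carrying the generated text
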
  • GenerativeAIExamples

    Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

  • Project mention: FLaNK Weekly 18 Dec 2023 | dev.to | 2023-12-18
  • llmflows

    LLMFlows - Simple, Explicit and Transparent LLM Apps

  • Project mention: Show HN: LLMFlows – LangChain alternative for explicit and transparent apps | news.ycombinator.com | 2023-07-29
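
    As a rough sketch of the "explicit and transparent" style the project advertises, following the pattern in its README (the class names and the triple return value are assumptions to verify against the docs):

      # Hypothetical LLMFlows-style sketch: prompts and LLM calls are explicit
      # objects, and each call returns its metadata rather than hiding it.
      from llmflows.llms import OpenAI
      from llmflows.prompts import PromptTemplate

      template = PromptTemplate("Write a haiku about {topic}.")
      llm = OpenAI(api_key="sk-...")                   # placeholder key
      prompt = template.get_prompt(topic="inference servers")
      text, call_data, model_config = llm.generate(prompt)
      print(text, call_data)                           # call metadata is explicit
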
  • local-llm-function-calling

    A tool for generating function arguments and choosing what function to call with local LLMs

  • Project mention: Tell HN: OpenAI still has a moat, it's called function calling and its API | news.ycombinator.com | 2024-02-21

    hello? https://github.com/rizerphe/local-llm-function-calling
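
    For context, tools like this constrain a local model's output to satisfy an OpenAI-style function schema such as the sketch below; the schema format is standard JSON Schema, while get_weather is a made-up example (the library's generator API should be taken from its README).

      # An OpenAI-style function schema of the kind a constrained generator
      # fills in; get_weather is a hypothetical example function.
      import json

      functions = [{
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
              "type": "object",
              "properties": {
                  "city": {"type": "string", "description": "City name"},
                  "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
              },
              "required": ["city"],
          },
      }]
      print(json.dumps(functions, indent=2))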

  • nos

    ⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud or AI HW. (by autonomi-ai)

  • Project mention: Show HN: NOS – A fast, and ergonomic PyTorch inference server | news.ycombinator.com | 2023-12-14
  • syncode

    Efficient and general syntactical decoding for Large Language Models

  • Project mention: LLMs following grammar for general-purpose PLs | news.ycombinator.com | 2024-05-09
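
    To illustrate the general idea of syntactical decoding, here is a from-scratch toy (not SynCode's own API): at each step, mask out any next token that would violate the target syntax, in this case restricting GPT-2 to digits.

      # Toy syntax-constrained decoding: mask logits of tokens that would
      # break the "digits only" rule, a stand-in for a real grammar.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tok = AutoTokenizer.from_pretrained("gpt2")
      model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

      # Precompute the allowed token ids (slow but simple).
      allowed = [i for i in range(len(tok)) if tok.decode([i]).strip().isdigit()]

      ids = tok("A valid integer: ", return_tensors="pt").input_ids
      for _ in range(4):
          with torch.no_grad():
              logits = model(ids).logits[0, -1]
          mask = torch.full_like(logits, float("-inf"))
          mask[allowed] = 0.0
          ids = torch.cat([ids, (logits + mask).argmax().view(1, 1)], dim=1)
      print(tok.decode(ids[0]))
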
  • llm4regression

    Examining how large language models (LLMs) perform across various synthetic regression tasks when given (input, output) examples in their context, without any parameter update

  • Project mention: LLM Is a Capable Regressor When Given In-Context Examples | news.ycombinator.com | 2024-04-13

    If you're interested in exploring other metrics, the output of the models and code examples for how to read them are available at: https://github.com/robertvacareanu/llm4regression/tree/main/...
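
    A stand-in sketch of the kind of in-context regression prompt the paper studies, with invented (input, output) pairs following y = 3x + 1:

      # Build a few-shot regression prompt: (input, output) pairs in context,
      # then a query input. The pairs are invented for this sketch.
      examples = [(1.0, 4.0), (2.0, 7.0), (4.0, 13.0), (5.0, 16.0)]
      prompt = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
      prompt += "\nInput: 3.0\nOutput:"   # a capable model should answer 10.0
      print(prompt)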

  • vllm-rocm

    vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs

  • Project mention: Experimental Mixtral MoE on vLLM! | /r/LocalLLaMA | 2023-12-10
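
    For reference, a minimal offline-generation sketch using vLLM's Python API (the model ID is illustrative, and this assumes the ROCm fork keeps upstream vLLM's interface):

      # Minimal vLLM sketch: load a model and sample one completion.
      from vllm import LLM, SamplingParams

      llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # example model
      params = SamplingParams(temperature=0.8, max_tokens=64)
      outputs = llm.generate(["The key idea behind PagedAttention is"], params)
      print(outputs[0].outputs[0].text)
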
  • cappr

    Completion After Prompt Probability: make your LLM make a choice.
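
    The underlying idea, sketched from scratch below rather than through the cappr package's own API: score each candidate completion by its total log-probability after the prompt, then pick the argmax.

      # From-scratch "completion after prompt probability" (not cappr's API):
      # pick the candidate completion most probable given the prompt.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tok = AutoTokenizer.from_pretrained("gpt2")
      model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

      def completion_logprob(prompt: str, completion: str) -> float:
          n_prompt = tok(prompt, return_tensors="pt").input_ids.shape[1]
          ids = tok(prompt + completion, return_tensors="pt").input_ids
          with torch.no_grad():
              logprobs = model(ids).logits.log_softmax(-1)
          comp_ids = ids[0, n_prompt:]
          steps = logprobs[0, n_prompt - 1:-1]     # rows predicting comp_ids
          return steps[torch.arange(len(comp_ids)), comp_ids].sum().item()

      prompt = "The sentiment of 'I loved this movie' is"
      choices = (" positive", " negative")
      print(max(choices, key=lambda c: completion_logprob(prompt, c)))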

  • edsl

    Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs. (by expectedparrot)

  • Project mention: Python package for administering surveys to LLMs | news.ycombinator.com | 2024-04-18
  • llm-vscode-inference-server

    An endpoint server for efficiently serving quantized open-source LLMs for code.

  • Project mention: Replit's new AI Model now available on Hugging Face | news.ycombinator.com | 2023-10-11

    Requests for code generation are made via an HTTP request.

    You can use the Hugging Face Inference API or your own HTTP endpoint, provided it adheres to the API specified here[1] or here[2].

    It's fairly easy to use your own model locally with the plugin. You can just use one of the community-developed inference servers, which are listed at the bottom of the page, but here are the links[3] to both[4].

    [1]: https://huggingface.co/docs/api-inference/detailed_parameter...

    [2]: https://huggingface.github.io/text-generation-inference/#/Te...

    [3]: https://github.com/wangcx18/llm-vscode-inference-server

    [4]: https://github.com/wangcx18/llm-vscode-inference-server
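
    A sketch of such a request, following the inputs/parameters payload shape described in the text-generation-inference spec at [2]; the host, port, and route are assumptions for a locally running server.

      # Code-completion request in the inputs/parameters shape used by the
      # HF text-generation APIs linked above. Host/port/route are assumptions.
      import requests

      resp = requests.post(
          "http://localhost:8000/generate",
          json={"inputs": "def fibonacci(n):",
                "parameters": {"max_new_tokens": 64, "temperature": 0.2}},
          timeout=60,
      )
      print(resp.json())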

  • Exa

    Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and minimal learning curve. (by kyegomez)

  • Project mention: Tree of Thoughts | news.ycombinator.com | 2023-05-26

    The author appears motivated by some... interesting... beliefs. Hard to tell if this entire thing is a joke or not.

    https://github.com/kyegomez/EXA#for-humanity

    https://blog.apac.ai/liberation-awaits

  • sliced_llama

    Simple LLM inference server

  • Project mention: Mixture-of-Depths: Dynamically allocating compute in transformers | news.ycombinator.com | 2024-04-08

    There are already some implementations out there which attempt to accomplish this!

    Here's an example: https://github.com/silphendio/sliced_llama

    A gist pertaining to said example: https://gist.github.com/silphendio/535cd9c1821aa1290aa10d587...

    Here's a discussion about integrating this capability with ExLlama: https://github.com/turboderp/exllamav2/pull/275

    And same as above but for llama.cpp: https://github.com/ggerganov/llama.cpp/issues/4718#issuecomm...
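
    A rough sketch of what slicing a model can look like, assuming a Llama-style Hugging Face checkpoint (layer attribute paths differ by architecture): keep only a subset of decoder blocks at runtime.

      # Layer-slicing sketch (assumes a Llama-style HF model layout):
      # retain only the first 24 decoder blocks at runtime.
      import torch
      from transformers import AutoModelForCausalLM

      model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
      keep = range(24)                              # blocks to retain
      model.model.layers = torch.nn.ModuleList(
          [model.model.layers[i] for i in keep])
      model.config.num_hidden_layers = len(keep)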

  • aibench-llm-endpoints

    Runner in charge of collecting metrics from LLM inference endpoints for the Unify Hub

  • Project mention: Show HN: Unify – Dynamic LLM Benchmarks and SSO for Multi-Vendor Deployment | news.ycombinator.com | 2024-02-06

    Hey HN! I’m the founder of Unify, and we’ve just released our Model Hub, which provides a collection of LLM endpoints with live runtime benchmarks all plotted across time: https://unify.ai/hub

    A key finding is that static tabular runtime benchmarks for LLMs simply do not work. It’s necessary to take a time-series perspective, and plot the variations through time.

    We currently have 21 models provided by: Anyscale, Perplexity AI, Replicate, Together AI, OctoAI, Mistral AI and OpenAI, with more on the roadmap.

    We test across different regions (Asia, US, Europe), with varied concurrency and sequence length. By plotting across time, our dashboard highlights the stability and variability of the different endpoints, and their ongoing evolution across API updates and system changes. Our benchmarking code is fully open source: https://github.com/unifyai/aibench-llm-endpoints

    Our unified API also makes it very easy to test and deploy these different endpoints in production, without needing to create several accounts.

    Our Hub is a work in progress, and we will be releasing new features every week.

    What are your thoughts? Both positive and negative comments are very welcome. We’ll try to quickly incorporate all feedback!

    I recorded a quick(ish) demo video a few hours ago, explaining how to get started, for those who are interested in learning more: https://youtu.be/0a6-C2_Bmh0

    There is also a longer version here: https://youtu.be/o8yD_QBhmsw

    Finally, as a thanks to HN readers, the promo code “HACKERNEWS” can be used to claim $5 per week in free credits, compatible with our ever expanding list of LLM providers. You can sign up here [https://console.unify.ai/], and claim the free credits here [https://unify.ai/docs/hub/home/pricing.html#top-up-code] if interested.

    Thanks all!

  • tabby-backend-py

    Tabby (self-hosted AI coding assistant) server in 20 lines of Python

  • Project mention: Show HN: Tabby back end in 20 Python lines (self-hosted AI coding assistant) | news.ycombinator.com | 2024-01-29
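
    For flavor, a minimal sketch of such a server with FastAPI; the /v1/completions route and the payload fields loosely follow Tabby's completion API and should be treated as assumptions to verify against Tabby's docs.

      # Minimal Tabby-style completion server sketch; field names assumed.
      from typing import Optional
      from fastapi import FastAPI
      from pydantic import BaseModel

      app = FastAPI()

      class Segments(BaseModel):
          prefix: str
          suffix: Optional[str] = None

      class CompletionRequest(BaseModel):
          language: Optional[str] = None
          segments: Segments

      @app.post("/v1/completions")
      def complete(req: CompletionRequest):
          # Plug a local model in here; a stub keeps the sketch runnable.
          text = "pass  # TODO: generate from req.segments.prefix"
          return {"id": "cmpl-0", "choices": [{"index": 0, "text": text}]}
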
  • everything-rag

    Introducing everything-rag, your fully customizable and local chatbot assistant! 🤖

  • Project mention: GeneticsBot - Learn Genetics with open source knowledge | dev.to | 2024-04-26

    I've been exploring several solutions, such as the one I explain in my post about an AI-powered, Python-based Telegram bot and the RAG assistant I created, everything-rag. Nevertheless, I needed something more direct and efficient, and Coze was a perfect match for me!

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source llm-inference projects in Python? This list will help you:

Rank Project Stars
1 dbrx 2,407
2 lmdeploy 2,391
3 GenerativeAIExamples 1,557
4 llmflows 618
5 local-llm-function-calling 273
6 nos 116
7 syncode 93
8 llm4regression 93
9 vllm-rocm 76
10 cappr 63
11 edsl 56
12 llm-vscode-inference-server 44
13 Exa 20
14 sliced_llama 15
15 aibench-llm-endpoints 12
16 tabby-backend-py 11
17 everything-rag 3
