Top 17 Python llm-inference Projects
- GenerativeAIExamples: Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
- local-llm-function-calling: A tool for generating function arguments and choosing what function to call with local LLMs.
- nos: ⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud or AI HW. (by autonomi-ai)
- llm4regression: Examining how large language models (LLMs) perform across various synthetic regression tasks when given (input, output) examples in their context, without any parameter updates.
- edsl: Design, conduct, and analyze the results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs. (by expectedparrot)
- llm-vscode-inference-server: An endpoint server for efficiently serving quantized open-source LLMs for code.
- Exa: Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and minimal learning curve. (by kyegomez)
- aibench-llm-endpoints: Runner in charge of collecting metrics from LLM inference endpoints for the Unify Hub.
One thing I wanted to add and call attention to is the importance of licensing in open models. This is often overlooked when we blindly accept the vague branding of models as "open", but I am noticing that many open-weight models actually use encumbered proprietary licenses rather than standard, OSI-approved open source licenses (https://opensource.org/licenses). As an example, Databricks's DBRX model has a proprietary license that forces adherence to their highly restrictive Acceptable Use Policy by referencing a live website hosting their AUP (https://github.com/databricks/dbrx/blob/main/LICENSE), which means that as they change their AUP, you may be further restricted in the future. Meta's Llama is similar (https://github.com/meta-llama/llama/blob/main/LICENSE). I'm not sure who can depend on these models given this flaw.
Project mention: Show HN: LLMFlows – LangChain alternative for explicit and transparent apps | news.ycombinator.com | 2023-07-29
Project mention: Tell HN: OpenAI still has a moat, it's called function calling and its API | news.ycombinator.com | 2024-02-21
hello? https://github.com/rizerphe/local-llm-function-calling
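For readers unfamiliar with the feature being discussed: OpenAI-style function calling comes down to attaching JSON Schema tool definitions to a chat request, which projects like the one linked above replicate for local models. A minimal sketch of the request shape (field names follow OpenAI's chat completions API; the model name and tool are illustrative, and no network call is made):

```python
import json

# Illustrative request body (the tool schema is made up for the example):
# the client declares callable tools as JSON Schema, and the model answers
# with a function name plus JSON arguments instead of free text.
request_body = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# Serializing confirms the payload is plain JSON before it hits any API.
payload = json.dumps(request_body)
```

The local-model version of this has to do extra work the API hides: constraining the model's decoded output so it can only name a declared function and emit schema-valid arguments.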
Project mention: Show HN: NOS – A fast, and ergonomic PyTorch inference server | news.ycombinator.com | 2023-12-14
Project mention: LLM Is a Capable Regressor When Given In-Context Examples | news.ycombinator.com | 2024-04-13
If you're interested in exploring other metrics, the output of the models and code examples for how to read them are available at: https://github.com/robertvacareanu/llm4regression/tree/main/...
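The setup the repo describes is simple enough to sketch: serialize (input, output) pairs into the context and ask the model to complete the next output, with no parameter updates. A minimal illustration (function and variable names are mine, not the repo's):

```python
# Serialize (input, output) demonstrations into the context, then ask the
# model to complete the next output; no weights are updated.
def regression_prompt(examples, query):
    lines = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    lines.append(f"Input: {query}\nOutput:")
    return "\n".join(lines)

# e.g. a noiseless linear task y = 3x + 1
examples = [(1, 4), (2, 7), (3, 10)]
prompt = regression_prompt(examples, 4)
# The LLM is expected to continue the prompt with something close to 13.
```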
Project mention: Python package for administering surveys to LLMs | news.ycombinator.com | 2024-04-18
Project mention: Replit's new AI Model now available on Hugging Face | news.ycombinator.com | 2023-10-11
"Requests for code generation are made via an HTTP request. You can use the Hugging Face Inference API or your own HTTP endpoint, provided it adheres to the API specified here[1] or here[2]."
It's fairly easy to use your own model locally with the plugin. You can just use one of the community-developed inference servers, which are listed at the bottom of the page, but here are the links to both[3][4].
[1]: https://huggingface.co/docs/api-inference/detailed_parameter...
[2]: https://huggingface.github.io/text-generation-inference/#/Te...
[3]: https://github.com/wangcx18/llm-vscode-inference-server
[4]: https://github.com/wangcx18/llm-vscode-inference-server
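The TGI spec linked in [2] centers on a POST endpoint that accepts a JSON body with an `inputs` field and returns `generated_text`. As a rough stdlib-only sketch of what such a community inference server has to implement (the stub stands in for a real quantized model; the authoritative schema is in the linked docs):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate_stub(inputs, parameters=None):
    """Stand-in for a real model; a server such as
    llm-vscode-inference-server runs a quantized code LLM here instead."""
    return {"generated_text": inputs + " pass"}

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # TGI-style request body: {"inputs": "...", "parameters": {...}}
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        result = generate_stub(body.get("inputs", ""), body.get("parameters"))
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

# To actually serve:
# HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
```

Point the plugin's endpoint setting at this address and it will receive completions in the shape it expects, regardless of what model runs behind `generate_stub`.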
The author appears motivated by some... interesting... beliefs. Hard to tell if this entire thing is a joke or not.
https://github.com/kyegomez/EXA#for-humanity
https://blog.apac.ai/liberation-awaits
Project mention: Mixture-of-Depths: Dynamically allocating compute in transformers | news.ycombinator.com | 2024-04-08
There are already some implementations out there which attempt to accomplish this!
Here's an example: https://github.com/silphendio/sliced_llama
A gist pertaining to said example: https://gist.github.com/silphendio/535cd9c1821aa1290aa10d587...
Here's a discussion about integrating this capability with ExLlama: https://github.com/turboderp/exllamav2/pull/275
And same as above but for llama.cpp: https://github.com/ggerganov/llama.cpp/issues/4718#issuecomm...
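The common idea behind these implementations is to run only a subset of a model's transformer blocks. A toy pure-Python sketch of that control flow (each "layer" here is a stand-in function, not a real transformer block):

```python
# Each "layer" is a stand-in function on the hidden state; real
# implementations index into the model's actual block list instead.
def make_layer(i):
    return lambda hidden: hidden + i

layers = [make_layer(i) for i in range(8)]

def forward(hidden, layer_ids):
    """Run only the chosen subset of blocks, in order."""
    for i in layer_ids:
        hidden = layers[i](hidden)
    return hidden

full = forward(0, range(8))        # all 8 blocks: 0+1+...+7 = 28
sliced = forward(0, [0, 1, 6, 7])  # middle blocks skipped: 0+1+6+7 = 14
```

The hard part in practice is not the loop but keeping the KV cache and residual stream consistent when blocks are skipped, which is what the ExLlama and llama.cpp threads above dig into.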
Project mention: Show HN: Unify – Dynamic LLM Benchmarks and SSO for Multi-Vendor Deployment | news.ycombinator.com | 2024-02-06
Hey HN! I'm the founder of Unify, and we've just released our Model Hub, which provides a collection of LLM endpoints with live runtime benchmarks, all plotted across time: https://unify.ai/hub
A key finding is that static tabular runtime benchmarks for LLMs simply do not work. It’s necessary to take a time-series perspective, and plot the variations through time.
We currently have 21 models provided by: Anyscale, Perplexity AI, Replicate, Together AI, OctoAI, Mistral AI and OpenAI, with more on the roadmap.
We test across different regions (Asia, US, Europe), with varied concurrency and sequence length. By plotting across time, our dashboard highlights the stability and variability of the different endpoints, and their ongoing evolution across API updates and system changes. Our benchmarking code is fully open source: https://github.com/unifyai/aibench-llm-endpoints
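To make the static-vs-time-series point concrete, here is a toy illustration with made-up latency numbers: a single averaged figure hides exactly the variation a plot over time exposes.

```python
from statistics import mean

# (hour, seconds per request): made-up numbers for illustration only.
samples = [(0, 0.8), (1, 0.9), (2, 2.4), (3, 2.6), (4, 0.7)]

overall = mean(s for _, s in samples)   # the "static tabular benchmark" view
by_hour = dict(samples)                 # the time-series view

# The single average (~1.48 s) completely hides the 3x latency spike
# around hours 2-3 that a time-series plot makes obvious.
```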
Our unified API also makes it very easy to test and deploy these different endpoints in production, without needing to create several accounts.
Our Hub is a work in progress, and we will be releasing new features every week.
What are your thoughts? Both positive and negative comments are very welcome. We’ll try to quickly incorporate all feedback!
I recorded a quick(ish) demo video a few hours ago, explaining how to get started, for those who are interested in learning more: https://youtu.be/0a6-C2_Bmh0
There is also a longer version here: https://youtu.be/o8yD_QBhmsw
Finally, as a thanks to HN readers, the promo code “HACKERNEWS” can be used to claim $5 per week in free credits, compatible with our ever expanding list of LLM providers. You can sign up here [https://console.unify.ai/], and claim the free credits here [https://unify.ai/docs/hub/home/pricing.html#top-up-code] if interested.
Thanks all!
Project mention: Show HN: Tabby back end in 20 Python lines (self-hosted AI coding assistant) | news.ycombinator.com | 2024-01-29
I've been exploring several solutions, such as the one I explain in my post about an AI-powered, Python-based Telegram bot, and the RAG assistant I created, everything-rag. Nevertheless, I needed something more direct and efficient, and Coze was a perfect match for me!
Python llm-inference related posts
- LLM Is a Capable Regressor When Given In-Context Examples
- Hello OLMo: An Open LLM
- Show HN: Prompts as (WASM) Programs
- Mistral 8x7B 32k model [magnet]
- Meta: Code Llama, an AI Tool for Coding
-
A note from our sponsor - InfluxDB
www.influxdata.com | 10 May 2024
Index
What are some of the best open-source llm-inference projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | dbrx | 2,407 |
| 2 | lmdeploy | 2,391 |
| 3 | GenerativeAIExamples | 1,557 |
| 4 | llmflows | 618 |
| 5 | local-llm-function-calling | 273 |
| 6 | nos | 116 |
| 7 | syncode | 93 |
| 8 | llm4regression | 93 |
| 9 | vllm-rocm | 76 |
| 10 | cappr | 63 |
| 11 | edsl | 56 |
| 12 | llm-vscode-inference-server | 44 |
| 13 | Exa | 20 |
| 14 | sliced_llama | 15 |
| 15 | aibench-llm-endpoints | 12 |
| 16 | tabby-backend-py | 11 |
| 17 | everything-rag | 3 |