Python llm-inference

Open-source Python projects categorized as llm-inference

Top 17 Python llm-inference Projects

  • dbrx

    Code examples and resources for DBRX, a large language model developed by Databricks

  • Project mention: Hello OLMo: An Open LLM | news.ycombinator.com | 2024-04-08

    One thing I wanted to add and call attention to is the importance of licensing in open models. This is often overlooked when we blindly accept the vague branding of models as “open”, but I am noticing that many open weight models are actually using encumbered proprietary licenses rather than standard open source licenses that are OSI approved (https://opensource.org/licenses). As an example, Databricks’s DBRX model has a proprietary license that forces adherence to their highly restrictive Acceptable Use Policy by referencing a live website hosting their AUP (https://github.com/databricks/dbrx/blob/main/LICENSE), which means as they change their AUP, you may be further restricted in the future. Meta’s Llama is similar (https://github.com/meta-llama/llama/blob/main/LICENSE). I’m not sure who can depend on these models given this flaw.

  • lmdeploy

    LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

  • Project mention: FLaNK-AIM Weekly 06 May 2024 | dev.to | 2024-05-06
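
    Below is a minimal sketch of offline batched inference with lmdeploy's high-level pipeline API; the model ID is only an example, and the exact response fields are an assumption, so check the project docs.

      # Minimal lmdeploy sketch: load a model and run batched generation.
      # The model ID is illustrative; any lmdeploy-supported model works.
      from lmdeploy import pipeline

      pipe = pipeline("internlm/internlm2-chat-7b")
      responses = pipe(["Hi, introduce yourself.",
                        "Explain KV-cache quantization in one sentence."])
      for r in responses:
          print(r.text)  # assumed field carrying the generated text
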
  • GenerativeAIExamples

    Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

  • Project mention: FLaNK Weekly 18 Dec 2023 | dev.to | 2023-12-18
  • llmflows

    LLMFlows - Simple, Explicit and Transparent LLM Apps

  • Project mention: Show HN: LLMFlows – LangChain alternative for explicit and transparent apps | news.ycombinator.com | 2023-07-29
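
    As a rough sketch of the "explicit and transparent" style the project advertises, following the pattern in its README (the class names and the triple return value are assumptions to verify against the docs):

      # Hypothetical LLMFlows-style sketch: prompts and LLM calls are explicit
      # objects, and each call returns its metadata rather than hiding it.
      from llmflows.llms import OpenAI
      from llmflows.prompts import PromptTemplate

      template = PromptTemplate("Write a haiku about {topic}.")
      llm = OpenAI(api_key="sk-...")                   # placeholder key
      prompt = template.get_prompt(topic="inference servers")
      text, call_data, model_config = llm.generate(prompt)
      print(text, call_data)                           # call metadata is explicit
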
  • local-llm-function-calling

    A tool for generating function arguments and choosing what function to call with local LLMs

  • Project mention: Tell HN: OpenAI still has a moat, it's called function calling and its API | news.ycombinator.com | 2024-02-21

    hello? https://github.com/rizerphe/local-llm-function-calling
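
    For context, tools like this constrain a local model's output to satisfy an OpenAI-style function schema such as the sketch below; the schema format is standard JSON Schema, while get_weather is a made-up example (the library's generator API should be taken from its README).

      # An OpenAI-style function schema of the kind a constrained generator
      # fills in; get_weather is a hypothetical example function.
      import json

      functions = [{
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
              "type": "object",
              "properties": {
                  "city": {"type": "string", "description": "City name"},
                  "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
              },
              "required": ["city"],
          },
      }]
      print(json.dumps(functions, indent=2))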

  • nos

    ⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud or AI HW. (by autonomi-ai)

  • Project mention: Show HN: NOS – A fast, and ergonomic PyTorch inference server | news.ycombinator.com | 2023-12-14
  • syncode

    Efficient and general syntactical decoding for Large Language Models

  • Project mention: LLMs following grammar for general-purpose PLs | news.ycombinator.com | 2024-05-09
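
    To illustrate the general idea of syntactical decoding, here is a from-scratch toy (not SynCode's own API): at each step, mask out any next token that would violate the target syntax, in this case restricting GPT-2 to digits.

      # Toy syntax-constrained decoding: mask logits of tokens that would
      # break the "digits only" rule, a stand-in for a real grammar.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tok = AutoTokenizer.from_pretrained("gpt2")
      model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

      # Precompute the allowed token ids (slow but simple).
      allowed = [i for i in range(len(tok)) if tok.decode([i]).strip().isdigit()]

      ids = tok("A valid integer: ", return_tensors="pt").input_ids
      for _ in range(4):
          with torch.no_grad():
              logits = model(ids).logits[0, -1]
          mask = torch.full_like(logits, float("-inf"))
          mask[allowed] = 0.0
          ids = torch.cat([ids, (logits + mask).argmax().view(1, 1)], dim=1)
      print(tok.decode(ids[0]))
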
  • llm4regression

    Examining how large language models (LLMs) perform across various synthetic regression tasks when given (input, output) examples in their context, without any parameter update

  • Project mention: LLM Is a Capable Regressor When Given In-Context Examples | news.ycombinator.com | 2024-04-13

    If you're interested in exploring other metrics, the output of the models and code examples for how to read them are available at: https://github.com/robertvacareanu/llm4regression/tree/main/...
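
    A stand-in sketch of the kind of in-context regression prompt the paper studies, with invented (input, output) pairs following y = 3x + 1:

      # Build a few-shot regression prompt: (input, output) pairs in context,
      # then a query input. The pairs are invented for this sketch.
      examples = [(1.0, 4.0), (2.0, 7.0), (4.0, 13.0), (5.0, 16.0)]
      prompt = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
      prompt += "\nInput: 3.0\nOutput:"   # a capable model should answer 10.0
      print(prompt)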

  • vllm-rocm

    vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs

  • Project mention: Experimental Mixtral MoE on vLLM! | /r/LocalLLaMA | 2023-12-10
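
    For reference, a minimal offline-generation sketch using vLLM's Python API (the model ID is illustrative, and this assumes the ROCm fork keeps upstream vLLM's interface):

      # Minimal vLLM sketch: load a model and sample one completion.
      from vllm import LLM, SamplingParams

      llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # example model
      params = SamplingParams(temperature=0.8, max_tokens=64)
      outputs = llm.generate(["The key idea behind PagedAttention is"], params)
      print(outputs[0].outputs[0].text)
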
  • cappr

    Completion After Prompt Probability: make your LLM make a choice.
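
    The underlying idea, sketched from scratch below rather than through the cappr package's own API: score each candidate completion by its total log-probability after the prompt, then pick the argmax.

      # From-scratch "completion after prompt probability" (not cappr's API):
      # pick the candidate completion most probable given the prompt.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tok = AutoTokenizer.from_pretrained("gpt2")
      model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

      def completion_logprob(prompt: str, completion: str) -> float:
          n_prompt = tok(prompt, return_tensors="pt").input_ids.shape[1]
          ids = tok(prompt + completion, return_tensors="pt").input_ids
          with torch.no_grad():
              logprobs = model(ids).logits.log_softmax(-1)
          comp_ids = ids[0, n_prompt:]
          steps = logprobs[0, n_prompt - 1:-1]     # rows predicting comp_ids
          return steps[torch.arange(len(comp_ids)), comp_ids].sum().item()

      prompt = "The sentiment of 'I loved this movie' is"
      choices = (" positive", " negative")
      print(max(choices, key=lambda c: completion_logprob(prompt, c)))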

  • edsl

    Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs. (by expectedparrot)

  • Project mention: Python package for administering surveys to LLMs | news.ycombinator.com | 2024-04-18
  • llm-vscode-inference-server

    An endpoint server for efficiently serving quantized open-source LLMs for code.

  • Project mention: Replit's new AI Model now available on Hugging Face | news.ycombinator.com | 2023-10-11

    Requests for code generation are made via an HTTP request.

    You can use the Hugging Face Inference API or your own HTTP endpoint, provided it adheres to the API specified here[1] or here[2].

    It's fairly easy to use your own model locally with the plugin. You can just use one of the community-developed inference servers, which are listed at the bottom of the page, but here are the links[3] to both[4].

    [1]: https://huggingface.co/docs/api-inference/detailed_parameter...

    [2]: https://huggingface.github.io/text-generation-inference/#/Te...

    [3]: https://github.com/wangcx18/llm-vscode-inference-server

    [4]: https://github.com/wangcx18/llm-vscode-inference-server
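
    A sketch of such a request, following the inputs/parameters payload shape described in the text-generation-inference spec at [2]; the host, port, and route are assumptions for a locally running server.

      # Code-completion request in the inputs/parameters shape used by the
      # HF text-generation APIs linked above. Host/port/route are assumptions.
      import requests

      resp = requests.post(
          "http://localhost:8000/generate",
          json={"inputs": "def fibonacci(n):",
                "parameters": {"max_new_tokens": 64, "temperature": 0.2}},
          timeout=60,
      )
      print(resp.json())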

  • Exa

    Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and minimal learning curve. (by kyegomez)

  • Project mention: Tree of Thoughts | news.ycombinator.com | 2023-05-26

    The author appears motivated by some... interesting... beliefs. Hard to tell if this entire thing is a joke or not.

    https://github.com/kyegomez/EXA#for-humanity

    https://blog.apac.ai/liberation-awaits

  • sliced_llama

    Simple LLM inference server

  • Project mention: Mixture-of-Depths: Dynamically allocating compute in transformers | news.ycombinator.com | 2024-04-08

    There are already some implementations out there which attempt to accomplish this!

    Here's an example: https://github.com/silphendio/sliced_llama

    A gist pertaining to said example: https://gist.github.com/silphendio/535cd9c1821aa1290aa10d587...

    Here's a discussion about integrating this capability with ExLlama: https://github.com/turboderp/exllamav2/pull/275

    And same as above but for llama.cpp: https://github.com/ggerganov/llama.cpp/issues/4718#issuecomm...
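
    A rough sketch of what slicing a model can look like, assuming a Llama-style Hugging Face checkpoint (layer attribute paths differ by architecture): keep only a subset of decoder blocks at runtime.

      # Layer-slicing sketch (assumes a Llama-style HF model layout):
      # retain only the first 24 decoder blocks at runtime.
      import torch
      from transformers import AutoModelForCausalLM

      model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
      keep = range(24)                              # blocks to retain
      model.model.layers = torch.nn.ModuleList(
          [model.model.layers[i] for i in keep])
      model.config.num_hidden_layers = len(keep)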

  • aibench-llm-endpoints

    Runner in charge of collecting metrics from LLM inference endpoints for the Unify Hub

  • Project mention: Show HN: Unify – Dynamic LLM Benchmarks and SSO for Multi-Vendor Deployment | news.ycombinator.com | 2024-02-06

    Hey HN! I’m the founder of Unify, and we’ve just released our Model Hub, which provides a collection of LLM endpoints with live runtime benchmarks all plotted across time: https://unify.ai/hub

    A key finding is that static tabular runtime benchmarks for LLMs simply do not work. It’s necessary to take a time-series perspective, and plot the variations through time.

    We currently have 21 models provided by: Anyscale, Perplexity AI, Replicate, Together AI, OctoAI, Mistral AI and OpenAI, with more on the roadmap.

    We test across different regions (Asia, US, Europe), with varied concurrency and sequence length. By plotting across time, our dashboard highlights the stability and variability of the different endpoints, and their ongoing evolution across API updates and system changes. Our benchmarking code is fully open source: https://github.com/unifyai/aibench-llm-endpoints

    Our unified API also makes it very easy to test and deploy these different endpoints in production, without needing to create several accounts.

    Our Hub is a work in progress, and we will be releasing new features every week.

    What are your thoughts? Both positive and negative comments are very welcome. We’ll try to quickly incorporate all feedback!

    I recorded a quick(ish) demo video a few hours ago, explaining how to get started, for those who are interested in learning more: https://youtu.be/0a6-C2_Bmh0

    There is also a longer version here: https://youtu.be/o8yD_QBhmsw

    Finally, as a thanks to HN readers, the promo code “HACKERNEWS” can be used to claim $5 per week in free credits, compatible with our ever expanding list of LLM providers. You can sign up here [https://console.unify.ai/], and claim the free credits here [https://unify.ai/docs/hub/home/pricing.html#top-up-code] if interested.

    Thanks all!

  • tabby-backend-py

    Tabby (self-hosted AI coding assistant) server in 20 lines of Python

  • Project mention: Show HN: Tabby back end in 20 Python lines (self-hosted AI coding assistant) | news.ycombinator.com | 2024-01-29
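
    For flavor, a minimal sketch of such a server with FastAPI; the /v1/completions route and the payload fields loosely follow Tabby's completion API and should be treated as assumptions to verify against Tabby's docs.

      # Minimal Tabby-style completion server sketch; field names assumed.
      from typing import Optional
      from fastapi import FastAPI
      from pydantic import BaseModel

      app = FastAPI()

      class Segments(BaseModel):
          prefix: str
          suffix: Optional[str] = None

      class CompletionRequest(BaseModel):
          language: Optional[str] = None
          segments: Segments

      @app.post("/v1/completions")
      def complete(req: CompletionRequest):
          # Plug a local model in here; a stub keeps the sketch runnable.
          text = "pass  # TODO: generate from req.segments.prefix"
          return {"id": "cmpl-0", "choices": [{"index": 0, "text": text}]}
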
  • everything-rag

    Introducing everything-rag, your fully customizable and local chatbot assistant! 🤖

  • Project mention: GeneticsBot - Learn Genetics with open source knowledge | dev.to | 2024-04-26

    I've been exploring several solutions, such as the one I explain in my post about an AI-powered, Python-based Telegram bot and the RAG assistant I created, everything-rag. Nevertheless, I needed something more direct and efficient, and Coze was a perfect match for me!

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source llm-inference projects in Python? This list will help you:

Rank Project Stars
1 dbrx 2,407
2 lmdeploy 2,391
3 GenerativeAIExamples 1,557
4 llmflows 618
5 local-llm-function-calling 273
6 nos 116
7 syncode 93
8 llm4regression 93
9 vllm-rocm 76
10 cappr 63
11 edsl 56
12 llm-vscode-inference-server 44
13 Exa 20
14 sliced_llama 15
15 aibench-llm-endpoints 12
16 tabby-backend-py 11
17 everything-rag 3
