Top 23 Python LLM Projects
-
api-for-open-llm
OpenAI-style API for open large language models — use open LLMs just like ChatGPT. Supports LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc. (A unified backend interface for open-source large language models.)
-
DemoGPT
Create 🦜️🔗 LangChain apps using just prompts. 🌟 Star to support our work!
-
swirl-search
Swirl is an open-source search platform that uses AI to search multiple content and data sources simultaneously, returns AI-ranked results, and uses LLMs to summarize the answers found in those results. It's a one-click, easy-to-use Retrieval-Augmented Generation (RAG) solution.
-
safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
-
dstack
dstack is an open-source orchestration engine for running AI workloads at scale in any cloud or data center. https://discord.gg/u8SmfwPpMd
-
agenta
The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.
-
distilabel
⚗️ distilabel is a framework for synthetic data generation and AI feedback, for AI engineers who require high-quality outputs, full data ownership, and overall efficiency.
-
agent-protocol
Common interface for interacting with AI agents. The protocol is tech stack agnostic - you can use it with any framework for building agents.
-
vectordb
A minimal Python package for storing and retrieving text using chunking, embeddings, and vector search. (by kagisearch)
Depends what model you want to train, and how well you want your computer to keep working while you're doing it.
If you're interested in large language models there's a table of vram requirements for fine-tuning at [1] which says you could do the most basic type of fine-tuning on a 7B parameter model with 8GB VRAM.
You'll find that training takes quite a long time, and as a lot of the GPU power is going on training, your computer's responsiveness will suffer - even basic things like scrolling in your web browser or changing tabs uses the GPU, after all.
Spend a bit more and you'll probably have a better time.
[1] https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#...
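The advice above can be sketched as a back-of-the-envelope calculation. The bytes-per-parameter multipliers below are common rules of thumb (my own assumptions, not figures taken from the LLaMA-Factory table linked above):

```python
# Rough VRAM estimates for fine-tuning an N-billion-parameter model.
# Multipliers are rule-of-thumb assumptions: full fine-tuning carries
# weights + gradients + optimizer state; LoRA freezes fp16 weights;
# QLoRA quantizes the frozen weights to 4 bits.
def vram_gb(params_billion: float, method: str) -> float:
    bytes_per_param = {
        "full_fp16": 16.0,   # weights + grads + Adam states (mixed precision)
        "lora_fp16": 2.5,    # frozen fp16 weights + small adapter overhead
        "qlora_4bit": 1.0,   # 4-bit weights + adapter + activation headroom
    }[method]
    return params_billion * bytes_per_param  # 1e9 params * bytes ~= GB

for method in ("full_fp16", "lora_fp16", "qlora_4bit"):
    print(f"7B {method}: ~{vram_gb(7, method):.0f} GB")
```

By this estimate only the QLoRA-style setup for a 7B model fits in 8 GB of VRAM, which matches the comment's "most basic type of fine-tuning" caveat.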
Package installer for Python (pip): we use this to install Python-based packages such as Jupyter Lab, and we're also going to use it to install other Python-based tools like the Chroma DB vector database.
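A typical install step might look like the following (package names assumed to be `jupyterlab` and `chromadb` on PyPI):

```shell
# Upgrade pip itself, then install JupyterLab and the Chroma client
python -m pip install --upgrade pip
python -m pip install jupyterlab chromadb
```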
Here’s another one - it’s older but has some interesting charts and graphs.
https://arxiv.org/abs/2303.18223
Project mention: Should I tell my employer about a product that makes my job irrelevant? | /r/cscareerquestions | 2023-07-11
Is this of any help?
Project mention: GitHub - swirlai/swirl-search: Swirl is an open-source search platform that uses AI to search multiple content and data sources simultaneously, finds the best results using a reader LLM, then prompts Generative AI, enabling you to get answers based on your data. | /r/programming | 2023-12-05
Project mention: OpenAI: Streaming is now available in the Assistants API | news.ycombinator.com | 2024-03-14
This was indeed true in the beginning, and I don't know if this has changed. Inserting messages with the assistant role is crucial for many reasons, such as when you want to implement caching, or otherwise edit/compress a previous assistant response for cost or other reasons.
At the time I implemented a work-around in Langroid[1]: since you can only insert a “user” role message, prepend the content with ASSISTANT: whenever you want it to be treated as an assistant role. This actually works as expected and I was able to do caching. I explained it in this forum:
https://community.openai.com/t/add-custom-roles-to-messages-...
[1] the Langroid code that adds a message with a given role, using this above “assistant spoofing trick”:
https://github.com/langroid/langroid/blob/main/langroid/agen...
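A minimal sketch of the spoofing trick described above. The helper name is mine, not Langroid's; the Assistants API call is shown only as a comment since it requires an API key and a live thread:

```python
# Sketch of the "assistant spoofing" workaround: the Assistants API
# (at the time) only allowed inserting "user" role messages, so content
# meant to be an assistant turn gets prefixed with "ASSISTANT:" before
# insertion. Helper name is hypothetical.
ASSISTANT_PREFIX = "ASSISTANT: "

def spoof_role(content: str, role: str) -> str:
    """Return the content to insert as a 'user' role message."""
    return ASSISTANT_PREFIX + content if role == "assistant" else content

# With the OpenAI SDK this would be passed along the lines of
# (not executed here):
#   client.beta.threads.messages.create(
#       thread_id=thread.id, role="user",
#       content=spoof_role(cached_reply, "assistant"))

print(spoof_role("The answer is 42.", "assistant"))
```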
So we thought it would be a good idea to create a framework that makes use of LoopGPT agent's memory and custom tooling capabilities. Let's jump right into the new features of this framework.
Project mention: [R] Meet Beaver-7B: a Constrained Value-Aligned LLM via Safe RLHF Technique | /r/MachineLearning | 2023-05-16
We have recently added support for querying data from SingleStore to our agent framework, LLMStack (https://github.com/trypromptly/LLMStack). Out-of-the-box performance when prompting with just the table schemas is pretty good with GPT-4.
In general, the more domain-specific knowledge the queries need, the harder it has gotten. We've had good success `teaching` the model different concepts related to the dataset; giving it example questions and queries greatly improved performance.
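The approach described (schemas plus few-shot question/query pairs in the prompt) can be sketched like this. The function and prompt format are illustrative assumptions of mine, not LLMStack's actual implementation:

```python
# Illustrative prompt assembly for text-to-SQL: include the table
# schemas plus a few example question/query pairs ("teaching" the
# model dataset-specific concepts), ending with the user's question.
def build_sql_prompt(schemas, examples, question):
    parts = ["You write SingleStore SQL. Tables:"]
    parts += schemas
    for q, sql in examples:
        parts.append(f"Q: {q}\nSQL: {sql}")
    parts.append(f"Q: {question}\nSQL:")
    return "\n\n".join(parts)

prompt = build_sql_prompt(
    schemas=["CREATE TABLE orders (id INT, amount DECIMAL, placed_at DATETIME)"],
    examples=[("Total revenue last month?",
               "SELECT SUM(amount) FROM orders "
               "WHERE placed_at >= DATE_SUB(NOW(), INTERVAL 1 MONTH);")],
    question="How many orders were placed today?",
)
print(prompt)
```

The assembled string would then be sent to the model (e.g. GPT-4) as the user prompt.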
Project mention: Ask HN: How does deploying a fine-tuned model work | news.ycombinator.com | 2024-04-23
You can use https://github.com/dstackai/dstack to deploy your model to the most affordable GPU clouds. It supports auto-scaling and other features.
Disclaimer: I’m the creator of dstack.
Project mention: Ask HN: How are you testing your LLM applications? | news.ycombinator.com | 2024-02-06
I am biased, but I would use a platform rather than rolling your own solution. You will tend to underestimate the depth of capabilities needed for an eval framework.
Now for solutions, shameless plug here, we are building an open-source platform for experimenting and evaluating complex LLM apps (https://github.com/agenta-ai/agenta). We offer automatic evaluators as well as human annotation capabilities. Currently, we only provide testing before deployment, but we have plans to include post-production evaluations as well.
Other tools I would look at in the space are promptfoo (also open-source, more dev-oriented), humanloop (one of the most feature-complete tools in the space, but enterprise-oriented and costly), and vellum (a YC company, more focused on product teams).
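To illustrate the "depth of capabilities" point, here is a toy version of the evaluator logic these platforms generalize. All names are mine; exact-match is one of the simplest scorers, and real frameworks layer on LLM-as-judge, regex, semantic similarity, and human annotation:

```python
# A toy automatic evaluator: run each test case through an app
# function and average the scores over the test set.
def exact_match(expected: str, output: str) -> float:
    return 1.0 if expected.strip() == output.strip() else 0.0

def evaluate(app, test_cases, scorer=exact_match):
    scores = [scorer(expected, app(prompt)) for prompt, expected in test_cases]
    return sum(scores) / len(scores)

# Stand-in "LLM app" for demonstration purposes only:
fake_app = lambda prompt: "Paris" if "France" in prompt else "unknown"
cases = [("Capital of France?", "Paris"), ("Capital of Mars?", "Olympus")]
print(evaluate(fake_app, cases))  # one of two cases matches
```

Even this sketch raises the hard questions (fuzzy matching, per-case weighting, regression tracking across deployments) that make a dedicated platform worthwhile.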
Project mention: Open-source AI Feedback framework for scalable LLM Alignment | news.ycombinator.com | 2023-11-23
Project mention: Show HN: Common protocol for communication with (and between) AI Agents | news.ycombinator.com | 2023-08-09
Project mention: Show HN: LLMFlows – LangChain alternative for explicit and transparent apps | news.ycombinator.com | 2023-07-29
We needed a low-latency, on-premise solution that we can run on edge nodes (so it had to be lightweight), with sane defaults that anyone on the team can spin up in a second.
The result is this, and we constantly benchmark the performance of different embeddings to ensure the best defaults.
[1] https://github.com/kagisearch/vectordb#embeddings-performanc...
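The store-and-retrieve pattern the package description names (chunking, embeddings, vector search) can be sketched with the standard library alone. This is a conceptual illustration, not vectordb's actual API; a bag-of-words vector stands in for a real embedding model:

```python
# Conceptual sketch of embed-then-search: turn texts into vectors,
# then rank stored chunks by cosine similarity to the query vector.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(chunks, query, top_n=1):
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_n]

chunks = ["the cat sat on the mat", "stocks fell sharply today"]
print(search(chunks, "where did the cat sit?"))
```

A real embedding model maps semantically similar text to nearby vectors even without shared words, which is what the benchmarked defaults in [1] are comparing.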
Python LLM related posts
- Large language models (e.g., ChatGPT) as research assistants
- LLM Is a Capable Regressor When Given In-Context Examples
- Show HN: Burr: An OS Framework for Building and Debugging GenAI Apps Faster
- Long-form factuality in large language models
- text-generation-webui VS LibreChat: a user-suggested alternative (2 projects | 29 Feb 2024)
- Validating the RAG Performance of Amazon Titan vs. Cohere Using Amazon Bedrock
- Ask HN: What have you built with LLMs?
Index
What are some of the best open-source LLM projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | LLaMA-Factory | 17,050 |
2 | chroma | 12,189 |
3 | LLMSurvey | 8,716 |
4 | AutoGPTQ | 3,744 |
5 | gpt-code-ui | 3,482 |
6 | api-for-open-llm | 1,952 |
7 | DemoGPT | 1,566 |
8 | swirl-search | 1,509 |
9 | langroid | 1,509 |
10 | loopgpt | 1,390 |
11 | coffee | 1,341 |
12 | safe-rlhf | 1,149 |
13 | LLMStack | 1,089 |
14 | dstack | 1,087 |
15 | LLMCompiler | 1,056 |
16 | llm_agents | 877 |
17 | agenta | 823 |
18 | distilabel | 825 |
19 | agent-protocol | 754 |
20 | DataDreamer | 632 |
21 | llmflows | 615 |
22 | oterm | 554 |
23 | vectordb | 543 |