Top 23 Python LLM Projects

MetaGPT

32 38,728 10.0 Python

🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

Project mention: Can AI replace a co-founder? | news.ycombinator.com | 2024-01-07

https://github.com/geekan/MetaGPT :
> MetaGPT takes a one line requirement as input and outputs user stories / competitive analysis / requirements / data structures / APIs / documents, etc.
https://news.ycombinator.com/item?id=29141796 ; "Co-Founder Equity Calculator"
"Ask HN: What are your go to SaaS products for startups/MVPs?" (2020) https://news.ycombinator.com/item?id=23535828 ; FounderKit, StackShare
> USA Small Business Administration: "10 steps to start your business." https://www.sba.gov/starting-business/how-start-business/10-...
>> "Startup Incorporation Checklist: How to bootstrap a Delaware C-corp (or S-corp) with employee(s) in California" https://github.com/leonar15/startup-checklist

llama_index

75 30,639 10.0 Python

LlamaIndex is a data framework for your LLM applications

Project mention: LlamaIndex: A data framework for your LLM applications | news.ycombinator.com | 2024-04-07

MindsDB

78 21,160 10.0 Python

The platform for customizing AI from enterprise data

Project mention: What’s the Difference Between Fine-tuning, Retraining, and RAG? | dev.to | 2024-04-08

Check us out on GitHub.

unilm

40 18,262 9.0 Python

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Project mention: The Era of 1-Bit LLMs: Training_Tips, Code And_FAQ [pdf] | news.ycombinator.com | 2024-03-21

vllm

30 17,656 9.9 Python

A high-throughput and memory-efficient inference and serving engine for LLMs

Project mention: Mistral AI Launches New 8x22B Moe Model | news.ycombinator.com | 2024-04-09

The easiest is to use vllm (https://github.com/vllm-project/vllm) to run it on a couple of A100s, and you can benchmark it using this library (https://github.com/EleutherAI/lm-evaluation-harness).

Chinese-LLaMA-Alpaca

4 17,140 8.8 Python

中文LLaMA&Alpaca大语言模型+本地CPU/GPU训练部署 (Chinese LLaMA & Alpaca LLMs)

Project mention: Chinese-Alpaca-Plus-13B-GPTQ | /r/LocalLLaMA | 2023-05-30

I'd like to share with you today the Chinese-Alpaca-Plus-13B-GPTQ model, which is a GPTQ-format, 4-bit quantised version of Yiming Cui's Chinese-LLaMA-Alpaca 13B for GPU inference.

mlc-llm

89 16,622 9.9 Python

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

Project mention: FLaNK 04 March 2024 | dev.to | 2024-03-04

LLaMA-Factory

2 16,319 9.9 Python

Unify Efficient Fine-Tuning of 100+ LLMs

Project mention: Show HN: GPU Prices on eBay | news.ycombinator.com | 2024-02-23

Depends what model you want to train, and how well you want your computer to keep working while you're doing it.
If you're interested in large language models there's a table of vram requirements for fine-tuning at [1] which says you could do the most basic type of fine-tuning on a 7B parameter model with 8GB VRAM.
You'll find that training takes quite a long time, and as a lot of the GPU power is going on training, your computer's responsiveness will suffer - even basic things like scrolling in your web browser or changing tabs uses the GPU, after all.
Spend a bit more and you'll probably have a better time.
[1] https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#...
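
As a rough illustration of why a 7B model can fit on an 8GB card, the sketch below estimates the memory needed for the weights alone at several precisions. The bytes-per-parameter figures are standard, but the function name and the weights-only simplification are assumptions made here; this is not the method behind the linked table, and real fine-tuning needs additional memory for activations, gradients and optimizer state.

```python
# Rough, weights-only memory estimate for a model at different precisions.
# Assumption: activations, gradients, optimizer state and adapter weights
# are ignored, so treat every number as a lower bound.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Return the memory needed to hold the weights alone, in GB (10^9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    n_params = 7e9  # a "7B" model
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(n_params, precision)
        print(f"{precision:>10}: {gb:5.1f} GB")
```

Under these assumptions, only the quantised variants leave headroom on an 8GB card, which is consistent with the comment's point that the most basic type of fine-tuning is what fits.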

ChatGLM2-6B

4 15,442 7.0 Python

ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型

Project mention: Are We Overlooking China's Progress in AI? | /r/singularity | 2023-06-26

peft

26 13,670 9.7 Python

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Project mention: LoftQ: LoRA-fine-tuning-aware Quantization | news.ycombinator.com | 2023-12-19

ludwig

3 10,778 9.5 Python

Low-code framework for building custom LLMs, neural networks, and other AI models

Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | news.ycombinator.com | 2024-04-07

This is a great project, a little bit similar to https://github.com/ludwig-ai/ludwig, but it includes testing capabilities and ablation.
Questions regarding the LLM testing aspect: How extensive is the test coverage for LLM use cases, and what is the current state of this project area? Do you offer any guarantees, or is it considered an open-ended problem?
Would love to see more progress in this area!

Qwen

5 10,691 9.5 Python

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Project mention: What the heck is so great about this model? | /r/SillyTavernAI | 2023-12-07

Qwen: https://github.com/QwenLM/Qwen

h2ogpt

28 10,327 10.0 Python

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://codellama.h2o.ai/

Project mention: Ask HN: How do I train a custom LLM/ChatGPT on my own documents in Dec 2023? | news.ycombinator.com | 2023-12-24

As others have said you want RAG.
The most feature complete implementation I've seen is h2ogpt[0] (not affiliated).
The code is kind of a mess (most of the logic is in an ~8000 line python file) but it supports ingestion of everything from YouTube videos to docx, pdf, etc - either offline or from the web interface. It uses langchain and a ton of additional open source libraries under the hood. It can run directly on Linux, via docker, or with one-click installers for Mac and Windows.
It has various model hosting implementations built in - transformers, exllama, llama.cpp as well as support for model serving frameworks like vLLM, HF TGI, etc or just OpenAI.
You can also define your preferred embedding model along with various other parameters but I've found the out of box defaults to be pretty sane and usable.
[0] - https://github.com/h2oai/h2ogpt
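
For readers unfamiliar with RAG (retrieval-augmented generation), the idea is to retrieve the passages most relevant to a question and place them in the prompt sent to the model. The sketch below is a toy, standard-library illustration of that flow, not h2ogpt's implementation: it scores documents with bag-of-words cosine similarity instead of a learned embedding model, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, documents: list[str], k: int = 2) -> str:
    """Assemble a prompt that grounds the model in the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = [
        "The invoice is due on 1 March.",
        "Our office cat is called Biscuit.",
        "Invoices are paid by bank transfer.",
    ]
    print(build_prompt("When is the invoice due?", docs, k=1))
```

A real setup would swap the bag-of-words scoring for an embedding model and a vector store, and would pass the assembled prompt to an LLM; those parts are left out here to keep the example self-contained.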

gorilla

50 9,945 8.4 Python

Gorilla: An API store for LLMs

Project mention: Autonomous LLM agents with human-out-of-loop | news.ycombinator.com | 2024-04-11

ml-engineering

9 9,680 9.8 Python

Machine Learning Engineering Open Book

Project mention: Accelerators | news.ycombinator.com | 2024-02-22

OpenLLM

25 8,671 9.9 Python

Run any open-source LLMs, such as Llama 2, Mistral, as OpenAI compatible API endpoint, locally and in the cloud.

Project mention: First 15 Open Source Advent projects | dev.to | 2023-12-15

13. OpenLLM by BentoML | Github | tutorial

LLMSurvey

3 8,515 7.9 Python

The official GitHub page for the survey paper "A Survey of Large Language Models".

Project mention: Ask HN: Textbook Regarding LLMs | news.ycombinator.com | 2024-03-23

Here’s another one - it’s older but has some interesting charts and graphs.
https://arxiv.org/abs/2303.18223

embedchain

6 8,392 9.8 Python

Personalizing LLM Responses

Project mention: Ask HN: How do I train a custom LLM/ChatGPT on my own documents in Dec 2023? | news.ycombinator.com | 2023-12-24

You can use embedchain[1] to connect various data sources and then get a RAG application running locally and in production very easily. Embedchain is an open-source RAG framework, and it follows a conventional but configurable approach.
The conventional approach is suitable for software engineers who may be less familiar with AI. The configurable approach is suitable for ML engineers who have sophisticated use cases and want to configure chunking, indexing and retrieval strategies.
[1]: https://github.com/embedchain/embedchain
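
To make "chunking strategy" concrete, here is a minimal sketch of one common choice: fixed-size word chunks with overlap, so that text cut at a chunk boundary still appears intact in a neighbouring chunk. This is an illustration only, not embedchain's API; the function name and default sizes are assumptions.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words. Hypothetical helper for
    illustration; real frameworks often chunk by tokens or sentences.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for chunk in chunk_words(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Larger chunks keep more context per retrieved passage, while smaller chunks make retrieval more precise; the overlap is a trade-off between redundancy and the risk of splitting a relevant passage.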

nebuly

105 8,368 8.4 Python

The user analytics platform for LLMs

Project mention: Nebuly – The LLM Analytics Platform | news.ycombinator.com | 2023-10-07

shell_gpt

38 8,208 7.9 Python

A command-line productivity tool powered by AI large language models like GPT-4 that will help you accomplish your tasks faster and more efficiently.

Project mention: Oh My Zsh | news.ycombinator.com | 2024-01-22

https://github.com/TheR1D/shell_gpt?tab=readme-ov-file#shell...

promptflow

5 7,951 9.9 Python

Build high-quality LLM apps - from prototyping, testing to production deployment and monitoring.

Project mention: A suite of tools designed to streamline the development cycle of LLM-based apps | news.ycombinator.com | 2024-04-12

deeplake

13 7,673 9.8 Python

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

Project mention: FLaNK AI Weekly 25 March 2025 | dev.to | 2024-03-25

txtai

354 6,910 9.3 Python

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

Project mention: Build knowledge graphs with LLM-driven entity extraction | dev.to | 2024-02-21

txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-12.

Python LLM related posts

Meta Llama 3
10 projects | news.ycombinator.com | 18 Apr 2024
Yes, Python and Matplotlib can make pretty charts
3 projects | news.ycombinator.com | 16 Apr 2024
LLM Is a Capable Regressor When Given In-Context Examples
3 projects | news.ycombinator.com | 13 Apr 2024
A suite of tools designed to streamline the development cycle of LLM-based apps
1 project | news.ycombinator.com | 12 Apr 2024
Autonomous LLM agents with human-out-of-loop
1 project | news.ycombinator.com | 11 Apr 2024
PullRequestBenchmark Challenge: Can AI Replace Your Dev Team?
1 project | news.ycombinator.com | 10 Apr 2024
Mistral AI Launches New 8x22B Moe Model
4 projects | news.ycombinator.com | 9 Apr 2024

Index

What are some of the best open-source LLM projects in Python? This list will help you:

#	Project	Stars
1	MetaGPT	38,728
2	llama_index	30,639
3	MindsDB	21,160
4	unilm	18,262
5	vllm	17,656
6	Chinese-LLaMA-Alpaca	17,140
7	mlc-llm	16,622
8	LLaMA-Factory	16,319
9	ChatGLM2-6B	15,442
10	peft	13,670
11	ludwig	10,778
12	Qwen	10,691
13	h2ogpt	10,327
14	gorilla	9,945
15	ml-engineering	9,680
16	OpenLLM	8,671
17	LLMSurvey	8,515
18	embedchain	8,392
19	nebuly	8,368
20	shell_gpt	8,208
21	promptflow	7,951
22	deeplake	7,673
23	txtai	6,910