Top 23 Python fine-tuning Projects
- xTuring: Build, customize, and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHXuSJEk6
- custom-diffusion: Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)
- LLM-Adapters: Code for our EMNLP 2023 paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"
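One family the paper covers is the classic series (bottleneck) adapter: a small trainable module inserted into each frozen transformer layer. A minimal pure-Python sketch of the idea, with illustrative toy dimensions (this is not the repo's API):

```python
# Minimal sketch of a series (bottleneck) adapter: down-project to a small
# bottleneck, apply a nonlinearity, project back up, and add a residual
# connection. Only the two small projection matrices are trained; the
# pretrained weights stay frozen.

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def adapter_forward(x, w_down, w_up):
    """h = x + W_up @ relu(W_down @ x)."""
    h = [max(0.0, z) for z in matvec(w_down, x)]  # down-projection + ReLU
    return [xi + ui for xi, ui in zip(x, matvec(w_up, h))]  # residual add

# With W_up initialised to zero the adapter starts as the identity,
# so inserting it does not disturb the pretrained model:
x = [1.0, -2.0, 0.5]
w_down = [[0.1, 0.0, 0.2]]        # hidden dim 3 -> bottleneck dim 1
w_up = [[0.0], [0.0], [0.0]]      # bottleneck dim 1 -> hidden dim 3
print(adapter_forward(x, w_down, w_up))  # [1.0, -2.0, 0.5]
```

The zero-initialised up-projection is what makes adapter insertion safe: training can only move the model away from its pretrained behaviour gradually.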
- Lora-for-Diffusers: The easiest-to-understand tutorial on using LoRA (Low-Rank Adaptation) within the diffusers framework, for AI generation researchers 🔥
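The core trick behind LoRA is worth a back-of-envelope sketch: instead of updating a frozen weight matrix W, it learns a low-rank update B @ A and applies W' = W + (alpha / r) * B @ A. The arithmetic below is illustrative (the 4096x4096 projection size is an assumption, not from the tutorial):

```python
# Illustrative parameter arithmetic for the LoRA idea (not the diffusers
# API): a rank-r update factorises into A (r x d_in) and B (d_out x r),
# so the trainable parameter count drops from d_in*d_out to r*(d_in+d_out).

def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Trainable parameters: full fine-tuning vs. a rank-r LoRA update."""
    full = d_in * d_out               # every entry of W is trainable
    lora = rank * (d_in + d_out)      # A is (r, d_in), B is (d_out, r)
    return full, lora

# A hypothetical 4096x4096 attention projection:
full, lora = lora_param_counts(4096, 4096, rank=8)
print(full, lora, full // lora)  # rank 8 trains ~256x fewer parameters
```

This is why LoRA checkpoints are megabytes rather than gigabytes: only A and B are saved and trained.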
- simpleT5: Built on top of PyTorch Lightning ⚡️ and Transformers 🤗, simpleT5 lets you quickly train your T5 models.
- discus: A data-centric AI package for ML/AI. Get high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ
Project mention: LlamaIndex: A data framework for your LLM applications | news.ycombinator.com | 2024-04-07
Depends on what model you want to train, and how well you want your computer to keep working while you're doing it.
If you're interested in large language models there's a table of vram requirements for fine-tuning at [1] which says you could do the most basic type of fine-tuning on a 7B parameter model with 8GB VRAM.
You'll find that training takes quite a long time, and since most of the GPU's capacity goes to training, your computer's responsiveness will suffer; even basic things like scrolling in your web browser or switching tabs use the GPU, after all.
Spend a bit more and you'll probably have a better time.
[1] https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#...
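The VRAM figures in tables like [1] follow from simple arithmetic: the weights alone cost params * bytes-per-param, full fine-tuning roughly adds same-size gradients plus optimizer state on top, and LoRA/QLoRA methods only pay that extra cost for the tiny adapter. A hedged back-of-envelope sketch (activations are ignored, so treat these as lower bounds):

```python
# Back-of-envelope estimate of GPU memory needed just to hold model
# weights, which is why precision and method dominate fine-tuning VRAM
# requirements. Real usage also includes gradients, optimizer state,
# and activations, so these numbers are lower bounds.

GIB = 1024 ** 3

def weight_gib(params: float, bytes_per_param: float) -> float:
    return params * bytes_per_param / GIB

seven_b = 7e9
fp16 = weight_gib(seven_b, 2)        # fp16/bf16: 2 bytes per parameter
four_bit = weight_gib(seven_b, 0.5)  # 4-bit quantized: 0.5 bytes each
print(round(fp16, 1), round(four_bit, 1))  # ~13.0 GiB vs ~3.3 GiB
```

That gap between 13 GiB and 3.3 GiB for the same 7B model is what makes QLoRA-style fine-tuning feasible on an 8 GB consumer card where fp16 training is not.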
Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | news.ycombinator.com | 2024-04-07
This is a great project, a little similar to https://github.com/ludwig-ai/ludwig, but it includes testing capabilities and ablation.
A question regarding the LLM testing aspect: how extensive is the test coverage for LLM use cases, and what is the current state of this project area? Do you offer any guarantees, or is it considered an open-ended problem?
Would love to see more progress toward this area!
Project mention: I'm developing an open-source AI tool called xTuring, enabling anyone to construct a Language Model with just 5 lines of code. I'd love to hear your thoughts! | /r/machinelearningnews | 2023-09-07
Explore the project on GitHub here.
Project mention: YiVal: Unlocking Your Data's Power to Create Customized GenAI Apps | /r/u_YiVal | 2023-11-16
- 🤖 GitHub: https://github.com/YiVal/YiVal/pull/189
Project mention: Now You Can Full Fine Tune / DreamBooth Stable Diffusion XL (SDXL) with only 10.3 GB VRAM via OneTrainer | dev.to | 2024-03-25
Used SG161222/RealVisXL_V4.0 as a base model and OneTrainer to train on Windows 10: https://github.com/Nerogar/OneTrainer
Project mention: Google DeepMind CEO Says Some Form of AGI Possible in a Few Years | /r/singularity | 2023-05-03
That is not true; you can, for example, use an additional adapter to optimize, which takes about $50 and an hour: https://github.com/AGI-Edgerunners/LLM-Adapters
Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | news.ycombinator.com | 2024-04-07
slowllama: Finetune llama2-70b and codellama on MacBook Air without quantization [Link].
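Fitting a 70B model on a laptop without quantization relies on offloading: keeping weights out of RAM and streaming them in one layer at a time. The toy below is a hedged sketch of that general pattern, not slowllama's actual implementation (the scale-and-shift "layers" are purely illustrative):

```python
# Hedged sketch of the offloading idea: persist each layer's weights to
# disk, then run the forward pass by loading, applying, and freeing one
# layer at a time, trading speed for a tiny resident memory footprint.

import os
import pickle
import tempfile

def save_layers(layers, dirpath):
    """Serialize each layer's parameters to its own file on disk."""
    paths = []
    for i, layer in enumerate(layers):
        path = os.path.join(dirpath, f"layer_{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump(layer, f)
        paths.append(path)
    return paths

def forward_offloaded(x, paths):
    """Run a toy model of scale-and-shift layers, one resident at a time."""
    for path in paths:
        with open(path, "rb") as f:
            scale, shift = pickle.load(f)  # only this layer is in memory
        x = x * scale + shift
    return x

with tempfile.TemporaryDirectory() as d:
    paths = save_layers([(2.0, 1.0), (0.5, 0.0)], d)
    print(forward_offloaded(3.0, paths))  # (3*2 + 1) * 0.5 = 3.5
```

The price is obvious: every forward pass re-reads the whole model from storage, which is why this approach is slow but memory-frugal.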
Project mention: 25 million Creative Commons image dataset released! | /r/StableDiffusion | 2023-10-01
GitHub: https://github.com/ml6team/fondant
Project mention: an open source package helping developers generate data for LLMs | /r/mlops | 2023-08-02
Python fine-tuning related posts
- penzai: JAX research toolkit for building, editing, and visualizing neural nets
- Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing
- LlamaIndex: A data framework for your LLM applications
- LlamaIndex is a data framework for your LLM applications
- Now You Can Full Fine Tune / DreamBooth Stable Diffusion XL (SDXL) with only 10.3 GB VRAM via OneTrainer
- Geniusrise – Wannabe Competitor to Vertex AI, Azure AI Studio and Bedrock
- Show HN: GPU Prices on eBay
Index
What are some of the best open-source fine-tuning projects in Python? This list will help you:
# | Project | Stars
---|---|---
1 | llama_index | 30,910
2 | LLaMA-Factory | 17,050
3 | ludwig | 10,778
4 | xTuring | 2,515
5 | YiVal | 2,425
6 | custom-diffusion | 1,776
7 | finetuner | 1,423
8 | OneTrainer | 1,076
9 | TencentPretrain | 975
10 | LLM-Adapters | 936
11 | SPIN | 773
12 | Lora-for-Diffusers | 696
13 | LLM-Finetuning-Toolkit | 659
14 | DataDreamer | 632
15 | slowllama | 413
16 | simpleT5 | 383
17 | fondant | 316
18 | OneDiffusion | 315
19 | kiri | 240
20 | Dreambooth | 94
21 | penzai | 72
22 | praetor-data | 63
23 | discus | 62