llm_finetuning
Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes)
exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
I also created a short summary at https://github.com/taprosoft/llm_finetuning/blob/main/benchmark/README.md comparing the performance of popular quantization techniques. GPTQ seems to hold a clear speed advantage over 4-bit quantization from bitsandbytes.
Following up on u/tloen's popular alpaca-lora work, I wrapped the setup of alpaca_lora_4bit to add support for GPTQ training in the form of installable pip packages. You can perform training and inference with multiple quantization methods to compare the results.
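To make the speed/accuracy trade-off concrete, here is a toy sketch of blockwise absmax 4-bit quantization in plain NumPy. The function names and block size are illustrative assumptions, and real GPTQ/bitsandbytes kernels are considerably more sophisticated; this only shows the basic round-trip.

```python
import numpy as np

def quantize_4bit(weights, block_size=64):
    """Toy blockwise absmax 4-bit quantization (illustrative only --
    not the actual GPTQ or bitsandbytes algorithm)."""
    w = weights.reshape(-1, block_size)
    # One scale per block; signed int4 usable range here is -7..7.
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scales), -7, 7).astype(np.int8)
    return q, scales

def dequantize_4bit(q, scales, shape):
    """Recover approximate float weights from int4 codes and scales."""
    return (q.astype(np.float32) * scales).reshape(shape)

w = np.random.randn(4, 64).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s, w.shape)
err = np.abs(w - w_hat).max()  # worst-case error is about scale / 2 per block
```

The per-block scale is what keeps the error bounded: each value is off by at most half a quantization step of its own block.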
For the inference step, this repo can help you use ExLlama to run inference on an evaluation dataset for the best throughput.
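Since the repo's exact evaluation entry point isn't shown here, a minimal backend-agnostic tokens-per-second harness might look like the sketch below; `generate_fn` and `dummy_generate` are hypothetical stand-ins, not the repo's or ExLlama's API.

```python
import time

def measure_throughput(generate_fn, prompts):
    """Generic tokens/sec harness. `generate_fn` is a stand-in for
    whatever backend (ExLlama, bitsandbytes, ...) you want to compare;
    it should return the number of tokens produced for one prompt."""
    start = time.perf_counter()
    total_tokens = sum(generate_fn(p) for p in prompts)
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed

# Dummy backend standing in for a real quantized model:
def dummy_generate(prompt):
    return len(prompt.split())  # pretend one token per input word

tps = measure_throughput(dummy_generate, ["hello world", "a b c"])
```

Timing the same prompt set against each backend with a harness like this is one simple way to reproduce the kind of comparison in the benchmark README.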