A simple repo for fine-tuning LLMs with both GPTQ and bitsandbytes quantization. Also supports ExLlama for inference for the best speed.

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • llm_finetuning

    Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GTPQ, bitsandbytes)

  • I also create a short summary at https://github.com/taprosoft/llm_finetuning/blob/main/benchmark/README.md to compare the performance difference between popular quantization techniques. GPTQ seems to hold a good advantage in term of speed in compare to 4-bit quantization from bitsandbytes.

  • alpaca-lora

    Instruct-tune LLaMA on consumer hardware

  • Follow up the popular work of u/tloen alpaca-lora, I wrapped the setup of alpaca_lora_4bit to add support for GPTQ training in form of installable pip packages. You can perform training and inference with multiple quantizations method to compare the results.

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • alpaca_lora_4bit

  • Follow up the popular work of u/tloen alpaca-lora, I wrapped the setup of alpaca_lora_4bit to add support for GPTQ training in form of installable pip packages. You can perform training and inference with multiple quantizations method to compare the results.

  • exllama

    A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

  • For inference step, this repo can help you to use ExLlama to perform inference on an evaluation dataset for the best throughput.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts