Python quantization

Open-source Python projects categorized as quantization

Top 23 Python quantization Projects

  • Chinese-LLaMA-Alpaca

    Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)

  • Project mention: Chinese-Alpaca-Plus-13B-GPTQ | /r/LocalLLaMA | 2023-05-30

    I'd like to share with you today the Chinese-Alpaca-Plus-13B-GPTQ model, a 4-bit GPTQ-format quantisation of Yiming Cui's Chinese-LLaMA-Alpaca 13B for GPU inference.

  • LLaMA-Factory

    Unify Efficient Fine-Tuning of 100+ LLMs

  • Project mention: Show HN: GPU Prices on eBay | news.ycombinator.com | 2024-02-23

    Depends what model you want to train, and how well you want your computer to keep working while you're doing it.

    If you're interested in large language models, there's a table of VRAM requirements for fine-tuning at [1] which says you could do the most basic type of fine-tuning on a 7B-parameter model with 8GB of VRAM.

    You'll find that training takes quite a long time, and as a lot of the GPU power is going on training, your computer's responsiveness will suffer - even basic things like scrolling in your web browser or changing tabs use the GPU, after all.

    Spend a bit more and you'll probably have a better time.

    [1] https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#...

  • faster-whisper

    Faster Whisper transcription with CTranslate2

  • Project mention: Using Groq to Build a Real-Time Language Translation App | dev.to | 2024-04-05

    For our real-time STT needs, we'll employ a fantastic library called faster-whisper.
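
    As a quick illustration, here is a minimal faster-whisper sketch (the model size, device, and audio path are placeholder choices): load a CTranslate2 Whisper model with int8 quantization and transcribe a file.

      from faster_whisper import WhisperModel

      # int8 compute keeps memory low on CPU; "cuda" + "float16" also works
      model = WhisperModel("small", device="cpu", compute_type="int8")

      segments, info = model.transcribe("audio.wav", beam_size=5)  # placeholder file
      print("Detected language:", info.language)
      for segment in segments:
          print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")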

  • AutoGPTQ

    An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

  • Project mention: Setting up LLAMA2 70B Chat locally | /r/developersIndia | 2023-08-18
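
    To make the workflow concrete, here is a hedged sketch of 4-bit post-training quantization with AutoGPTQ, following the pattern from its README; the model id, calibration text, and output directory are placeholders.

      from transformers import AutoTokenizer
      from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

      pretrained = "facebook/opt-125m"  # placeholder model id
      tokenizer = AutoTokenizer.from_pretrained(pretrained, use_fast=True)

      # a few representative texts calibrate the GPTQ quantizer
      examples = [tokenizer("AutoGPTQ quantizes LLM weights to 4-bit with little accuracy loss.")]

      quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
      model = AutoGPTQForCausalLM.from_pretrained(pretrained, quantize_config)
      model.quantize(examples)                    # run GPTQ calibration
      model.save_quantized("opt-125m-4bit-gptq")  # placeholder output dir
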
  • Pretrained-Language-Model

    Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

  • Project mention: Does anyone know a downloadable chatgpt model that supports conversation in Albanian? | /r/Programimi | 2023-05-16
  • deepsparse

    Sparsity-aware deep learning inference runtime for CPUs

  • Project mention: Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse | news.ycombinator.com | 2023-11-23

    Interesting company. Yannic Kilcher interviewed Nir Shavit last year and they went into some depth: https://www.youtube.com/watch?v=0PAiQ1jTN5k DeepSparse is on GitHub: https://github.com/neuralmagic/deepsparse
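
    For flavor, a hedged sketch of DeepSparse's Pipeline API; the task name is real, but the model path is a placeholder - substitute a SparseZoo stub or a local sparse-quantized ONNX file.

      from deepsparse import Pipeline

      pipeline = Pipeline.create(
          task="sentiment-analysis",
          model_path="./sparse_quantized_model.onnx",  # placeholder path
      )
      print(pipeline("DeepSparse runs sparse, quantized models fast on CPUs."))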

  • xTuring

    Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJEk6

  • Project mention: I'm developing an open-source AI tool called xTuring, enabling anyone to construct a Language Model with just 5 lines of code. I'd love to hear your thoughts! | /r/machinelearningnews | 2023-09-07

    Explore the project on GitHub here.
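
    The "5 lines of code" claim roughly corresponds to this sketch of the xTuring API, assuming the BaseModel/InstructionDataset interface from the project's README (the dataset path and model key are placeholders):

      from xturing.datasets import InstructionDataset
      from xturing.models import BaseModel

      dataset = InstructionDataset("./alpaca_data")  # placeholder dataset dir
      model = BaseModel.create("llama_lora")         # LLaMA with LoRA adapters
      model.finetune(dataset=dataset)
      print(model.generate(texts=["Why are LLMs so popular?"]))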

  • mixtral-offloading

    Run Mixtral-8x7B models in Colab or consumer desktops

  • Project mention: DBRX: A New Open LLM | news.ycombinator.com | 2024-03-27

    Waiting for mixed quantization with HQQ and MoE offloading [1]. With that I was able to run Mixtral 8x7B on my 10 GB VRAM RTX 3080... This should work for DBRX and should shave off a ton of the VRAM requirement.

    1. https://github.com/dvmazur/mixtral-offloading?tab=readme-ov-...

  • optimum

    🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools

  • Project mention: FastEmbed: Fast and Lightweight Embedding Generation for Text | dev.to | 2024-02-02

    Shout out to Hugging Face's Optimum, which made it easier to quantize models.
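
    As an example of that, here is a sketch of dynamic INT8 quantization through Optimum's ONNX Runtime backend (the model id and output directory are placeholder choices):

      from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
      from optimum.onnxruntime.configuration import AutoQuantizationConfig

      # export a Transformers checkpoint to ONNX (placeholder model id)
      model = ORTModelForSequenceClassification.from_pretrained(
          "distilbert-base-uncased-finetuned-sst-2-english", export=True
      )

      quantizer = ORTQuantizer.from_pretrained(model)
      qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
      quantizer.quantize(save_dir="distilbert-sst2-int8", quantization_config=qconfig)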

  • neural-compressor

    SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
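
    A minimal sketch of post-training INT8 quantization with the 2.x fit API, using a toy PyTorch model and random calibration data as stand-ins for a real model and dataset:

      import torch
      from neural_compressor import PostTrainingQuantConfig
      from neural_compressor.quantization import fit

      fp32_model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
      calib_loader = torch.utils.data.DataLoader(
          [(torch.randn(8), 0) for _ in range(16)], batch_size=4  # toy (input, label) pairs
      )

      q_model = fit(model=fp32_model,
                    conf=PostTrainingQuantConfig(),  # defaults to INT8
                    calib_dataloader=calib_loader)
      q_model.save("./int8_model")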

  • aimet

    AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

  • model-optimization

    A toolkit to optimize ML models for deployment with Keras and TensorFlow, including quantization and pruning.
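
    For instance, quantization-aware training on a toy Keras model (the layer sizes are arbitrary):

      import tensorflow as tf
      import tensorflow_model_optimization as tfmot

      model = tf.keras.Sequential([
          tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
          tf.keras.layers.Dense(1),
      ])

      # wraps layers with fake-quantization ops so training learns int8-friendly weights
      qat_model = tfmot.quantization.keras.quantize_model(model)
      qat_model.compile(optimizer="adam", loss="mse")
      # train as usual, then convert with tf.lite.TFLiteConverter for int8 deployment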

  • mmrazor

    OpenMMLab Model Compression Toolbox and Benchmark.

  • intel-extension-for-pytorch

    A Python package that extends official PyTorch to unlock additional performance on Intel platforms

  • Project mention: Efficient LLM inference solution on Intel GPU | news.ycombinator.com | 2024-01-20

    OK I found it. Looks like they use SYCL (which for some reason they've rebranded to DPC++): https://github.com/intel/intel-extension-for-pytorch/tree/v2...
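
    The core entry point is ipex.optimize; a minimal inference sketch on a toy model (shapes and dtype are illustrative):

      import torch
      import intel_extension_for_pytorch as ipex

      model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
      model = ipex.optimize(model, dtype=torch.bfloat16)  # operator fusion, bf16 weights

      with torch.no_grad():
          out = model(torch.randn(1, 64).to(torch.bfloat16))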

  • nncf

    Neural Network Compression Framework for enhanced OpenVINO™ inference

  • Project mention: FLaNK Stack Weekly 06 Nov 2023 | dev.to | 2023-11-06
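
    A hedged sketch of NNCF's post-training quantization on a toy PyTorch model, assuming the nncf.quantize/nncf.Dataset entry points from recent releases:

      import torch
      import nncf

      model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU()).eval()
      calibration = nncf.Dataset([torch.randn(1, 8) for _ in range(10)])
      quantized = nncf.quantize(model, calibration)  # 8-bit PTQ
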
  • finn

    Dataflow compiler for QNN inference on FPGAs

  • Project mention: Hi, What could be the best HLS tool for implementing neural networks on FPGA | /r/FPGA | 2023-06-13

    FINN - https://github.com/Xilinx/finn

  • SqueezeLLM

    SqueezeLLM: Dense-and-Sparse Quantization

  • Project mention: Llama33B vs Falcon40B vs MPT30B | /r/LocalLLaMA | 2023-07-05

    Using the currently popular GPTQ, 3-bit quantization hurts performance much more than 4-bit, but there's also AWQ (https://github.com/mit-han-lab/llm-awq) and SqueezeLLM (https://github.com/SqueezeAILab/SqueezeLLM), which are able to manage 3-bit without as much performance drop - I hope to see them used more commonly.

  • quanto

    A PyTorch quantization toolkit

  • Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22
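
    A minimal weights-only sketch of the quanto workflow on a toy model (sizes are arbitrary):

      import torch
      from quanto import quantize, freeze, qint8

      model = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU())
      quantize(model, weights=qint8)  # swap in quantization-aware modules
      freeze(model)                   # materialize the int8 weights
      out = model(torch.randn(1, 32))
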
  • fastT5

    ⚡ Boost the inference speed of T5 models by up to 5x and reduce model size by 3x.
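
    A sketch of the fastT5 workflow from its README (the model name and prompt are placeholders): export T5 to quantized ONNX, then generate.

      from transformers import AutoTokenizer
      from fastT5 import export_and_get_onnx_model

      model = export_and_get_onnx_model("t5-small")  # exports and int8-quantizes
      tokenizer = AutoTokenizer.from_pretrained("t5-small")

      tokens = tokenizer("translate English to French: hello", return_tensors="pt")
      out = model.generate(input_ids=tokens["input_ids"],
                           attention_mask=tokens["attention_mask"])
      print(tokenizer.decode(out.squeeze(), skip_special_tokens=True))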

  • qkeras

    QKeras: a quantization deep learning library for TensorFlow Keras
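
    QKeras works by swapping Keras layers for quantized drop-ins; a small functional-API sketch with 4-bit weights and activations (sizes are arbitrary):

      from tensorflow.keras.layers import Input
      from tensorflow.keras.models import Model
      from qkeras import QDense, QActivation, quantized_bits

      inputs = Input(shape=(16,))
      x = QDense(32,
                 kernel_quantizer=quantized_bits(4, 0, 1),  # 4-bit symmetric weights
                 bias_quantizer=quantized_bits(4, 0, 1))(inputs)
      x = QActivation("quantized_relu(4)")(x)               # 4-bit activations
      model = Model(inputs, x)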

  • hqq

    Official implementation of Half-Quadratic Quantization (HQQ)

  • Project mention: Half-Quadratic Quantization of Large Machine Learning Models | news.ycombinator.com | 2024-03-14
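
    A heavily hedged sketch of HQQ's Hugging Face wrapper, assuming the HQQModelForCausalLM/BaseQuantizeConfig interface shown in the project's README (the model id is a placeholder):

      from hqq.engine.hf import HQQModelForCausalLM
      from hqq.core.quantize import BaseQuantizeConfig

      model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model id
      model = HQQModelForCausalLM.from_pretrained(model_id)

      quant_config = BaseQuantizeConfig(nbits=4, group_size=64)
      model.quantize_model(quant_config=quant_config)  # calibration-free 4-bit
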
  • llama.onnx

    LLaMA/RWKV ONNX models, quantization, and test cases

  • Project mention: Qnap TS-264 | /r/LocalLLaMA | 2023-06-29

    You can find LLMs in ONNX format here: https://github.com/tpoisonooo/llama.onnx

  • Sparsebit

    A model compression and acceleration toolbox based on PyTorch.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source quantization projects in Python? This list will help you:

Rank  Project                      Stars
1     Chinese-LLaMA-Alpaca         17,251
2     LLaMA-Factory                17,050
3     faster-whisper               8,723
4     AutoGPTQ                     3,744
5     Pretrained-Language-Model    2,956
6     deepsparse                   2,873
7     xTuring                      2,515
8     mixtral-offloading           2,230
9     optimum                      2,141
10    neural-compressor            1,950
11    aimet                        1,908
12    model-optimization           1,465
13    mmrazor                      1,365
14    intel-extension-for-pytorch  1,342
15    nncf                         777
16    finn                         661
17    SqueezeLLM                   566
18    quanto                       552
19    fastT5                       540
20    qkeras                       522
21    hqq                          409
22    llama.onnx                   323
23    Sparsebit                    319
