Top 23 Python quantization Projects
-
Pretrained-Language-Model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
-
xTuring
Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHXuSJEk6
-
optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
-
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
-
aimet
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
-
model-optimization
A toolkit for optimizing ML models for deployment with Keras and TensorFlow, including quantization and pruning.
-
intel-extension-for-pytorch
A Python package that extends official PyTorch to easily obtain extra performance on Intel platforms
I'd like to share with you today the Chinese-Alpaca-Plus-13B-GPTQ model, which is the GPTQ-format, 4-bit quantised version of Yiming Cui's Chinese-LLaMA-Alpaca 13B for GPU inference.
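For reference, loading a prequantized GPTQ checkpoint like this usually takes only a few lines with the AutoGPTQ library. The sketch below is a hedged illustration; the repository name is an assumed placeholder, not one confirmed by the post above.

```python
# Hedged sketch: loading a 4-bit GPTQ checkpoint for GPU inference with AutoGPTQ.
# The model ID below is an assumed placeholder, not a confirmed repository name.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "someuser/Chinese-Alpaca-Plus-13B-GPTQ"  # assumption: replace with the real repo
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0", use_safetensors=True)

prompt = "列出三个参观北京的理由。"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```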
Depends what model you want to train, and how well you want your computer to keep working while you're doing it.
If you're interested in large language models there's a table of vram requirements for fine-tuning at [1] which says you could do the most basic type of fine-tuning on a 7B parameter model with 8GB VRAM.
You'll find that training takes quite a long time, and as a lot of the GPU power is going on training, your computer's responsiveness will suffer - even basic things like scrolling in your web browser or changing tabs uses the GPU, after all.
Spend a bit more and you'll probably have a better time.
[1] https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#...
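To make the "most basic type of fine-tuning" concrete, here is a minimal, hedged sketch of a 4-bit QLoRA-style setup with transformers, bitsandbytes and peft; the base model name and hyperparameters are illustrative assumptions, not values taken from the linked table.

```python
# Hedged sketch: low-VRAM fine-tuning setup (4-bit base model + LoRA adapters).
# Model name and LoRA hyperparameters are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-2-7b-hf"  # assumed 7B base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

lora_config = LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters are trained
```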
For our real-time STT needs, we'll employ a fantastic library called faster-whisper.
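A minimal, hedged usage sketch of faster-whisper follows; the model size, device, compute type, and audio file name are assumptions for illustration.

```python
# Hedged sketch: transcription with faster-whisper using int8-quantized weights.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")
segments, info = model.transcribe("audio.wav", beam_size=5)  # assumed input file

print(f"Detected language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```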
Project mention: Does anyone know a downloadable chatgpt model that supports conversation in Albanian? | /r/Programimi | 2023-05-16
Project mention: Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse | news.ycombinator.com | 2023-11-23
Interesting company. Yannic Kilcher interviewed Nir Shavit last year and they went into some depth: https://www.youtube.com/watch?v=0PAiQ1jTN5k
DeepSparse is on GitHub: https://github.com/neuralmagic/deepsparse
Project mention: I'm developing an open-source AI tool called xTuring, enabling anyone to construct a Language Model with just 5 lines of code. I'd love to hear your thoughts! | /r/machinelearningnews | 2023-09-07
Explore the project on GitHub here.
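For context, xTuring's advertised few-line workflow looks roughly like the sketch below; the dataset path and the "llama_lora" model key are assumptions drawn from its documentation rather than from the post.

```python
# Hedged sketch of xTuring's short fine-tuning workflow; dataset directory
# and model key are illustrative assumptions.
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

dataset = InstructionDataset("./alpaca_data")   # assumed local instruction dataset
model = BaseModel.create("llama_lora")          # LoRA-wrapped LLaMA variant
model.finetune(dataset=dataset)                 # fine-tune on the instruction data
print(model.generate(texts=["What is quantization in machine learning?"]))
```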
Waiting for Mixed Quantization with HQQ and MoE Offloading [1]. With that I was able to run Mixtral 8x7B on my 10 GB VRAM RTX 3080... This should work for DBRX too and should shave off a ton of the VRAM requirement.
1. https://github.com/dvmazur/mixtral-offloading?tab=readme-ov-...
Project mention: FastEmbed: Fast and Lightweight Embedding Generation for Text | dev.to | 2024-02-02
Shout out to Huggingface's Optimum – which made it easier to quantize models.
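As an illustration of that, a typical Optimum quantization flow with the ONNX Runtime backend looks roughly like the sketch below; the model ID and quantization config are assumptions, not details from the post.

```python
# Hedged sketch: post-training dynamic quantization via Optimum's ONNX Runtime backend.
# The model ID and AVX512-VNNI config are illustrative assumptions.
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed model
ort_model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

quantizer = ORTQuantizer.from_pretrained(ort_model)
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="distilbert-sst2-quantized", quantization_config=qconfig)
```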
OK I found it. Looks like they use SYCL (which for some reason they've rebranded to DPC++): https://github.com/intel/intel-extension-for-pytorch/tree/v2...
Project mention: Hi, What could be the best HLS tool for implementing neural networks on FPGA | /r/FPGA | 2023-06-13
FINN - https://github.com/Xilinx/finn
Using the currently popular GPTQ, the 3-bit quantization hurts performance much more than 4-bit, but there's also AWQ (https://github.com/mit-han-lab/llm-awq) and SqueezeLLM (https://github.com/SqueezeAILab/SqueezeLLM), which are able to manage 3-bit without as much of a performance drop - I hope to see them used more commonly.
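For a sense of where the bit width enters, the sketch below shows quantizing a small model with AutoGPTQ; changing bits=4 to bits=3 is the trade-off being discussed. The model name and calibration sentence are illustrative assumptions.

```python
# Hedged sketch of GPTQ quantization with AutoGPTQ; model and calibration text
# are placeholders. bits=3 would trade more accuracy for a smaller footprint.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "facebook/opt-125m"  # small assumed model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

examples = [tokenizer("Quantization reduces model size at some cost in accuracy.")]
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
model.quantize(examples)                    # calibration pass over the examples
model.save_quantized("opt-125m-4bit-gptq")  # write quantized weights to disk
```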
Project mention: Half-Quadratic Quantization of Large Machine Learning Models | news.ycombinator.com | 2024-03-14
You can find LLM models in the onnx format here: https://github.com/tpoisonooo/llama.onnx
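If you want to poke at one of those exported graphs, a minimal onnxruntime session is enough to inspect its inputs; the file name below is an assumed placeholder, not one from the linked repo.

```python
# Hedged sketch: load an exported ONNX graph and list its input tensors.
import onnxruntime as ort

session = ort.InferenceSession("llama_decoder.onnx", providers=["CPUExecutionProvider"])
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)
```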
Python quantization related posts
- Apple Explores Home Robotics as Potential 'Next Big Thing'
- Half-Quadratic Quantization of Large Machine Learning Models
- New Mixtral HQQ Quantized 4-bit/2-bit configuration
- [D] Which framework do you use for applying post-training quantization on image classification models?
- Now I Can Just Print That Video
- Ask HN: Cheapest way to run local LLMs?
Index
What are some of the best open-source quantization projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | Chinese-LLaMA-Alpaca | 17,251 |
2 | LLaMA-Factory | 17,050 |
3 | faster-whisper | 8,723 |
4 | AutoGPTQ | 3,744 |
5 | Pretrained-Language-Model | 2,956 |
6 | deepsparse | 2,873 |
7 | xTuring | 2,515 |
8 | mixtral-offloading | 2,230 |
9 | optimum | 2,141 |
10 | neural-compressor | 1,950 |
11 | aimet | 1,908 |
12 | model-optimization | 1,465 |
13 | mmrazor | 1,365 |
14 | intel-extension-for-pytorch | 1,342 |
15 | nncf | 777 |
16 | finn | 661 |
17 | SqueezeLLM | 566 |
18 | quanto | 552 |
19 | fastT5 | 540 |
20 | qkeras | 522 |
21 | hqq | 409 |
22 | llama.onnx | 323 |
23 | Sparsebit | 319 |