efficient-inference

Open-source projects categorized as efficient-inference

Top 6 efficient-inference Open-Source Projects

  • Efficient-AI-Backbones

    Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.

  • LLMCompiler

    LLMCompiler: An LLM Compiler for Parallel Function Calling

  • Project mention: FLaNK Weekly 18 Dec 2023 | dev.to | 2023-12-18
  • EfficientFormer

    EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]

  • Project mention: A look at Apple’s new Transformer-powered predictive text model | news.ycombinator.com | 2023-09-16

    I'm pretty fatigued from constantly providing references and sources in this thread, but here's an example of what they've made publicly available:

    https://github.com/snap-research/EfficientFormer

  • DeepCache

    [CVPR 2024] DeepCache: Accelerating Diffusion Models for Free

  • Project mention: DeepCache: Accelerating Diffusion Models for Free | news.ycombinator.com | 2023-12-05
  • SqueezeLLM

    [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

  • Project mention: Llama33B vs Falcon40B vs MPT30B | /r/LocalLLaMA | 2023-07-05

    Using the currently popular GPTQ, 3-bit quantization hurts performance much more than 4-bit does, but there are also AWQ (https://github.com/mit-han-lab/llm-awq) and SqueezeLLM (https://github.com/SqueezeAILab/SqueezeLLM), which can manage 3-bit without as much of a performance drop. I hope to see them used more commonly.
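The bit-width tradeoff the comment describes can be illustrated with a toy round-to-nearest sketch: fewer bits means fewer quantization levels, hence coarser steps and larger reconstruction error. This is plain uniform quantization, not the actual GPTQ, AWQ, or SqueezeLLM algorithms, which use far more sophisticated schemes to close exactly this gap at 3-bit.

```python
import numpy as np

def quantize_dequantize(weights, bits):
    """Toy uniform symmetric round-to-nearest quantization.
    Not GPTQ/AWQ/SqueezeLLM; just an illustration of bit-width vs. error."""
    levels = 2 ** (bits - 1) - 1          # 7 levels per sign at 4-bit, 3 at 3-bit
    scale = np.max(np.abs(weights)) / levels
    q = np.clip(np.round(weights / scale), -levels, levels)
    return q * scale                       # dequantize back to float

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)  # stand-in for a weight matrix

err4 = float(np.mean((w - quantize_dequantize(w, 4)) ** 2))
err3 = float(np.mean((w - quantize_dequantize(w, 3)) ** 2))
print(f"4-bit MSE: {err4:.6f}")
print(f"3-bit MSE: {err3:.6f}")  # noticeably larger: coarser steps at 3 bits
```

With naive rounding, halving the number of levels roughly quadruples the mean squared error, which is why 3-bit needs smarter methods to stay usable.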

  • KVQuant

    KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

  • Project mention: 10M Tokens LLM Context | news.ycombinator.com | 2024-02-02
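One recurring idea in KV-cache quantization work is choosing the quantization granularity to match where the outliers are, e.g. giving each key channel its own scale instead of one tensor-wide scale. The sketch below is a toy comparison under that assumption (uniform round-to-nearest with simulated outlier channels), not the actual KVQuant method, which is considerably more elaborate.

```python
import numpy as np

def per_channel_quant(x, bits=4):
    """Toy per-channel round-to-nearest quantization of a (tokens x channels)
    KV-cache tensor. One scale per channel; not the real KVQuant algorithm."""
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x), axis=0, keepdims=True) / levels
    return np.clip(np.round(x / scale), -levels, levels) * scale

rng = np.random.default_rng(1)
# Simulate keys where a few channels carry large outlier magnitudes,
# a pattern the KV-cache quantization literature often highlights.
keys = rng.normal(size=(128, 64)).astype(np.float32)
keys[:, :4] *= 20.0

# Baseline: one scale for the whole tensor (4-bit).
levels = 2 ** 3 - 1
s = np.abs(keys).max() / levels
per_tensor = np.clip(np.round(keys / s), -levels, levels) * s

err_tensor = float(np.mean((keys - per_tensor) ** 2))
err_channel = float(np.mean((keys - per_channel_quant(keys, 4)) ** 2))
print(f"per-tensor MSE:  {err_tensor:.4f}")   # outliers inflate the shared scale
print(f"per-channel MSE: {err_channel:.4f}")  # normal channels keep a fine scale
```

With a shared scale, the outlier channels force a coarse step size that flattens every other channel toward zero; per-channel scales avoid that, which is part of why finer granularity matters for long-context KV caches.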

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates how often a repo was mentioned in the last 12 months, or since we started tracking (Dec 2020).

efficient-inference related posts

  • Llama33B vs Falcon40B vs MPT30B

    2 projects | /r/LocalLLaMA | 5 Jul 2023
  • Has anyone tried out Squeezellm?

    1 project | /r/LocalLLaMA | 2 Jul 2023
  • SqueezeLLM: Dense-and-Sparse Quantization

    1 project | news.ycombinator.com | 15 Jun 2023
  • New quantization method SqueezeLLM allows for lossless compression at 3-bit and outperforms GPTQ and AWQ in both 3-bit and 4-bit. Quantized Vicuna and LLaMA models have been released.

    2 projects | /r/LocalLLaMA | 14 Jun 2023
  • Researchers From China Introduce Vision GNN (ViG): A Graph Neural Network For Computer Vision Systems

    1 project | /r/machinelearningnews | 8 Jun 2022
  • GNN for computer vision, beating CNN & Transformer

    1 project | /r/deeplearning | 4 Jun 2022

Index

What are some of the best open-source efficient-inference projects? This list will help you:

Rank  Project                 Stars
1     Efficient-AI-Backbones  3,804
2     LLMCompiler             1,069
3     EfficientFormer           944
4     DeepCache                 603
5     SqueezeLLM                569
6     KVQuant                   190
