Transformers: the biggest and best-known library for running Large Language Models, and one of the oldest. It was created by Hugging Face, the company whose hub is where we usually download our models from. It supports many models and has many features, but it is comparatively slow and uses GPU memory inefficiently. https://github.com/huggingface/transformers
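To make the comparison concrete, here is a minimal sketch of loading and running a causal LM through transformers. The model name `sshleifer/tiny-gpt2` is just a tiny stand-in so the example runs quickly; in practice you would substitute a Llama checkpoint from the Hugging Face hub.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Tiny stand-in model (assumption for illustration); swap in any
# causal LM checkpoint, e.g. a Llama model, for real use.
model_name = "sshleifer/tiny-gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize a prompt and generate a short greedy continuation.
inputs = tokenizer("Time series databases are", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(text)
```

The same `AutoModelForCausalLM` / `AutoTokenizer` pattern works across the many architectures transformers supports, which is a big part of why it became the default despite its speed and memory costs.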
AutoGPTQ: an attempt to standardize GPTQ-for-LLaMa by turning it into a library that is easier to install and use and that supports more models. https://github.com/PanQiWei/AutoGPTQ
ExLlama: a meticulously optimized library for running GPTQ models. The author is very knowledgeable in low-level GPU programming, and the result is an implementation that is VERY fast and uses much less memory than GPTQ-for-LLaMa or AutoGPTQ. https://github.com/turboderp/exllama
ExLlama_HF: a way to use ExLlama as if it were a transformers model. Transformers implements many sampling parameters, such as top_k and top_p, and this wrapper reuses them without any modifications. It was contributed in a recent PR by Larryvrh: https://github.com/oobabooga/text-generation-webui/pull/2777
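The point of a transformers-compatible interface is that the standard sampling knobs apply unchanged. As a sketch, these are the kinds of `generate()` parameters that such a wrapper inherits for free; the tiny model name below is again just a fast stand-in, not part of ExLlama_HF itself.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Tiny stand-in model (assumption for illustration); a wrapper that
# exposes the transformers interface accepts these same arguments.
model_name = "sshleifer/tiny-gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello", return_tensors="pt")
# Standard transformers sampling parameters: top_k/top_p filtering
# and temperature scaling, applied during token-by-token sampling.
output_ids = model.generate(
    **inputs,
    max_new_tokens=8,
    do_sample=True,
    top_k=40,
    top_p=0.9,
    temperature=0.8,
)
text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(text)
```

Because the sampling logic lives in transformers' `generate()`, a backend only has to supply logits in the expected format to pick up all of these options at once.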