[P] fastLLaMa, a Python wrapper to run llama.cpp

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • alpaca-lora

    Instruct-tune LLaMA on consumer hardware

  • Amazing work! Any plans to get this working with alpaca-lora? Would love to see your fast implementation have the improved outputs from Alpaca.

  • fastLLaMa

fastLLaMa: An experimental high-performance framework for running decoder-only LLMs with 4-bit quantization in Python using a C/C++ backend (a sketch of this wrapper pattern follows the list).

  • llama.cpp

    LLM inference in C/C++

llama.cpp recently added support for `--perplexity`, which calculates the perplexity over the prompt (you can, for example, pass the wikitext-2 test set as the prompt); a sketch of the calculation follows the list. See https://github.com/ggerganov/llama.cpp/pull/270

  • llama.py

    Python bindings to llama.cpp
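
Both fastLLaMa and llama.py follow the same pattern: a thin Python layer over llama.cpp's C/C++ inference core. The sketch below illustrates that pattern in general terms using ctypes; the shared-library name and every exported symbol (backend_load_model, backend_generate, backend_free) are hypothetical stand-ins, not the real API of either project.

```python
# Minimal sketch of a Python wrapper over a C/C++ LLM backend via ctypes.
# The library name and exported symbols are hypothetical, for illustration
# only -- they are NOT the real API of fastLLaMa, llama.py, or llama.cpp.
import ctypes

# Load the compiled C/C++ backend as a shared library.
lib = ctypes.CDLL("./libllama_backend.so")

# Declare the C signatures so ctypes can marshal arguments correctly.
lib.backend_load_model.argtypes = [ctypes.c_char_p, ctypes.c_int]
lib.backend_load_model.restype = ctypes.c_void_p
lib.backend_generate.argtypes = [ctypes.c_void_p, ctypes.c_char_p,
                                 ctypes.c_char_p, ctypes.c_int]
lib.backend_generate.restype = ctypes.c_int
lib.backend_free.argtypes = [ctypes.c_void_p]


class Model:
    """Thin Python facade over the C/C++ inference core."""

    def __init__(self, model_path: str, n_threads: int = 8):
        # The 4-bit quantized weights live on the C++ side; Python only
        # holds an opaque context handle.
        self._ctx = lib.backend_load_model(model_path.encode(), n_threads)
        if not self._ctx:
            raise RuntimeError(f"failed to load model: {model_path}")

    def generate(self, prompt: str, max_tokens: int = 128) -> str:
        # The backend writes generated text into a caller-provided buffer.
        buf = ctypes.create_string_buffer(16384)
        n = lib.backend_generate(self._ctx, prompt.encode(), buf, max_tokens)
        if n < 0:
            raise RuntimeError("generation failed")
        return buf.value.decode()

    def __del__(self):
        if getattr(self, "_ctx", None):
            lib.backend_free(self._ctx)


if __name__ == "__main__":
    model = Model("./models/7B/ggml-model-q4_0.bin")
    print(model.generate("The meaning of life is"))
```

The design choice this illustrates: all heavy lifting (quantized weights, token sampling) stays in compiled code, while Python only passes strings and an opaque pointer across the boundary, so the wrapper adds essentially no inference overhead.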
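As for the `--perplexity` flag mentioned above: perplexity is the exponential of the average negative log-likelihood the model assigns to each token of the evaluation text. A minimal, self-contained sketch of that calculation, with made-up log-probabilities standing in for real model output:

```python
# Perplexity = exp(mean negative log-likelihood over tokens).
# The log-probabilities below are made up; in practice they come from the
# model's softmax output for each token of the evaluation text
# (e.g. wikitext-2 when passed as the prompt).
import math

def perplexity(token_logprobs: list[float]) -> float:
    """exp of the average negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Example: four tokens with hypothetical natural-log probabilities.
logprobs = [-2.1, -0.3, -4.0, -1.2]
print(f"perplexity = {perplexity(logprobs):.2f}")  # ~6.69
```

Lower perplexity means the model finds the evaluation text more predictable, which is why it is used as a quick quality check when comparing quantization levels.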
