fastLLaMa
An experimental high-performance framework for running Decoder-only LLMs with 4-bit quantization in Python using a C/C++ backend.
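To make the "4-bit quantization" part concrete, here is a toy sketch of block-wise symmetric 4-bit quantization, the general technique llama.cpp-style backends use to shrink model weights. This is an illustrative example only, not fastLLaMa's actual implementation; the function names and block layout here are made up for demonstration.

```python
# Toy block-wise symmetric 4-bit quantization (illustrative only; the
# real llama.cpp/fastLLaMa formats pack nibbles and store per-block scales).

def quantize_4bit(block):
    """Map a block of floats to signed 4-bit ints in [-8, 7] plus a scale."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 7.0                      # largest magnitude maps to +/-7
    q = [max(-8, min(7, round(x / scale))) for x in block]
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate floats from 4-bit codes and the block scale."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.07, -0.21, 0.5, -0.04, 0.18]
q, s = quantize_4bit(weights)
approx = dequantize_4bit(q, s)
# Rounding error is bounded by half a quantization step per weight.
max_err = max(abs(a - b) for a, b in zip(weights, approx))
assert max_err <= s / 2 + 1e-9
```

The payoff is memory: each weight takes 4 bits plus a small amortized cost for the per-block scale, roughly an 8x reduction versus float32, which is what makes running mid-sized models on commodity hardware practical.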
Why this repo, and how are we different from other wrappers? Someone asked this in a previous post, so I thought I would address it here as well. I am really excited to see so many people building on top of llama.cpp, and I think it deserves all the credit it is getting. It's inspiring to see it shaping up into a mature framework. However, we decided not to simply rebuild the same features in Python, but instead to focus on features that tackle problems I personally face at my day job, where I run mid-to-large-sized models in production. A lot of these features may or may not make sense for the main repo, but we are always looking for features we can contribute upstream, since that benefits the community as a whole. Here is a more detailed answer if anyone is interested.