-
serge
A web interface for chatting with Alpaca through llama.cpp. Fully dockerized, with an easy to use API.
I've been playing with https://github.com/ggerganov/llama.cpp recently and was surprised by how few computing resources it requires. For people who don't know what it is, it's an implementation of inference for Facebook's LLaMA model in pure C/C++. Best of all, it doesn't require a GPU to run, uses less RAM, and responds in reasonable time compared to running CUDA code on the CPU.
The first problem is that llama.cpp doesn't support GPT-J models, but I found another project from the same author, https://github.com/ggerganov/ggml. It includes an example of converting the vanilla GPT-J 6B model to the ggml format, which is the format that llama.cpp supports. Since Pygmalion-6B was fine-tuned from GPT-J 6B, I believe the same conversion should also work for it.
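One quick way to sanity-check that assumption before attempting a conversion is to look at the checkpoint's config.json. This is only a minimal sketch: it assumes the model was downloaded in Hugging Face format to a local directory, and the helper name is_gptj_checkpoint is made up for illustration and is not part of ggml or llama.cpp. Hugging Face GPT-J checkpoints declare "model_type": "gptj", so a fine-tune that keeps the architecture should report the same value.

```python
import json
from pathlib import Path


def is_gptj_checkpoint(model_dir: str) -> bool:
    """Return True if config.json in model_dir declares the GPT-J architecture.

    Hypothetical helper for illustration; model_dir is assumed to be a local
    directory holding a checkpoint in Hugging Face format.
    """
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # Hugging Face GPT-J configs set model_type to "gptj".
    return config.get("model_type") == "gptj"
```

If this returns False for a model, the GPT-J conversion example is unlikely to apply to it as-is. A True result only says the declared architecture matches; it does not guarantee the conversion will succeed.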