-
exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
For exllama (https://github.com/turboderp/exllama), the instructions are in the post itself.
Just about any Llama-based model can be run purely on your CPU, or split between your CPU and GPU. Download KoboldCPP, assign as many layers to your GPU as it can handle, and let the CPU and system RAM handle the rest.
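As a rough sketch, a KoboldCPP launch with a CPU/GPU layer split might look like the following; the model filename and layer count here are placeholders, and exact flag names can vary between KoboldCPP versions, so check the project's README for your build:

```shell
# Hypothetical invocation: offload 24 transformer layers to the GPU
# and let the CPU and system RAM handle the remaining layers.
# --gpulayers sets how many layers are offloaded; --usecublas assumes
# a build with CUDA support (assumption, not from the original post).
python koboldcpp.py --model llama-13b.ggml.bin --gpulayers 24 --usecublas
```

If the model doesn't fit, lower the `--gpulayers` value until it loads; layers that aren't offloaded simply run on the CPU.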
Specifically with this project: https://github.com/nomic-ai/gpt4all