So I am looking at running this with llama.cpp with Metal shaders (Mac M1 Ultra, 128 GB), but I'm running into a conversion problem. I can get from tokenizer.json to tokenizer.model using this method, but I can't convert the model to the q4_0 .bin format that llama.cpp uses.
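For reference, the usual two-step flow in llama.cpp looks roughly like the sketch below. This is a hedged example: the exact script name (`convert.py` vs. `convert-pth-to-ggml.py`) and flags vary between llama.cpp versions, and `/path/to/model` is a placeholder for your checkpoint directory, so check the README in your checkout.

```shell
# Step 1 (assumed script name): convert the original checkpoint
# to an f16 ggml file that llama.cpp can read.
python3 convert.py /path/to/model --outtype f16 --outfile model-f16.bin

# Step 2: quantize the f16 file down to q4_0 with the bundled tool.
./quantize model-f16.bin model-q4_0.bin q4_0
```

Note that this flow only works for architectures llama.cpp's converter actually supports, which is the catch discussed below for BLOOM-family models.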
Possibly. There's a llama.cpp fork called bloomz.cpp but it's not been updated in 2 months. So it's not going to support any of the fancy new quantisation methods, performance improvements, GPU acceleration, etc.
Hey u/The-Bloke, appreciate the quants! What is the degradation on some benchmarks? Have you seen https://github.com/EleutherAI/lm-evaluation-harness? 3-bit and 2-bit quants will really be pushing it. I don't see many evaluation results on the quants, and it would be nice to see a before and after.
You need my ggml fork until #343 is merged into ggml to use it.