I noticed there are a couple of open issues on llama.cpp investigating quality problems. It's interesting that a wrong implementation can still generate plausible output. It sounds like an objective quality metric would help track down issues.
https://github.com/ggerganov/llama.cpp/issues/129
https://github.com/ggerganov/llama.cpp/issues/173
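One simple objective metric is perplexity over a fixed held-out text: a subtly wrong implementation tends to show up as a clearly higher perplexity even when its samples still look plausible. A toy sketch of the computation (this assumes you can extract per-token log-probabilities from the model; it's an illustration, not anything from the llama.cpp codebase):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean log-probability) over the evaluated tokens.

    token_logprobs: natural-log probabilities the model assigned to each
    ground-truth token of a held-out text.
    """
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# A model assigning probability 0.25 to every token has perplexity 4.
print(perplexity([math.log(0.25)] * 10))  # ≈ 4.0
```

Running two implementations over the same text with the same weights and comparing perplexities would catch divergences that eyeballing samples misses.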
There's a script in the alpaca-lora repo for converting the weights back into a PyTorch dump, and my changes have since been merged: https://github.com/tloen/alpaca-lora/pull/19
> they perform a little worse.
Be aware that LoRA performs on par with or better than full fine-tuning in model quality if trained correctly, as the paper shows: https://arxiv.org/abs/2106.09685
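For reference, the core idea in the paper is tiny: freeze the pretrained weight W and learn a low-rank update B·A, so the adapted layer computes W·x + (α/r)·B·A·x. A minimal NumPy sketch of the forward pass (the names and the zero-init of B follow the paper; this is an illustration, not the alpaca-lora code):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 4  # r << d is the low-rank bottleneck

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))               # trainable, zero init

def lora_forward(x):
    # W is untouched; only A and B would receive gradients during training.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B = 0, the adapted layer is exactly the original layer at init.
assert np.allclose(lora_forward(x), W @ x)
```

Since only A and B are trained (2·r·d parameters instead of d²), the update is cheap to store and share, which is why the Alpaca-style fine-tunes circulate as small adapter files.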
I ran it on a machine with 128 GB of RAM and a Ryzen 5950X. It's not fast, about 4 seconds per token, but it just about fits without swapping. https://github.com/Noeda/rllama/
I have not, but I want to in the near future because I'm really curious myself too. I've been following the Rust community, which now has a llama.cpp port as well as my OpenCL thing, and one discussion item has been running verification and a common benchmark across the implementations. https://github.com/setzer22/llama-rs/issues/4
I've mostly heard that, at least for the larger models, quantization has barely any noticeable effect.
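That matches the intuition behind block-wise 4-bit quantization: weights are quantized in small blocks with a per-block scale, so the rounding error on each weight stays small relative to the weights in its block. A rough NumPy sketch of a symmetric per-block 4-bit round trip (simplified for illustration; not ggml's exact on-disk format):

```python
import numpy as np

def q4_roundtrip(w, block=32):
    """Symmetric 4-bit quantize/dequantize with one scale per block."""
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7  # symmetric int4 range: -7..7
    scale[scale == 0] = 1.0                           # avoid dividing by zero
    q = np.clip(np.round(w / scale), -7, 7)           # the stored 4-bit codes
    return (q * scale).reshape(-1)                    # dequantized weights

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
err = np.abs(q4_roundtrip(w) - w).max()
print(f"max abs round-trip error: {err:.4f}")
```

The worst-case error per weight is half a quantization step (scale / 2), and since the scale adapts to each block of 32 weights, outliers in one block don't inflate the error everywhere else.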
Georgi rewrote the code on top of his own tensor library (ggml[0]).
[0] https://github.com/ggerganov/ggml
Stanford released the exact training data as well as the training script with all parameters. Boot up a p4d.24xlarge (8 A100 GPUs), which costs about $40/hour, let it run for 2-3 hours, and voilà: roughly $80-120 of compute. See the README in their repo where it mentions the fine-tuning script[0]
[0] https://github.com/tatsu-lab/stanford_alpaca