alpaca-lora
llama.cpp
| | alpaca-lora | llama.cpp |
|---|---|---|
| Mentions | 107 | 744 |
| Stars | 18,073 | 53,471 |
| Growth | - | - |
| Activity | 3.6 | 9.9 |
| Latest commit | about 1 month ago | 6 days ago |
| Language | Jupyter Notebook | C++ |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
alpaca-lora
- How to Finetune Llama 2: A Beginner's Guide
In this blog post, I want to make it as simple as possible to fine-tune the LLaMA 2 7B model, using as little code as possible. We will use the Alpaca LoRA training script, which automates the fine-tuning process, and Beam for GPU compute.
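As a rough sketch of what such a training script does under the hood, here is a minimal LoRA fine-tune using the Hugging Face peft and transformers libraries. The model name, dataset, and hyperparameters below are illustrative assumptions, not the Alpaca LoRA script's actual defaults:

```python
# Minimal LoRA fine-tuning sketch (model/dataset names are assumptions,
# not the Alpaca LoRA script's actual configuration).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

base = "meta-llama/Llama-2-7b-hf"  # assumed base model; gated on the Hub
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

# Attach small trainable low-rank adapters to the attention projections;
# the base weights stay frozen.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

# Simplified: tokenize only the response field of an assumed Alpaca-style set.
data = load_dataset("yahma/alpaca-cleaned", split="train")
data = data.map(lambda x: tokenizer(x["output"], truncation=True, max_length=256))

Trainer(model=model,
        train_dataset=data,
        args=TrainingArguments(output_dir="lora-out",
                               per_device_train_batch_size=4,
                               num_train_epochs=1, learning_rate=3e-4),
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)).train()
```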
- Fine-tuning LLMs with LoRA: A Gentle Introduction
Implement the code from the Llama LoRA repo in a script we can run locally.
- A simple repo for fine-tuning LLMs with both GPTQ and bitsandbytes quantization. Also supports ExLlama for inference for the best speed.
Following up on u/tloen's popular alpaca-lora work, I wrapped the setup of alpaca_lora_4bit to add support for GPTQ training in the form of installable pip packages. You can perform training and inference with multiple quantization methods to compare the results.
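For comparison, this is roughly what the bitsandbytes side looks like when loading a model in 4-bit through stock transformers; it is a sketch of the standard API, not of the wrapper packages described above, and the model name is an assumption:

```python
# Sketch: 4-bit quantized loading with bitsandbytes via transformers.
# Illustrates only the bitsandbytes path, not the GPTQ wrapper packages;
# the model name is an assumption.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # normal-float 4-bit weights
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for the matmuls
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb, device_map="auto")
```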
- FLaNK Stack Weekly for 20 June 2023
- Learning sources on working with local LLMs
Read the paper and also: https://github.com/tloen/alpaca-lora
- Oobabooga for Windows Guide and Alpaca-Lora
- samantha-7b
- Creating a LoRA from unstructured text
- With a single 3090, which model is fine-tunable and has decent reasoning ability?
Well, I've not gone through the whole process to the end yet, but using the instructions from https://github.com/tloen/alpaca-lora I was able just now to start a fine-tuning run on a LLaMA 13B model; it says it will take 15 hours.
- [D] An ELI5 explanation for LoRA - Low-Rank Adaptation.
Repos like https://github.com/tloen/alpaca-lora and https://github.com/Lightning-AI/lit-llama use LoRA as a method to fine-tune LLaMA models.
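Stripped to its essence, LoRA adds a trainable low-rank update B·A on top of a frozen weight matrix W, so only the two small matrices are trained. A toy PyTorch sketch, with dimensions, init, and scaling chosen purely for illustration:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Toy LoRA layer: y = x @ (W + (alpha/r) * B @ A)^T, with W frozen."""
    def __init__(self, in_f, out_f, r=8, alpha=16):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f), requires_grad=False)
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)  # trainable, small init
        self.B = nn.Parameter(torch.zeros(out_f, r))        # trainable, zero init
        self.scale = alpha / r

    def forward(self, x):
        # B @ A is a rank-r update with the same shape as the frozen weight.
        return x @ (self.weight + self.scale * self.B @ self.A).T

layer = LoRALinear(4096, 4096)
# Only A and B train: 2 * r * 4096 parameters instead of 4096 * 4096.
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))
```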
llama.cpp
- "The king is dead"–Claude 3 surpasses GPT-4 on Chatbot Arena
git clone https://github.com/ggerganov/llama.cpp
- LLMs on your local Computer (Part 1)
git clone --depth=1 https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake ..
cmake --build . --config Release
wget -c --show-progress -O models/llama-2-13b.Q4_0.gguf "https://huggingface.co/TheBloke/Llama-2-13B-GGUF/resolve/main/llama-2-13b.Q4_0.gguf?download=true"
- Show HN: Tech Jobs on the Command Line
I'm using https://github.com/ggerganov/llama.cpp and currently Mistral 7B (on an M1 MacBook Pro). I'm sure with some prompt examples you can get pretty good results on a smaller model.
At the moment I don't have it open sourced, due to it being part of a larger project that I'm working on that contains tailwindui licensed components.
A cool feature that I'm working on is a Firefox plugin so you can save/index job postings from other sites and extract meta information via an LLM. Very similar to this Chrome plugin.
- GGUF, the Long Way Around
Thank you for the reference to the CUDA file [1]. It's always nice to see how complex data structures are handled in GPUs. Does anyone have any idea what the bit patterns are for (starting at line 1529)?
[1] https://github.com/ggerganov/llama.cpp/blob/master/ggml-cuda...
- The Era of 1-bit LLMs: ternary parameters for cost-effective computing
It does result in significant degradation relative to an unquantized model of the same size, but even with simple llama.cpp K-quantization, it's still worth it all the way down to 2-bit. The chart in this llama.cpp PR speaks for itself:
https://github.com/ggerganov/llama.cpp/pull/1684#issue-17396...
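To make the trade-off concrete, here is a toy blockwise quantizer. This is not llama.cpp's actual K-quant format, just the basic one-scale-per-block idea, showing how reconstruction error grows as the bit width shrinks:

```python
# Toy symmetric blockwise quantization (NOT llama.cpp's K-quant scheme).
import numpy as np

def quantize_blockwise(w, bits=2, block=32):
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit, 1 for 2-bit
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax  # one scale per block
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)    # integer codes
    return (q * scale).reshape(-1)             # dequantized approximation

w = np.random.randn(4096).astype(np.float32)
for bits in (8, 4, 2):
    err = np.abs(w - quantize_blockwise(w, bits)).mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")  # error grows as bits shrink
```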
- Gemma: New Open Models
It should be possible to run it via llama.cpp[0] now.
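One way to do that from Python, assuming a GGUF conversion of Gemma is available (the file name below is a placeholder), is via the llama-cpp-python bindings:

```python
# Sketch: running an assumed Gemma GGUF file through llama-cpp-python.
# The model path is a placeholder, not a file the comment names.
from llama_cpp import Llama

llm = Llama(model_path="./models/gemma-7b-it.Q4_K_M.gguf", n_ctx=2048)
out = llm("Write a haiku about open models.", max_tokens=64)
print(out["choices"][0]["text"])
```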
- Ollama is now available on Windows in preview
If you just check out https://github.com/ggerganov/llama.cpp and run make, you’ll wind up with an executable called ‘main’ that lets you run any gguf language model you choose. Then:
./main -m ./models/30B/llama-30b.Q4_K_M.gguf --prompt "say hello"
On my M2 MacBook, the first run takes a few seconds before it produces anything, but after that subsequent runs start outputting tokens immediately.
You can run LLM models right inside a short-lived process.
But the majority of humans don't want to use a single execution of a command line to access LLM completions. They want to run a program that lets them interact with an LLM. And to do that they will likely start, and leave running, a long-lived process with UI state, which can also serve as a host for a longer-lived LLM context.
Neither use case particularly seems to need a server to function. My curiosity about why people are packaging these things up like that is completely genuine.
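For what it's worth, the one-shot use case is easy to script without any server. A minimal sketch wrapping the main binary from the comment above in a short-lived Python process (binary and model paths are assumptions carried over from that comment):

```python
# Sketch: one-shot completion via a short-lived llama.cpp process.
# Binary and model paths are assumptions based on the comment above.
import subprocess

def complete(prompt, model="./models/30B/llama-30b.Q4_K_M.gguf"):
    result = subprocess.run(
        ["./main", "-m", model, "--prompt", prompt],
        capture_output=True, text=True, check=True)
    return result.stdout  # llama.cpp echoes the prompt plus the completion

print(complete("say hello"))
```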
- UC Berkeley: World Model on Million-Length Video and Language with RingAttention
https://github.com/ggerganov/llama.cpp/discussions/2948
You can run ollama (and a web UI) pretty trivially via docker:
docker run -d --gpus=all -v /some/dir/for/ollama/data:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:latest
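Once the container is up, the API on port 11434 can be called directly. A minimal sketch using Python requests (the model name is an assumption, and streaming is disabled so the response arrives as a single JSON object):

```python
# Sketch: calling the ollama REST API exposed by the container above.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2",              # assumed model; must be pulled first
          "prompt": "Why is the sky blue?",
          "stream": False})               # one JSON object instead of a stream
print(resp.json()["response"])
```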
- FLaNK Stack Weekly 12 February 2024
- Ask HN: Are there any reliable benchmarks for Machine Learning Model Serving?
Not exactly what you're looking for, but perhaps you'll find it useful - llama.cpp benchmarked on all M-series chips, and in the comments there are comparisons with Nvidia.
What are some alternatives?
ollama - Get up and running with Llama 2, Mistral, Gemma, and other large language models.
gpt4all - gpt4all: run open-source LLMs anywhere
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
GPTQ-for-LLaMa - 4 bits quantization of LLaMA using GPTQ
ggml - Tensor library for machine learning
alpaca.cpp - Locally run an Instruction-Tuned Chat-Style LLM
FastChat - An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
rust-gpu - 🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧
ChatGLM-6B - ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
safetensors - Simple, safe way to store and distribute tensors
AutoGPT - AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.