Llama-2-Onnx
axolotl
 | Llama-2-Onnx | axolotl
---|---|---
Mentions | 3 | 29
Stars | 998 | 6,506
Growth | 1.5% | 10.7%
Activity | 6.7 | 9.8
Last commit | 5 months ago | 7 days ago
Language | Python | Python
License | GNU General Public License v3.0 or later | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Llama-2-Onnx
- Show HN: Fine-tune your own Llama 2 to replace GPT-3.5/4
System: Here's some docs, answer concisely in a sentence.
YMMV on cost still; it depends on the cloud vendor. My intuition and viewpoint agree with yours: GPT-3.5 is priced low enough that there isn't a case where it makes sense to use another model.
It strikes me now that it is _very_ likely, and not just our intuition, that OpenAI's $/GPU hour is <= any other vendor's.
The next big step will come from formalizing the stuff rolling around the local LLM community. For months now it's been either one-off $X.c stunts that run on desktop, or porn-y stuff, which is where the vast majority of the _actual_ usage and progress is coming from, like all nascent tech.
Microsoft has LLaMa-2 ONNX available on GitHub[1]. There are budding but very small projects in different languages to wrap ONNX. Once there's a genuine cross-platform[2] ONNX wrapper that makes running LLaMa-2 easy, there will be a step change. It'll be "free"[3] to run your fine-tuned model that does as well as GPT-4.
It's not clear to me exactly when this will occur. It's "difficult" now, but only because the _actual usage_ in the local LLM community doesn't have a reason to invest in ONNX, and it's extremely intimidating to figure out how exactly to get LLaMa-2 running in ONNX. Microsoft kinda threw it up on GitHub and moved on; the sample code even still needs a PyTorch model. I see at least one very small company on HuggingFace that _may_ have figured out full ONNX.
[1] https://github.com/microsoft/Llama-2-Onnx
- FLaNK Stack Weekly for 14 Aug 2023
- Llama 2 on ONNX runs locally
axolotl
- Ask HN: Most efficient way to fine-tune an LLM in 2024?
The approach I see used is axolotl with QLoRA using cloud GPUs which can be quite cheap.
https://github.com/OpenAccess-AI-Collective/axolotl
- FLaNK AI - 01 April 2024
- LoRA from Scratch implementation for LLM finetuning
https://github.com/OpenAccess-AI-Collective/axolotl
- Optimized Triton Kernels for full fine tunes
- Axolotl
- Let’s Collaborate to Build a High-Quality, Open-Source Dataset for LLMs!
One option is to look at what Axolotl uses. They have a list of different dataset formats that they support. They're mostly in JSON with specific field names, so you could start putting a dataset together with a text editor or a JSON editor.
- Axolotl: Streamline fine-tuning of AI models
- Dataset Creation Tools?
You can save that overall set into a JSON file and load it up as training data in whatever you're using. I'm using axolotl for it at the moment. That said, a GUI-based option is probably best for the first couple of tries until you get a feel for the options.
- Progress on Reproducing Phi-1/1.5
Looking forward to the results! If it turns out the dataset is reproducible, then it might be a good candidate for ReLoRA training on axolotl!
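Several of the posts above describe putting a fine-tuning dataset together as JSON with specific field names and loading it as training data. The sketch below shows one way that could look in plain Python. The field names (`instruction`, `input`, `output`) are an assumption based on the common Alpaca-style layout, the records and the `train.jsonl` filename are hypothetical, and the helper functions are illustrative only; check the dataset format list in the Axolotl documentation for the names your chosen format expects.

```python
import json

# Assumed Alpaca-style field names; not taken from the posts above.
REQUIRED_FIELDS = ("instruction", "input", "output")

# Hypothetical example records, the kind you could also type in a text editor.
records = [
    {
        "instruction": "Summarize the text in one sentence.",
        "input": "ONNX is an open format for representing machine learning models.",
        "output": "ONNX is an open model format.",
    },
    {
        "instruction": "Translate to French.",
        "input": "Good morning",
        "output": "Bonjour",
    },
]


def write_jsonl(path, rows):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file and check each record has the required fields."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            row = json.loads(line)
            missing = [key for key in REQUIRED_FIELDS if key not in row]
            if missing:
                raise ValueError(f"line {line_no}: missing fields {missing}")
            rows.append(row)
    return rows


if __name__ == "__main__":
    write_jsonl("train.jsonl", records)  # hypothetical output path
    print(len(read_jsonl("train.jsonl")), "records validated")
```

Validating field names before training catches a misnamed key early, which is cheaper than discovering it after a cloud GPU job has started.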
What are some alternatives?
vllm - A high-throughput and memory-efficient inference and serving engine for LLMs
signal-cli - signal-cli provides an unofficial commandline, JSON-RPC and dbus interface for the Signal messenger.
pkgx - the last thing you’ll install
gpt-llm-trainer
onnx-coreml - ONNX to Core ML Converter
LoRA - Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
awesome-data-temporality - A curated list to help you manage temporal data across many modalities 🚀.
LMFlow - An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
OpenPipe - Turn expensive prompts into cheap fine-tuned models
mlc-llm - Universal LLM Deployment Engine with ML Compilation
llama.cpp - LLM inference in C/C++
koboldcpp - A simple one-file way to run various GGML and GGUF models with KoboldAI's UI