dalai
llama.cpp
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dalai
-
Ask HN: What are the capabilities of consumer grade hardware to work with LLMs?
I agree, I've definitely seen way more information about running image synthesis models like Stable Diffusion locally than I have LLMs. It's counterintuitive to me that Stable Diffusion takes less RAM than an LLM, especially considering it still needs the word vectors. Goes to show I know nothing.
I guess it comes down to the requirement of a very high end (or multiple) GPU that makes it impractical for most vs just running it in Colab or something.
Tho there are some efforts:
https://github.com/cocktailpeanut/dalai
-
Meta to release open-source commercial AI model
If you're just looking to play with something locally for the first time, this is the simplest project I've found and has a simple web UI: https://github.com/cocktailpeanut/dalai
It works for 7B/13B/30B/65B LLaMA and Alpaca (fine-tuned LLaMA which definitely works better). The smaller models at least should run on pretty much any computer.
- How can I run a large language model locally?
- meirl
-
FreedomGPT: AI with no censorship
I am not against easy mode options dude, for example I used to run GANs through command line. I replaced them with Upscayl when I found it. Convenience is king after all. Something about this one isn't right though. They are advertising it as a model they built meanwhile their own github show it to be a frontend of LLAMA. Why aren't they honest about it? Why use bots to spam about it? This causes me to not trust the executable they share to 1 to 1 compliation of the source code neither. I would still recommend looking for more decent alternatives. Btw, running it directly isn't that complicated
-
Google removes the waitlist on Bard today and will be available in 180 more countries
https://github.com/ggerganov/llama.cpp https://github.com/oobabooga/text-generation-webui https://github.com/mlc-ai/mlc-llm https://github.com/cocktailpeanut/dalai https://github.com/ido-pluto/catai (this is super easy to install but it doesnt provide an api or have integration with langchain)
-
ChatGPT Data Breach BreakDown - Why it Should be a Concern for Everyone!
This was easy to get running: https://github.com/cocktailpeanut/dalai with alpaca 13B (on my 16GB or ram)
-
A brief history of LLaMA models
I had it running before with Dalai (https://github.com/cocktailpeanut/dalai) but have since moved to using the browser based WebGPU method (https://mlc.ai/web-llm/) which uses Vicuna 7B and is quite good.
-
Meet Atom the GPT Assistant, an AI-powered Smart Home Assistant. It's like Google Assistant but with endless possibility of ChatGPT, it's like Siri but with extensibility of Open Source power.
https://github.com/nsarrazin/serge let's you pick which model and runs in a container. For API https://github.com/cocktailpeanut/dalai looks super promising.
- Mercredi Tech - 2023-04-26
llama.cpp
-
Better and Faster Large Language Models via Multi-Token Prediction
For anyone interested in exploring this, llama.cpp has an example implementation here:
https://github.com/ggerganov/llama.cpp/tree/master/examples/...
- Llama.cpp Bfloat16 Support
-
Fine-tune your first large language model (LLM) with LoRA, llama.cpp, and KitOps in 5 easy steps
Getting started with LLMs can be intimidating. In this tutorial we will show you how to fine-tune a large language model using LoRA, facilitated by tools like llama.cpp and KitOps.
- GGML Flash Attention support merged into llama.cpp
-
Phi-3 Weights Released
well https://github.com/ggerganov/llama.cpp/issues/6849
- Lossless Acceleration of LLM via Adaptive N-Gram Parallel Decoding
- Llama.cpp Working on Support for Llama3
-
Embeddings are a good starting point for the AI curious app developer
Have just done this recently for local chat with pdf feature in https://recurse.chat. (It's a macOS app that has built-in llama.cpp server and local vector database)
Running an embedding server locally is pretty straightforward:
- Get llama.cpp release binary: https://github.com/ggerganov/llama.cpp/releases
- Mixtral 8x22B
- Llama.cpp: Improve CPU prompt eval speed
What are some alternatives?
gpt4all - gpt4all: run open-source LLMs anywhere
ollama - Get up and running with Llama 3, Mistral, Gemma, and other large language models.
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
llama - Inference code for Llama models
alpaca-lora - Instruct-tune LLaMA on consumer hardware
GPTQ-for-LLaMa - 4 bits quantization of LLaMA using GPTQ
FastChat - An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
ggml - Tensor library for machine learning
alpaca.cpp - Locally run an Instruction-Tuned Chat-Style LLM