llama.cpp
dalai
llama.cpp
-
"The king is dead"–Claude 3 surpasses GPT-4 on Chatbot Arena
git clone https://github.com/ggerganov/llama.cpp
-
LLMs on your local Computer (Part 1)
git clone --depth=1 https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake ..
cmake --build . --config Release
wget -c --show-progress -O models/llama-2-13b.Q4_0.gguf "https://huggingface.co/TheBloke/Llama-2-13B-GGUF/resolve/main/llama-2-13b.Q4_0.gguf?download=true"
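The downloaded model can then be run from the build directory. A minimal sketch; the binary location and flags are assumptions based on a typical llama.cpp CMake build and may differ by version:

```shell
# Run the quantized model with a short prompt; `bin/main` is where a
# CMake build typically places the binary (path may vary by version).
./bin/main -m models/llama-2-13b.Q4_0.gguf -p "Hello, my name is" -n 64
```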
-
Show HN: Tech Jobs on the Command Line
I'm using https://github.com/ggerganov/llama.cpp and currently Mistral 7B (on an M1 MacBook Pro). I'm sure with some prompt examples you can get pretty good results on a smaller model.
At the moment I don't have it open sourced due to it being part of a larger project that I'm working on that contains tailwindui licensed components.
A cool feature that I'm working on is creating a firefox plugin so you can save/index job postings from other sites and extract out meta information via an LLM. Very similar to this chrome plugin.
-
GGUF, the Long Way Around
Thank you for the reference to the CUDA file [1]. It's always nice to see how complex data structures are handled on GPUs. Does anyone have any idea what the bit patterns (starting at line 1529) are for?
[1] https://github.com/ggerganov/llama.cpp/blob/master/ggml-cuda...
-
The Era of 1-bit LLMs: ternary parameters for cost-effective computing
It does result in a significant degradation relative to an unquantized model of the same size, but even with simple llama.cpp K-quantization, it's still worth it all the way down to 2-bit. The chart in this llama.cpp PR speaks for itself:
https://github.com/ggerganov/llama.cpp/pull/1684#issue-17396...
-
Gemma: New Open Models
It should be possible to run it via llama.cpp[0] now.
-
Ollama is now available on Windows in preview
If you just check out https://github.com/ggerganov/llama.cpp and run make, you’ll wind up with an executable called ‘main’ that lets you run any gguf language model you choose. Then:
./main -m ./models/30B/llama-30b.Q4_K_M.gguf --prompt "say hello"
On my M2 MacBook, the first run takes a few seconds before it produces anything, but after that subsequent runs start outputting tokens immediately.
You can run LLM models right inside a short lived process.
But the majority of humans don’t want to use a single execution of a command line to access LLM completions. They want to run a program that lets them interact with an LLM. And to do that they will likely start and leave running a long-lived process with UI state - which can also serve as a host for a longer lived LLM context.
Neither use case particularly seems to need a server to function. My curiosity about why people are packaging these things up like that is completely genuine.
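For what it's worth, llama.cpp itself ships a `server` example that covers the long-lived case: the model loads once and any process can request completions over HTTP. A rough sketch; the endpoint path, port, and payload shape here reflect recent llama.cpp versions and may have changed:

```shell
# Start a long-lived server that keeps the model in memory:
./server -m ./models/30B/llama-30b.Q4_K_M.gguf --port 8080 &

# Any short-lived client can then request completions without
# paying the model-load cost on each invocation:
curl -s http://localhost:8080/completion \
  -d '{"prompt": "say hello", "n_predict": 32}'
```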
-
UC Berkeley: World Model on Million-Length Video and Language with RingAttention
https://github.com/ggerganov/llama.cpp/discussions/2948
You can run ollama (and a web UI) pretty trivially via docker:
docker run -d --gpus=all -v /some/dir/for/ollama/data:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:latest
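Once the container is up, the mapped port exposes ollama's HTTP API. A quick sketch; the model name here is just an example:

```shell
# Pull a model into the container's data volume:
curl http://localhost:11434/api/pull -d '{"name": "llama2"}'

# Request a completion ("stream": false returns one JSON response
# instead of a stream of partial tokens):
curl http://localhost:11434/api/generate \
  -d '{"model": "llama2", "prompt": "say hello", "stream": false}'
```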
- FLaNK Stack Weekly 12 February 2024
-
Ask HN: Are there any reliable benchmarks for Machine Learning Model Serving?
Not exactly what you're looking for, but perhaps you'll find it useful: LLaMA benchmarked on all M-series chips, and in the comments there are comparisons with Nvidia.
dalai
-
Meta to release open-source commercial AI model
If you're just looking to play with something locally for the first time, this is the simplest project I've found and has a simple web UI: https://github.com/cocktailpeanut/dalai
It works for 7B/13B/30B/65B LLaMA and Alpaca (fine-tuned LLaMA which definitely works better). The smaller models at least should run on pretty much any computer.
-
FreedomGPT: AI with no censorship
I am not against easy-mode options dude, for example I used to run GANs through the command line and replaced them with Upscayl when I found it. Convenience is king after all. Something about this one isn't right though. They advertise it as a model they built, while their own GitHub shows it to be a frontend for LLaMA. Why aren't they honest about it? Why use bots to spam about it? This makes me distrust that the executable they share is a one-to-one compilation of the source code, either. I would still recommend looking for more decent alternatives. By the way, running it directly isn't that complicated.
-
Google removes the waitlist on Bard today and will be available in 180 more countries
https://github.com/ggerganov/llama.cpp
https://github.com/oobabooga/text-generation-webui
https://github.com/mlc-ai/mlc-llm
https://github.com/cocktailpeanut/dalai
https://github.com/ido-pluto/catai (this is super easy to install, but it doesn't provide an API or have integration with langchain)
-
ChatGPT Data Breach Breakdown - Why It Should Be a Concern for Everyone!
This was easy to get running: https://github.com/cocktailpeanut/dalai with Alpaca 13B (on my 16GB of RAM)
-
A brief history of LLaMA models
I had it running before with Dalai (https://github.com/cocktailpeanut/dalai) but have since moved to using the browser based WebGPU method (https://mlc.ai/web-llm/) which uses Vicuna 7B and is quite good.
-
Meet Atom the GPT Assistant, an AI-powered smart home assistant. It's like Google Assistant but with the endless possibilities of ChatGPT; it's like Siri but with the extensibility of open source.
https://github.com/nsarrazin/serge lets you pick which model and runs in a container. For an API, https://github.com/cocktailpeanut/dalai looks super promising.
- Mercredi Tech - 2023-04-26
- [Chat Gpt] Meta's LLaMA LLM has leaked - run uncensored AI on your home PC!
-
Newbie , installed dalai with llama locally, trying to make sense of responses
So I am a newbie to using GPT (I am reasonably technical and comfortable with open source, Linux, and coding in general). I wanted to just play around with a ChatGPT-like system locally to learn more. I have a relatively beefy gaming machine, so I installed dalai and LLaMA 7B on it. Links: https://github.com/cocktailpeanut/dalai and https://medium.com/@martin-thissen/llama-alpaca-chatgpt-on-your-local-computer-tutorial-17adda704c23
- Running oobabooga with Alpaca on Apple Silicon (M1/M2)
What are some alternatives?
ollama - Get up and running with Llama 2, Mistral, Gemma, and other large language models.
gpt4all - gpt4all: run open-source LLMs anywhere
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
GPTQ-for-LLaMa - 4 bits quantization of LLaMA using GPTQ
ggml - Tensor library for machine learning
alpaca.cpp - Locally run an Instruction-Tuned Chat-Style LLM
FastChat - An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
rust-gpu - 🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧
ChatGLM-6B - ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
safetensors - Simple, safe way to store and distribute tensors
AutoGPT - AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
alpaca-lora - Instruct-tune LLaMA on consumer hardware