llama-cpp-python
LocalAI
Our great sponsors
llama-cpp-python | LocalAI | |
---|---|---|
54 | 82 | |
6,378 | 19,593 | |
- | 12.9% | |
9.9 | 9.9 | |
6 days ago | 4 days ago | |
Python | C++ | |
MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
llama-cpp-python
- FLaNK AI for 11 March 2024
-
OpenAI: Memory and New Controls for ChatGPT
I'll share the core bit that took a while to figure out the right format, my main script is a hot mess using embeddings with SentenceTransformer, so I won't share that yet. E.g: last night I did a PR for llama-cpp-python that shows how Phi might be used with JSON only for the author to write almost exactly the same code at pretty much the same time. https://github.com/abetlen/llama-cpp-python/pull/1184
-
TinyLlama LLM: A Step-by-Step Guide to Implementing the 1.1B Model on Google Colab
Python Bindings for llama.cpp
- Mistral-8x7B-Chat
-
Running Mistral LLM on Apple Silicon Using Apple's MLX Framework Is Much Faster
If the model could be made to work with llama.cpp, then https://github.com/abetlen/llama-cpp-python might be more compact. llama.cpp only supports a limited list of model types though.
- Run ChatGPT-like LLMs on your laptop in 3 lines of code
-
Code Llama, a state-of-the-art large language model for coding
https://github.com/abetlen/llama-cpp-python has a web server mode that replicates openai's API iirc and the readme shows it has docker builds already.
-
Meta: Code Llama, an AI Tool for Coding
LocalAI https://localai.io/ and LMStudio https://lmstudio.ai/ both have fairly complete OpenAI compatibility layers. llama-cpp-python has a FastAPI server as well: https://github.com/abetlen/llama-cpp-python/blob/main/llama_... (as of this moment it hasn't merged GGUF update yet though)
-
First steps with llama
I went with Python, llama-cpp-python, since my goal is just to get a small project up and running locally.
-
Show HN: Khoj – Chat Offline with Your Second Brain Using Llama 2
I see you’re using gpt4all; do you have a supported way to change the model being used for local inference?
A number of apps that are designed for OpenAI’s completion/chat APIs can simply point to the endpoints served by llama-cpp-python [0], and function in (largely) the same way, while supporting the various models and quants supported by llama.cpp. That would allow folks to run larger models on the hardware of their choice (including Apple Silicon with Metal acceleration) or using other proxies like openrouter.io.
[0]: https://github.com/abetlen/llama-cpp-python
LocalAI
- Drop-In Replacement for ChatGPT API
- Voxos.ai – An Open-Source Desktop Voice Assistant
- Ask HN: Set Up Local LLM
- FLaNK Stack Weekly 11 Dec 2023
- Is there any open source app to load a model and expose API like OpenAI?
-
What do you use to run your models?
If you're running this as a server, I would recommend LocalAI https://github.com/mudler/LocalAI
-
OpenAI Switch Kit: Swap OpenAI with any open-source model
LocalAI can do that: https://github.com/mudler/LocalAI
https://localai.io/features/openai-functions/
-
"ChatGPT romanesc"
De inspirație, LocalAI, un replacement la OpenAI. E deja hot pe GitHub.
-
Local LLM's to run on old iMac / Hardware
Your hardware should be fine for inferencing, as long as you don't bother trying to get the GPU working.
My $0.02 would be to try getting LocalAI running on your machine with OpenCL/CLBlas acceleration for your CPU. If you're running other things, you could limit the inferencing process to 2 or 3 threads. That should get it working; I've been able to inference even 13b models on cheap Rockchip SOCs. Your CPU should be fine, even if it's a little outdated.
LocalAI: https://github.com/mudler/LocalAI
Some decent models to start with:
TinyLlama (extremely small/fast): https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGU...
Dolphin Mistral (larger size, better responses: https://huggingface.co/TheBloke/dolphin-2.1-mistral-7B-GGUF
-
Retrieval Augmented Generation in Go
Neither of this really requires OpenAI. You can do it with locally-running models via something like https://github.com/mudler/LocalAI
What are some alternatives?
intel-extension-for-pytorch - A Python package for extending the official PyTorch that can easily obtain performance on Intel platform
gpt4all - gpt4all: run open-source LLMs anywhere
llama.cpp - LLM inference in C/C++
ollama - Get up and running with Llama 3, Mistral, Gemma, and other large language models.
text-generation-inference - Large Language Model Text Generation Inference
private-gpt - Interact with your documents using the power of GPT, 100% privately, no data leaks
mlc-llm - Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
FastChat - An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
KoboldAI
localGPT - Chat with your documents on your local device using GPT models. No data leaves your device and 100% private.