Accessing Llama 2 from the command-line with the llm-replicate plugin

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
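
For reference, the basic workflow the title describes looks roughly like this. This is a minimal sketch assuming the `llm` CLI with the `llm-replicate` plugin and a Replicate API key; the model ID below is the one current when the post was written and may have changed since:

    # Install the CLI and the Replicate plugin
    pip install llm
    llm install llm-replicate

    # Store your Replicate API key (you'll be prompted to paste it)
    llm keys set replicate

    # Register a Llama 2 chat model from Replicate under a short alias
    llm replicate add a16z-infra/llama13b-v2-chat --chat --alias llama2

    # Prompt it from the command line
    llm -m llama2 "Ten great names for a pet pelican"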

  • llama.cpp

    LLM inference in C/C++

  • For those getting started, the easiest one-click installer I've used is Nomic.ai's gpt4all: https://gpt4all.io/

    It runs with a simple GUI on Windows/Mac/Linux, leverages a fork of llama.cpp on the backend, supports GPU acceleration, and handles LLaMA, Falcon, MPT, and GPT-J models. It also has API/CLI bindings.

    I just saw a slick new tool, https://ollama.ai/, that installs and runs llama2-7b with a single `ollama run llama2` command. It has a very simple one-click installer for Apple Silicon Macs, but you need to build from source for anything else at the moment. It looks like it only supports Llama models out of the box, but it also appears to use llama.cpp (via a Go adapter) on the backend. It seemed to be CPU-only on my MacBook Air, but I didn't poke at it much and it's brand new, so we'll see.

    Anyone on HN should probably be looking at https://github.com/ggerganov/llama.cpp and https://github.com/ggerganov/ggml directly (a quick build-and-run sketch follows this comment). If you have a high-end Nvidia consumer card (3090/4090), I'd highly recommend looking into https://github.com/turboderp/exllama

    For those generally confused, the r/LocalLLaMA wiki is a good place to start: https://www.reddit.com/r/LocalLLaMA/wiki/guide/

    I've also been porting my own notes into a single site that tracks models and evals and has guides focused on local models: https://llm-tracker.info/
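
    A hedged sketch of using llama.cpp directly, circa this discussion: build with `make`, then point the `main` binary at a quantized model file you've downloaded separately. The model filename is a placeholder, and flags and file formats (GGML vs. GGUF) vary between llama.cpp versions:

      git clone https://github.com/ggerganov/llama.cpp
      cd llama.cpp
      make   # builds the CLI binaries

      # Run inference against a local quantized model file
      # (placeholder filename; download a model separately)
      ./main -m ./models/llama-2-7b-chat.q4_0.bin \
          -p "Why is the sky blue?" -n 256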

  • llm

    Access large language models from the command-line (by simonw)

  • More about my LLM tool (and Python library) here: https://llm.datasette.io/

    Here's the full implementation of that llm-replicate plugin: https://github.com/simonw/llm-replicate/blob/0.2/llm_replica...

    If you want to write a plugin for some other LLM, I have a detailed tutorial here: https://llm.datasette.io/en/stable/plugins/tutorial-model-pl... - plus a bunch of examples linked from https://github.com/simonw/llm-plugins (a sketch of the install-and-test loop follows below)
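
    As a rough illustration of the plugin development loop those docs describe (hedged; `llm install` wraps pip, so an editable install of a local plugin directory should work, but check the tutorial for the exact steps):

      # From your plugin's directory: editable install into llm's environment
      llm install -e .

      # Confirm the plugin is registered and see which models it exposes
      llm plugins
      llm models list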

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), and Llama models.

  • The real gold standard is https://github.com/oobabooga/text-generation-webui

    It includes the llama.cpp backend, and a lot more (a launch sketch follows below).

    Unfortunately, despite claiming to be the "Automatic1111" of text generation, it doesn't support any of the prompt engineering capabilities available in Automatic1111 (e.g. negative prompts, prompt weights, prompt blending), even though they're not difficult to implement - https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...
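
    For context, launching text-generation-webui looked roughly like this at the time (a sketch; the model name is a placeholder for a folder under ./models, and the project has since added one-click start scripts):

      git clone https://github.com/oobabooga/text-generation-webui
      cd text-generation-webui
      pip install -r requirements.txt

      # Start the Gradio UI with a model from the ./models directory
      python server.py --model llama-2-7b-chat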

  • llm-replicate

    LLM plugin for models hosted on Replicate

  • llm-plugins

    The LLM plugins directory

  • llm-gpt4all

    Plugin for LLM adding support for the GPT4All collection of models

  • My LLM tool can be used for both remote and local models - that's what the plugins are for.

    It can talk to OpenAI, PaLM 2, and Llama (plus other models hosted on Replicate) via API, using API keys.

    It can run local models on your own machine using these two plugins: https://github.com/simonw/llm-gpt4all and https://github.com/simonw/llm-mpt30b (a quick sketch follows below)
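
    A quick sketch of the local workflow (the model ID is illustrative; run `llm models list` to see the exact names the plugin exposes):

      # No API key needed for local models
      llm install llm-gpt4all
      llm models list

      # Prompt a local model; it will be downloaded on first use
      llm -m orca-mini-3b "What is the capital of France?"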

  • llm-mpt30b

    LLM plugin adding support for the MPT-30B language model

  • guidance

    Discontinued: A guidance language for controlling large language models. [Moved to: https://github.com/guidance-ai/guidance] (by microsoft)

  • Perhaps something as simple as stating it was first built around OpenAI models and later expanded to local models via plugins?

    I've been meaning to ask you: have you seen or used MS Guidance[0] at all? I don't know if it's the right abstraction to interface as a plugin with what you've got in the llm CLI, but there's a lot about Guidance that seems incredibly useful for local inference (token healing and acceleration especially).

    [0] https://github.com/microsoft/guidance

  • ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

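    For reference, the ollama workflow mentioned above is a two-liner (a sketch, assuming the installer from https://ollama.ai is already set up):

      # First run downloads the model, then drops into an interactive session
      ollama run llama2

      # One-shot prompts work too
      ollama run llama2 "Explain quantization in one paragraph"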

  • ggml

    Tensor library for machine learning

  • exllama

    A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

  • gpt4all

    gpt4all: run open-source LLMs anywhere
