Sounds like you should download the 4.45MB llamafile-server-0.1 executable from https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.1 and then run it against your existing GGUF model files like this:
./llamafile-server-0.1 -m llama-2-13b.Q8_0.gguf
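Spelled out as a full sequence, that looks roughly like this (the asset URL is my assumption, derived from the release tag linked above; on Linux/macOS you also need to mark the download executable):

```shell
# Assumed download URL for the llamafile-server-0.1 release asset.
curl -L -o llamafile-server-0.1 \
  https://github.com/Mozilla-Ocho/llamafile/releases/download/0.1/llamafile-server-0.1
chmod +x llamafile-server-0.1        # required on Linux/macOS

MODEL=llama-2-13b.Q8_0.gguf          # any GGUF file you already have locally
./llamafile-server-0.1 -m "$MODEL"   # then open the local web UI it serves
```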
The ML field is doing work in that area: https://github.com/huggingface/safetensors
I've been playing with various models in llama.cpp's GGUF format like this.
git clone https://github.com/ggerganov/llama.cpp
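For anyone who hasn't done this before, the rest of the recipe (at the time of writing) is to build with make and point the main binary at a GGUF file; the model path here is illustrative:

```shell
REPO=https://github.com/ggerganov/llama.cpp
git clone "$REPO"
cd llama.cpp
make                                  # builds the CPU-only binaries

# Point -m at any GGUF model file you already have (path is illustrative):
./main -m ../llama-2-13b.Q8_0.gguf -p "The capital of France is" -n 32
```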
That's not a llamafile thing, that's a llava-v1.5-7b-q4 thing: you're running the LLaVA 1.5 model at the 7-billion-parameter size, further quantized to 4 bits (the q4).
GPT-4 Vision is running a MUCH larger model than the tiny 7B 4GB LLaVA file in this example.
The LLaVA project has a 13B model available which might do better, though there's no chance it will be anywhere near as good as GPT-4 Vision. https://github.com/haotian-liu/LLaVA/blob/main/docs/MODEL_ZO...
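The 4GB figure is roughly what you'd expect from the arithmetic: q4-family GGUF quantizations spend on the order of 4.5 bits per weight (an approximation on my part; the exact rate varies by quant type and there's some non-weight overhead), so 7 billion parameters lands just under 4GB:

```shell
# Rough size estimate for a 7B model at ~4.5 bits/weight (q4-family GGUF).
# Scale the bit count by 10 to keep the shell's integer arithmetic exact.
PARAMS=7000000000
TENTHS_OF_BITS=45          # i.e. 4.5 bits per weight
BYTES=$((PARAMS * TENTHS_OF_BITS / 10 / 8))
echo "$BYTES bytes"        # just under 4 GB
```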
Wow, this is almost as good as chatgpt-web [0], and it works offline and is free. Amazing.
In case anyone here hasn't used chatgpt-web, I recommend trying it out. With the new GPT-4 models you can chat for way cheaper than paying for ChatGPT Plus, and you can also switch back to the older (non-nerfed) GPT-4 models that can still actually code.
[0]: https://github.com/Niek/chatgpt-web
This comment is now a potential exploit for any such system that encounters it (in practice most won't be fooled by trivial prompt injections, but more complex ones might work).
Here's one example I found with a quick search: https://github.com/langchain-ai/langchain/issues/5872
Popped it into a docker setup:
https://github.com/tluyben/llamafile-docker
to save even more keystrokes.
Which smaller model gives good output and works best with this? I'm looking to run this on lower-end systems.
I wonder if someone has already tried https://github.com/jzhang38/TinyLlama, could save me some time :)