Llamafile lets you distribute and run LLMs with a single file

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • llamafile

    Distribute and run LLMs with a single file.

  • Sounds like you should download the 4.45MB llamafile-server-0.1 executable from https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.1 and then run it against your existing gguf model files like this:

        ./llamafile-server-0.1 -m llama-2-13b.Q8_0.gguf

  • safetensors

    Simple, safe way to store and distribute tensors

  • The ML field is doing work in that area: https://github.com/huggingface/safetensors

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • llama.cpp

    LLM inference in C/C++

  • I've been playing with various models in llama.cpp's GGUF format like this.

      git clone https://github.com/ggerganov/llama.cpp

  • LLaVA

    [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

  • That's not a llamafile thing, that's a llava-v1.5-7b-q4 thing - you're running the LLaVA 1.5 model at a 7 billion parameter size further quantized to 4 bits (the q4).

    GPT4-Vision is running a MUCH larger model than the tiny 7B 4GB LLaVA file in this example.

    LLaVA have a 13B model available which might do better, though there's no chance it will be anywhere near as good as GPT-4 Vision. https://github.com/haotian-liu/LLaVA/blob/main/docs/MODEL_ZO...

  • chatgpt-web

    ChatGPT web interface using the OpenAI API (by Niek)

  • Wow, this is almost as good as chatgpt-web [0], and it works offline and is free. Amazing.

    In case anyone here hasn't used chatgpt-web, I recommend trying it out. With the new GPT-4 models you can chat for way cheaper than paying for ChatGPT Plus, and you can also switch back to the older (non-nerfed) GPT-4 models that can still actually code.

    [0]: https://github.com/Niek/chatgpt-web

  • langchain

    🦜🔗 Build context-aware reasoning applications

  • This comment is now a potential exploit for any such system that encounters it (in practice most won't be fooled by trivial prompt injections, but possibly more complex ones)

    Here's one example I found with a quick search: https://github.com/langchain-ai/langchain/issues/5872

  • llamafile-docker

    Simple llamafile setup with docker

  • Popped it into a docker setup:

    https://github.com/tluyben/llamafile-docker

    to save even more keystrokes.

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • TinyLlama

    The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

  • Which is a smaller model, that gives good output and that works best with this. I am looking to run this on lower end systems.

    I wonder if someone has already tried https://github.com/jzhang38/TinyLlama, could save me some time :)

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts