A brief history of LLaMA models

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • LLaMA_MPS

    Discontinued. Run LLaMA inference on Apple Silicon GPUs.

  • Most places that recommend llama.cpp for Mac fail to mention https://github.com/jankais3r/LLaMA_MPS, which runs unquantized 7b and 13b models on the M1/M2 GPU directly. It's slightly slower (not by a lot) and uses significantly less energy. To me, the win of not having to quantize is huge; I wish more people knew about it.

  • llama-dfdx

    LLaMa 7b with CUDA acceleration implemented in rust. Minimal GPU memory needed!

  • There's a Rust deep learning library called dfdx that just set up LLaMA: https://github.com/coreylowman/llama-dfdx

  • mlc-llm

    Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

  • dalai

    The simplest way to run LLaMA on your local machine

  • I had it running before with Dalai (https://github.com/cocktailpeanut/dalai) but have since moved to the browser-based WebGPU method (https://mlc.ai/web-llm/), which uses Vicuna 7B and is quite good.

  • web-llm

    Bringing large-language models and chat to web browsers. Everything runs inside the browser with no server support.

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • Is it?

    Literally every example I've seen so far is completely unversioned and, mere weeks after being written, simply doesn't work as a direct consequence.

    E.g: https://github.com/oobabooga/text-generation-webui/blob/ee68...

    Take this line:

        pip3 install torch torchvision torchaudio

  • peft

    🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

  • Wow. Less than half of those have any version specified. The rest? "Meh, I don't care, whatever."

    Then this beauty:

        git+https://github.com/huggingface/peft

  • RedPajama-Data

    The RedPajama-Data repository contains code for preparing large datasets for training large language models.

  • There are efforts to provide an open-source replica of the training dataset and independently trained models. So far the dataset has been recreated following the original paper (allowing for some details that the Meta researchers left unspecified):

    https://github.com/togethercomputer/RedPajama-Data/

    https://twitter.com/togethercompute/status/16479179892645191...
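
The versioning complaint quoted under text-generation-webui and peft above can be made concrete with a small check. The following is a minimal sketch, not part of any project listed here: the helper names `is_pinned` and `unpinned` are hypothetical, the version number `2.0.1` is purely illustrative, and the rule used (an exact `==` pin, or a `git+` URL ending in a full 40-character commit hash) is a simplifying assumption rather than a full parser for pip's requirement syntax.

```python
import re

# Exact pin such as "torch==2.0.1"; wildcards like "==2.*" do not count.
# An optional environment marker (";" and what follows) is tolerated.
EXACT_PIN = re.compile(
    r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[^\s*;]+\s*(;.*)?$"
)
# VCS requirement pinned to a full 40-character commit hash (assumed rule).
COMMIT_PIN = re.compile(r"^git\+\S+@[0-9a-f]{40}(#\S*)?$")


def is_pinned(requirement: str) -> bool:
    """Return True if a requirements line names one exact version."""
    # Drop trailing comments ("#" at line start or after whitespace).
    line = re.sub(r"(^|\s)#.*$", "", requirement).strip()
    if line.startswith("git+"):
        return bool(COMMIT_PIN.match(line))
    return bool(EXACT_PIN.match(line))


def unpinned(requirements: list[str]) -> list[str]:
    """List the requirement lines that do not name an exact version."""
    cleaned = (r.strip() for r in requirements)
    return [
        r for r in cleaned
        # Skip blanks, comment lines, and pip option lines such as "-r".
        if r and not r.startswith(("#", "-")) and not is_pinned(r)
    ]
```

Under these assumptions, calling `unpinned` on the three packages from the quoted `pip3 install` line plus the bare `git+https://github.com/huggingface/peft` URL reports all four as unpinned, while a line such as `torch==2.0.1` passes.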

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
