OpenLLaMA: An Open Reproduction of LLaMA

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • open_llama

    OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset

  • How is this model performing better than LLaMA in a lot of tasks[1] even though it's trained on a fifth of the data (1 trillion vs. 200 billion tokens)?

    [1] https://github.com/openlm-research/open_llama#evaluation

  • llama.cpp

    LLM inference in C/C++

  • I think llama.cpp might be easier to set up and get running (a minimal Python-bindings sketch follows below).

    https://github.com/ggerganov/llama.cpp
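
    For example, a rough sketch using the llama-cpp-python bindings (the binding choice and the model path are assumptions on my part, not something the comment above specifies):

        # Rough sketch: running a locally converted LLaMA-family checkpoint
        # through the llama-cpp-python bindings. The model path is a placeholder
        # for whatever GGUF-converted file you actually have.
        from llama_cpp import Llama

        llm = Llama(model_path="./models/open_llama_7b.gguf")  # placeholder path
        output = llm(
            "Q: What is the largest animal?\nA:",
            max_tokens=32,
            stop=["Q:"],
        )
        print(output["choices"][0]["text"])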

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • Nope :P

    The absolute most efficient way to run these is MLC-LLM. 7B LLaMA models take about 3.5GB of VRAM (which is very modest), it runs on Vulkan (so basically any GPU, including laptop IGPs), and it is extremely fast: https://github.com/mlc-ai/mlc-llm

    The catch? You are stuck with a few prebuilt models... For now. There is a build script to compile models, but I can tell you it is a pain to set up.

    LLaMA 7B runs on my modest RTX 2060 with ~4.5GB VRAM (or the full 6GB with long inputs) using this: https://github.com/oobabooga/text-generation-webui/blob/main...

    This is what I personally use, as the interface is much prettier and more fleshed out, and you can use hundreds of LLaMA finetunes from Hugging Face.

    One catch for both is that 4-bit quantization has a modest hit on 7B/13B output quality, but not as much as you'd think (see the quick arithmetic below for where the VRAM figures come from).
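
    A quick sanity check of the ~3.5GB figure quoted above, assuming plain 4-bit weights with no runtime overhead counted:

        # Back-of-the-envelope VRAM estimate for a 4-bit quantized 7B model.
        # Activations and KV cache are not counted, which is why real usage
        # lands closer to the 4.5-6GB range described above.
        params = 7e9            # LLaMA 7B parameter count
        bits_per_weight = 4     # 4-bit quantization
        weight_bytes = params * bits_per_weight / 8
        print(f"{weight_bytes / 1e9:.1f} GB for weights alone")  # ~3.5 GB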

  • Open-Llama

    (Discontinued) The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF.

  • Really exciting how fast fully pre-trained new models are appearing.

    Here's another repo (with the same "open-llama" name, but a different training dataset) that has also been available on Hugging Face for a few weeks.

    https://github.com/s-JoL/Open-Llama

  • RWKV-LM

    RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

  • Would be very interesting to see https://github.com/BlinkDL/RWKV-LM trained on the same data

  • EasyLM

    Large language models (LLMs) made easy. EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax.

  • I am quite new to this, I would like to get it running. Would the process roughly be:

    1. Get a machine with decent GPU, probably rent cloud GPU.

    2. On that machine download the weights/model/vocab files from https://huggingface.co/openlm-research/open_llama_7b_preview...

    3. Install Anaconda. Clone https://github.com/young-geng/EasyLM/.

    4. Install EasyLM:

        conda env create -f scripts/gpu_environment.yml
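
    If the goal is just to try the model rather than train it with EasyLM, a simpler path is to load the published PyTorch weights with Hugging Face transformers. A minimal sketch, assuming the 7B checkpoint is published as openlm-research/open_llama_7b (an assumed repo id; substitute whatever you downloaded in step 2) and that torch, transformers and accelerate are installed:

        # Rough sketch: generate text from OpenLLaMA via Hugging Face transformers.
        # The repo id below is an assumption; substitute your downloaded checkpoint.
        import torch
        from transformers import LlamaForCausalLM, LlamaTokenizer

        model_path = "openlm-research/open_llama_7b"  # assumed checkpoint name
        tokenizer = LlamaTokenizer.from_pretrained(model_path)
        model = LlamaForCausalLM.from_pretrained(
            model_path,
            torch_dtype=torch.float16,
            device_map="auto",  # requires the accelerate package
        )

        prompt = "Q: What is the largest animal?\nA:"
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
        output = model.generate(input_ids, max_new_tokens=32)
        print(tokenizer.decode(output[0], skip_special_tokens=True))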

  • FastChat

    An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

  • I second this recommendation to start with llama.cpp. It runs on a regular desktop computer and it gives a sense of what's possible.

    If you want access to a serious GPU or TPU, then the sensible solution is to rent one in the cloud. But you can also achieve impressive results on consumer grade gaming hardware.

    The FastChat framework supports the Vicuna LLM, along with some others: https://github.com/lm-sys/FastChat

    The Oobabooga web interface aims to become the standard interface for chat models: https://github.com/oobabooga/text-generation-webui

    I don't see any indication that OpenLLaMA will run on either of those without modification, but one of them, or some other framework, may emerge as a de facto standard for running these models.

  • You can get it running with one Python script on Modal.com :)

    https://github.com/modal-labs/modal-examples/blob/main/06_gp...

  • mlc-llm

    Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.


  • brev-cli

    Connect your laptop to cloud computers. Follow to stay updated about our product

