OpenLLaMA: An Open Reproduction of LLaMA

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • open_llama

    OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset

  • How is this model performing better than LLaMA in a lot of tasks[1] even though it's trained on a fifth of the data (1 trillion vs. 200 billion tokens)?

    [1] https://github.com/openlm-research/open_llama#evaluation

  • llama.cpp

    LLM inference in C/C++

  • I think llama.cpp might be easier to set up and get running (a minimal Python-bindings sketch follows below).

    https://github.com/ggerganov/llama.cpp
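
    For example, a rough sketch using the llama-cpp-python bindings (the binding choice and the model path are assumptions on my part, not something the comment above specifies):

        # Rough sketch: running a locally converted LLaMA-family checkpoint
        # through the llama-cpp-python bindings. The model path is a placeholder
        # for whatever GGUF-converted file you actually have.
        from llama_cpp import Llama

        llm = Llama(model_path="./models/open_llama_7b.gguf")  # placeholder path
        output = llm(
            "Q: What is the largest animal?\nA:",
            max_tokens=32,
            stop=["Q:"],
        )
        print(output["choices"][0]["text"])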

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • Nope :P

    The absolute most efficient way to run these is MLC-LLM. 7B LLaMA models take about 3.5GB of VRAM (which is very modest), it runs on Vulkan (so basically any GPU, including laptop IGPs), and it is extremely fast: https://github.com/mlc-ai/mlc-llm

    The catch? You are stuck with a few prebuilt models... For now. There is a build script to compile models, but I can tell you it is a pain to set up.

    LLaMA 7B runs on my modest RTX 2060 with ~4.5GB VRAM (or the full 6GB with long inputs) using this: https://github.com/oobabooga/text-generation-webui/blob/main...

    This is what I personally use, as the interface is much prettier and more fleshed out, and you can use hundreds of LLaMA finetunes from Hugging Face.

    One catch for both is that 4-bit quantization has a modest hit on 7B/13B output quality, but not as much as you'd think (see the quick arithmetic below for where the VRAM figures come from).
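
    A quick sanity check of the ~3.5GB figure quoted above, assuming plain 4-bit weights with no runtime overhead counted:

        # Back-of-the-envelope VRAM estimate for a 4-bit quantized 7B model.
        # Activations and KV cache are not counted, which is why real usage
        # lands closer to the 4.5-6GB range described above.
        params = 7e9            # LLaMA 7B parameter count
        bits_per_weight = 4     # 4-bit quantization
        weight_bytes = params * bits_per_weight / 8
        print(f"{weight_bytes / 1e9:.1f} GB for weights alone")  # ~3.5 GB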

  • Open-Llama

    (Discontinued) The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF.

  • Really exciting how fast fully pre-trained new models are appearing.

    Here's another repo (with the same "open-llama" name, but a different training dataset) that has also been available on Hugging Face for a few weeks.

    https://github.com/s-JoL/Open-Llama

  • RWKV-LM

    RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

  • Would be very interesting to see https://github.com/BlinkDL/RWKV-LM trained on the same data

  • EasyLM

    Large language models (LLMs) made easy. EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax.

  • I am quite new to this, I would like to get it running. Would the process roughly be:

    1. Get a machine with decent GPU, probably rent cloud GPU.

    2. On that machine download the weights/model/vocab files from https://huggingface.co/openlm-research/open_llama_7b_preview...

    3. Install Anaconda. Clone https://github.com/young-geng/EasyLM/.

    4. Install EasyLM:

        conda env create -f scripts/gpu_environment.yml
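
    If the goal is just to try the model rather than train it with EasyLM, a simpler path is to load the published PyTorch weights with Hugging Face transformers. A minimal sketch, assuming the 7B checkpoint is published as openlm-research/open_llama_7b (an assumed repo id; substitute whatever you downloaded in step 2) and that torch, transformers and accelerate are installed:

        # Rough sketch: generate text from OpenLLaMA via Hugging Face transformers.
        # The repo id below is an assumption; substitute your downloaded checkpoint.
        import torch
        from transformers import LlamaForCausalLM, LlamaTokenizer

        model_path = "openlm-research/open_llama_7b"  # assumed checkpoint name
        tokenizer = LlamaTokenizer.from_pretrained(model_path)
        model = LlamaForCausalLM.from_pretrained(
            model_path,
            torch_dtype=torch.float16,
            device_map="auto",  # requires the accelerate package
        )

        prompt = "Q: What is the largest animal?\nA:"
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
        output = model.generate(input_ids, max_new_tokens=32)
        print(tokenizer.decode(output[0], skip_special_tokens=True))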

  • FastChat

    An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

  • I second this recommendation to start with llama.cpp. It runs on a regular desktop computer and it gives a sense of what's possible.

    If you want access to a serious GPU or TPU, then the sensible solution is to rent one in the cloud. But you can also achieve impressive results on consumer grade gaming hardware.

    The FastChat framework supports the Vicuna LLM, along with some others: https://github.com/lm-sys/FastChat

    The Oobabooga web interface aims to become the standard interface for chat models: https://github.com/oobabooga/text-generation-webui

    I don't see any indication that OpenLLaMA will run on either of those without modification, but one of them, or some other framework, may emerge as a de facto standard for running these models.

  • You can get it running with one Python script on Modal.com :)

    https://github.com/modal-labs/modal-examples/blob/main/06_gp...

  • mlc-llm

    Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.


  • brev-cli

    Connect your laptop to cloud computers. Follow to stay updated about our product

