[R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003

llama

Mentions: 184 | Stars: 53,053 | Activity: 8.1 | Language: Python

Inference code for Llama models

They did not. Some random person is asking Meta to change it.

stanford_alpaca

Mentions: 108 | Stars: 28,816 | Activity: 2.0 | Language: Python

Code and documentation to train Stanford's Alpaca models, and generate the data.

Blog post: https://crfm.stanford.edu/2023/03/13/alpaca.html
Demo: https://crfm.stanford.edu/alpaca/
Code: https://github.com/tatsu-lab/stanford_alpaca
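Alpaca's training data are instruction/input/output records that get rendered into a fixed prompt before fine-tuning. The sketch below shows that rendering step. The template wording is reproduced from memory of the repository's `PROMPT_DICT` in `train.py` and should be checked against the repo before reuse; the helper names `build_prompt` and `build_training_text` are hypothetical, and `"</s>"` is assumed as LLaMA's end-of-sequence string.

```python
# Hedged sketch: templates recalled from stanford_alpaca's train.py (PROMPT_DICT);
# verify the exact wording against the repository before relying on it.
PROMPT_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(example: dict) -> str:
    """Render one {instruction, input, output} record into a prompt (hypothetical helper)."""
    if example.get("input"):
        return PROMPT_INPUT.format(instruction=example["instruction"], input=example["input"])
    # Records with an empty input use the shorter template without an Input section.
    return PROMPT_NO_INPUT.format(instruction=example["instruction"])


def build_training_text(example: dict, eos: str = "</s>") -> tuple[str, str]:
    """Return (prompt, target); the target is the reference output plus an assumed EOS string."""
    return build_prompt(example), example["output"] + eos


if __name__ == "__main__":
    record = {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"}
    prompt, target = build_training_text(record)
    print(prompt)
    print(repr(target))
```

Keeping the prompt and target separate makes it easy to score the loss only on the response tokens, which is the usual choice for instruction tuning.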

trlx

Mentions: 5 | Stars: 4,324 | Activity: 7.9 | Language: Python

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

If you check out the trlx repo, they have some examples, including one showing how they trained SFT and PPO on the HH dataset. So it's basically that, but with LLaMA. https://github.com/CarperAI/trlx/blob/main/examples/hh/sft_hh.py

trl

Mentions: 13 | Stars: 8,120 | Activity: 9.7 | Language: Python

Train transformer language models with reinforcement learning.

Just the HH dataset directly. From the results, it seems like that might be enough, but I might also try instruction tuning first and then running the whole process from that base. I will also be running the reinforcement learning step with LoRA, using this as an example: https://github.com/lvwerra/trl/tree/main/examples/sentiment/scripts/gpt-neox-20b_peft
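Running RL "with LoRA" means the pretrained weights stay frozen and only a small low-rank update is trained. The toy layer below sketches the LoRA forward pass from the LoRA paper, y = W x + (alpha / r) B A x, in plain Python. It is not the PEFT library's API; `LoRALinear` and its parameters are hypothetical, and the initialization (Gaussian `A`, zero `B`) follows the paper's description.

```python
import random


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Dense matrix-vector product for small lists of lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Toy LoRA layer: y = W x + (alpha / r) * B (A x), with W frozen (hypothetical class)."""

    def __init__(self, weight: list[list[float]], r: int = 2, alpha: float = 4.0, seed: int = 0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained weights, shape (d_out, d_in)
        # A: (r, d_in), small random init; B: (d_out, r), zero init so training starts from W exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x: list[float]) -> list[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))  # low-rank path: x -> r dims -> d_out dims
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        """Only A and B would receive gradients; W is not counted."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], r=2, alpha=4.0)
    print(layer.forward([1.0, 0.0, -1.0]))  # equals W x while B is still zero
    print(layer.trainable_params())
```

Because `B` starts at zero, the adapted model initially behaves exactly like the base model, which is convenient for PPO since the policy and the frozen reference model begin identical.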

text-generation-webui

Mentions: 876 | Stars: 36,293 | Activity: 9.9 | Language: Python

A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

#1: The new streaming algorithm has been merged. It's a lot faster! | 6 comments
#2: Text streaming will become 1000000x faster tomorrow
#3: LLaMA tutorial (including 4-bit mode) | 10 comments

alpaca-lora

Mentions: 107 | Stars: 18,197 | Activity: 3.6 | Language: Jupyter Notebook

Instruct-tune LLaMA on consumer hardware

Found this: https://github.com/tloen/alpaca-lora
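Why LoRA makes instruction tuning feasible on consumer hardware comes down to how few weights are actually trained. The arithmetic below counts LoRA's trainable parameters, assuming LLaMA-7B's hidden size of 4096 and 32 transformer layers, with rank-8 adapters on the query and value projections (which we believe are alpaca-lora's defaults; check its `finetune.py`). The function name is hypothetical.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int, matrices_per_layer: int, n_layers: int) -> int:
    """Each adapted (d_out x d_in) matrix gains A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out) * matrices_per_layer * n_layers


if __name__ == "__main__":
    # Assumed configuration: 4096-dim q_proj and v_proj in each of 32 layers, rank 8.
    n = lora_trainable_params(d_in=4096, d_out=4096, r=8, matrices_per_layer=2, n_layers=32)
    print(f"{n:,}")  # 4,194,304
```

About 4.2 million trainable weights is roughly 0.06% of the model's roughly 6.7 billion, so gradients and optimizer state are needed only for a tiny fraction of the network.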

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Show HN: An end-to-end reinforcement learning library for infinite horizon tasks

1 project | news.ycombinator.com | 29 Dec 2023
Problem with Truncated Quantile Critics (TQC) and n-step learning algorithm.

4 projects | /r/reinforcementlearning | 9 Dec 2023
[P] PettingZoo 1.24.0 has been released (including Stable-Baselines3 tutorials)

4 projects | /r/reinforcementlearning | 24 Aug 2023
SB3 - NotImplementedError: Box([-1. -1. -8.], [1. 1. 8.], (3,), <class 'numpy.float32'>) observation space is not supported

2 projects | /r/reinforcementlearning | 19 Jun 2023
Working with DQN ! need some help !

1 project | /r/deeplearning | 17 May 2023

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning
Machine Learning Pytorch reinforcement-learning
Post date: 13 Mar 2023