[R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • llama

    Inference code for Llama models

  • They did not. Some random person is asking Meta to change it.

  • stanford_alpaca

    Code and documentation to train Stanford's Alpaca models, and generate the data.

  • Blog post: https://crfm.stanford.edu/2023/03/13/alpaca.html
    Demo: https://crfm.stanford.edu/alpaca/
    Code: https://github.com/tatsu-lab/stanford_alpaca
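
For context, Alpaca fine-tunes LLaMA on instruction-following examples rendered through a fixed prompt template. A minimal sketch of that formatting step in Python — the template wording below is reproduced from memory and may differ slightly from the exact strings in the stanford_alpaca repo:

```python
def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Render one example in Alpaca's instruction-tuning prompt style.

    Two variants: one for examples with an additional input field,
    one for instruction-only examples (as in the Alpaca repo).
    """
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:"
        )
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:"
    )

# Usage: format a single training example before tokenization.
prompt = build_alpaca_prompt("Summarize the text.", "LLaMA is a family of LLMs.")
```

The model's completion after "### Response:" is what the loss is computed on during fine-tuning.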

  • trlx

    A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

  • If you check out the trlx repo, they have some examples, including one showing how they trained SFT and PPO on the HH dataset. So it's basically that, but with LLaMA. https://github.com/CarperAI/trlx/blob/main/examples/hh/sft_hh.py
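
The HH (Anthropic helpful/harmless) dataset stores each example as a full dialogue transcript in "chosen" and "rejected" fields. A rough sketch of the preprocessing an SFT script needs — splitting a chosen transcript into a prompt and a target reply — assuming the usual turn markers (the helper name here is hypothetical, not from the trlx example):

```python
def split_hh_example(chosen: str) -> tuple[str, str]:
    """Split an HH 'chosen' transcript into (prompt, final assistant reply).

    Assumes the Anthropic HH convention where turns are delimited by
    literal "\\n\\nHuman:" and "\\n\\nAssistant:" markers.
    """
    marker = "\n\nAssistant:"
    idx = chosen.rfind(marker)  # last assistant turn is the SFT target
    if idx == -1:
        raise ValueError("no Assistant turn found in transcript")
    prompt = chosen[: idx + len(marker)]
    response = chosen[idx + len(marker):].strip()
    return prompt, response

dialogue = (
    "\n\nHuman: What is LLaMA?"
    "\n\nAssistant: A family of open language models from Meta."
)
prompt, response = split_hh_example(dialogue)
```

Supervised fine-tuning then maximizes the likelihood of `response` given `prompt`; the "rejected" field is only needed later, for reward modeling.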

  • trl

    Train transformer language models with reinforcement learning.

  • Just the HH dataset directly. From the results it seems like it might be enough, but I might also try instruction tuning and then running the whole process from that base. I will also be running the reinforcement learning with a LoRA, using this as an example: https://github.com/lvwerra/trl/tree/main/examples/sentiment/scripts/gpt-neox-20b_peft
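
A LoRA keeps the base weight frozen and trains only a low-rank update, which is what makes RL fine-tuning of a large model feasible on modest hardware. A minimal numeric sketch of the idea — a NumPy stand-in for illustration, not the peft API:

```python
import numpy as np

# LoRA: freeze the base weight W and learn a low-rank update B @ A,
# scaled by alpha / r, so only (d*r + r*k) parameters are trainable.
rng = np.random.default_rng(0)
d, k, r, alpha = 8, 8, 2, 16

W = rng.normal(size=(d, k))          # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01   # trainable, small random init
B = np.zeros((d, r))                 # trainable, zero init => no change at start

def lora_forward(x, W, A, B, alpha, r):
    """y = x @ (W + (alpha/r) * B @ A).T, without materializing the summed matrix."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(1, k))
# With B zero-initialized, the LoRA model starts out identical to the base model.
assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W.T)
```

During RL, the optimizer updates only A and B; the frozen W can even stay in lower precision, which is the trick the gpt-neox-20b_peft example relies on.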

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • #1: The new streaming algorithm has been merged. It's a lot faster! | 6 comments
    #2: Text streaming will become 1000000x faster tomorrow
    #3: LLaMA tutorial (including 4-bit mode) | 10 comments

  • alpaca-lora

    Instruct-tune LLaMA on consumer hardware

  • Found this: https://github.com/tloen/alpaca-lora

NOTE: The number of mentions on this list reflects mentions in common posts plus user-suggested alternatives; a higher number means a more popular project.

Related posts

  • Show HN: An end-to-end reinforcement learning library for infinite horizon tasks

    1 project | news.ycombinator.com | 29 Dec 2023
  • Problem with Truncated Quantile Critics (TQC) and n-step learning algorithm.

    4 projects | /r/reinforcementlearning | 9 Dec 2023
  • [P] PettingZoo 1.24.0 has been released (including Stable-Baselines3 tutorials)

    4 projects | /r/reinforcementlearning | 24 Aug 2023
  • SB3 - NotImplementedError: Box([-1. -1. -8.], [1. 1. 8.], (3,), <class 'numpy.float32'>) observation space is not supported

    2 projects | /r/reinforcementlearning | 19 Jun 2023
  • Working with DQN ! need some help !

    1 project | /r/deeplearning | 17 May 2023