-
trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
-
text-generation-webui
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
They did not. Some random person is asking Meta to change it.
Blog post: https://crfm.stanford.edu/2023/03/13/alpaca.html Demo: https://crfm.stanford.edu/alpaca/ Code: https://github.com/tatsu-lab/stanford_alpaca
If you check out the trlx repo, it has some examples, including one showing how they trained SFT and PPO on the HH dataset. So it's basically that, but with LLaMA. https://github.com/CarperAI/trlx/blob/main/examples/hh/sft_hh.py
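As a rough illustration of the SFT step mentioned above: assuming the HH data comes as records with `chosen` and `rejected` dialogue strings that use "\n\nHuman:" and "\n\nAssistant:" turn markers (the layout of the public Anthropic HH-RLHF release; the trlx example may preprocess it differently), splitting a record into a prompt and a target response could look roughly like this. The function names are hypothetical and not taken from trlx.

```python
from typing import Dict, Tuple

# Assumed turn marker, based on the public HH-RLHF text layout.
ASSISTANT_MARKER = "\n\nAssistant:"


def split_dialogue(dialogue: str) -> Tuple[str, str]:
    """Split a dialogue at its last assistant turn into (prompt, response)."""
    idx = dialogue.rfind(ASSISTANT_MARKER)
    if idx == -1:
        raise ValueError("no assistant turn found in dialogue")
    cut = idx + len(ASSISTANT_MARKER)
    # The prompt keeps the trailing marker so the model continues from it.
    return dialogue[:cut], dialogue[cut:].strip()


def to_sft_example(record: Dict[str, str]) -> Dict[str, str]:
    """Build an SFT example from the preferred ('chosen') dialogue only."""
    prompt, response = split_dialogue(record["chosen"])
    return {"prompt": prompt, "response": response}


if __name__ == "__main__":
    record = {
        "chosen": "\n\nHuman: Hi\n\nAssistant: Hello!",
        "rejected": "\n\nHuman: Hi\n\nAssistant: Go away.",
    }
    print(to_sft_example(record))
```

Only the `chosen` side is used here because SFT needs a single target per prompt; the `rejected` side matters later, when training a reward model for PPO.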
Just the HH dataset directly. From the results, it seems like that might be enough, but I might also try instruction tuning first and then running the whole process from that base. I will also be running the reinforcement learning with a LoRA, using this as an example: https://github.com/lvwerra/trl/tree/main/examples/sentiment/scripts/gpt-neox-20b_peft
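For anyone wondering why a LoRA helps here, this is a minimal pure-Python sketch of the idea, not the peft API used in the linked script. It assumes the usual LoRA formulation: the frozen weight W gets a low-rank update scaled by alpha / r, and only the small matrices A and B are trained. All names and sizes below are illustrative.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(x, W, A, B, alpha):
    """Compute y = W x + (alpha / r) * B (A x).

    W: d_out x d_in frozen base weight
    A: r x d_in trainable down-projection
    B: d_out x r trainable up-projection (zero-initialised, so the
       adapted layer starts out identical to the base layer)
    """
    r = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def full_params(d_in, d_out):
    """Parameters updated when fine-tuning the full weight matrix."""
    return d_in * d_out


def lora_params(d_in, d_out, r):
    """Parameters updated when training only A and B."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    W = [[1, 0], [0, 1]]
    A = [[1, 1]]
    B = [[0], [0]]  # zero init: output equals the base layer's output
    print(lora_forward([2, 3], W, A, B, alpha=2))
    # Illustrative sizes only, not taken from any specific model config.
    print(full_params(4096, 4096), lora_params(4096, 4096, r=8))
```

The point for the PPO stage is that the optimizer only has to track the A and B matrices, which is what makes RL on a large base model fit in limited memory.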
#1: The new streaming algorithm has been merged. It's a lot faster! | 6 comments
#2: Text streaming will become 1000000x faster tomorrow
#3: LLaMA tutorial (including 4-bit mode) | 10 comments
Found this: https://github.com/tloen/alpaca-lora