Top 19 RLHF Open-Source Projects
-
Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
-
argilla
Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.
-
awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
-
safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
-
alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
-
ImageReward
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
-
distilabel
⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.
-
xtreme1
Xtreme1 is an all-in-one data labeling and annotation platform for multimodal data training and supports 3D LiDAR point cloud, image, and LLM.
-
HALOs
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
-
Cornucopia-LLaMA-Fin-Chinese
Cornucopia (聚宝盆): a series of open-source, commercially usable Chinese financial LLMs, together with an efficient, lightweight training framework for this vertical domain (pretraining, SFT, RLHF, quantization, etc.)
-
TextRL
Implementation of ChatGPT-style RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
-
cogment-verse
Research platform for Human-in-the-loop learning (HILL) & Multi-Agent Reinforcement Learning (MARL)
-
opening-up-chatgpt.github.io
Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
For Open-Assistant, the inference code is here: https://github.com/LAION-AI/Open-Assistant/tree/main/inference
Here’s another one - it’s older but has some interesting charts and graphs.
https://arxiv.org/abs/2303.18223
Project mention: Open-Source Data Collection Platform for LLM Fine-Tuning and RLHF | news.ycombinator.com | 2023-06-05
I'm Dani, CEO and co-founder of Argilla.
Happy to answer any questions you might have and excited to hear your thoughts!
More about Argilla
GitHub: https://github.com/argilla-io/argilla
Project mention: [R] Meet Beaver-7B: a Constrained Value-Aligned LLM via Safe RLHF Technique | /r/MachineLearning | 2023-05-16
Alpaca Eval is open source and was developed by the same team that trained the Alpaca model, afaik. It is not like what you said in the other comment.
Project mention: Results of finetuning Avalon TRUvision v2 with image scoring | /r/StableDiffusion | 2023-05-17
I used the ImageReward repo to score generated images during training and modified the loss function to take the score into account.
Project mention: Open-source AI Feedback framework for scalable LLM Alignment | news.ycombinator.com | 2023-11-23
Project mention: Aligning Large Language Models with Human: A Survey | news.ycombinator.com | 2023-09-08
A paper by researchers at Huawei. There is also a very good GitHub resource collecting various survey papers: https://github.com/GaryYufei/AlignLLMHumanSurvey
If you are using no-code solutions, increasing the representation of an "idea" in a dataset will make that idea more likely to appear.
If you are fine-tuning your own LLM, there are other ways to get your idea to appear. In the literature this is sometimes called RLHF or preference optimization, and here are a few approaches:
Direct Preference Optimization
This learns directly from pairwise preferences, using a Bradley-Terry-style model closely related to the Elo scores used in chess and basketball to rank individuals who compete in pairs.
@argilla_io on X.com has been doing some work in evaluating DPO.
Here is a decent thread on this: https://x.com/argilla_io/status/1745057571696693689?s=20
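As a rough scalar sketch (my own illustration, not Argilla's code), the DPO objective for a single preference pair can be written in plain Python. The `logp_*` inputs are hypothetical per-sequence log-probabilities under the trainable policy and a frozen reference model:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair (scalar sketch).

    The policy is rewarded for widening the log-probability margin of the
    chosen response over the rejected one, relative to the reference model.
    """
    # Implicit rewards: beta-scaled log-ratios against the reference model.
    reward_chosen = beta * (logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (logp_rejected - ref_logp_rejected)
    # Negative log-sigmoid of the reward margin (Bradley-Terry likelihood).
    return -math.log(sigmoid(reward_chosen - reward_rejected))
```

In practice this is computed over batches of token-level log-probs in an autodiff framework; the scalar version just shows the shape of the objective.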
Identity Preference Optimization
IPO is research from Google DeepMind. It removes the reliance on Elo-style pairwise scores to address overfitting issues in DPO.
Paper: https://x.com/kylemarieb/status/1728281581306233036?s=20
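A minimal sketch of the IPO objective (my own scalar illustration, assuming hypothetical per-sequence log-probabilities under the policy and a frozen reference model): instead of maximizing a log-sigmoid of the reward margin, which can grow without bound and overfit, IPO regresses the log-ratio margin toward a fixed target of 1/(2τ):

```python
def ipo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             tau: float = 0.1) -> float:
    """IPO loss for one preference pair (scalar sketch).

    The log-ratio margin between the chosen and rejected responses is
    regressed toward the fixed target 1/(2*tau), so the implicit reward
    stays bounded instead of being pushed to infinity as in DPO.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return (margin - 1.0 / (2.0 * tau)) ** 2
```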
Kahneman-Tversky Optimization
KTO is an approach that uses unpaired ("mono") preference data: for each response it asks only whether the response is "good or not." This is helpful in a lot of real-world situations (e.g. "Is the restaurant well liked?").
Here is a brief discussion on it:
https://x.com/ralphbrooks/status/1744840033872330938?s=20
Here is more on KTO:
* Paper: https://github.com/ContextualAI/HALOs/blob/main/assets/repor...
* Code: https://github.com/ContextualAI/HALOs
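As a scalar sketch (my own illustration, not the HALOs implementation), KTO scores each response on its own rather than in a chosen/rejected pair: desirable examples are pushed above a reference point and undesirable ones below it, with separate weights for the two cases. Here `z_ref` is a hypothetical stand-in for the KL-based reference point that the real method estimates from a batch:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(logp: float, ref_logp: float, desirable: bool,
             z_ref: float = 0.0, beta: float = 0.1,
             lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """KTO-style loss for a single (unpaired) example (scalar sketch).

    Desirable responses are rewarded for exceeding the reference point z_ref;
    undesirable ones for falling below it. No preference pair is required.
    """
    reward = beta * (logp - ref_logp)  # implicit reward vs. reference model
    if desirable:
        return lambda_d * (1.0 - sigmoid(reward - z_ref))
    return lambda_u * (1.0 - sigmoid(z_ref - reward))
```

The asymmetric weights `lambda_d` and `lambda_u` mirror the loss aversion of Kahneman-Tversky prospect theory: mislabeling a bad response as good can be penalized more heavily than the reverse.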
Project mention: Cornucopia-LLaMA-Fin-Chinese: NEW Textual - star count:263.0 | /r/algoprojects | 2023-07-31
RLHF-related posts
-
Recipes to align LLMs with AI feedback
-
Tracking Openness of Instruction-Tuned LLMs
-
What on-demand GPU service would you recommend for fine-tuning 7B models?
-
Opening up ChatGPT: tracking “open source” LLM and RLHF architectures
-
Cornucopia-LLaMA-Fin-Chinese: NEW Textual - star count:263.0
-
Llama and ChatGPT Are Not Open-Source
-
Cornucopia-LLaMA-Fin-Chinese: NEW Textual - star count:221.0
Index
What are some of the best open-source RLHF projects? This list will help you:
# | Project | Stars
---|---|---
1 | Open-Assistant | 36,699 |
2 | LLaMA-Factory | 21,791 |
3 | LLMSurvey | 8,967 |
4 | alignment-handbook | 3,886 |
5 | argilla | 3,132 |
6 | awesome-RLHF | 2,775 |
7 | WebGLM | 1,521 |
8 | safe-rlhf | 1,169 |
9 | alpaca_eval | 1,134 |
10 | ImageReward | 952 |
11 | distilabel | 927 |
12 | xtreme1 | 732 |
13 | AlignLLMHumanSurvey | 605 |
14 | HALOs | 561 |
15 | Cornucopia-LLaMA-Fin-Chinese | 547 |
16 | TextRL | 518 |
17 | chain-of-hindsight | 207 |
18 | cogment-verse | 73 |
19 | opening-up-chatgpt.github.io | 66 |