| | chain-of-hindsight | alignment-handbook |
|---|---|---|
| Mentions | 1 | 3 |
| Stars | 208 | 4,031 |
| Growth | - | 5.5% |
| Activity | 4.7 | 8.6 |
| Latest commit | 8 months ago | 12 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
chain-of-hindsight
- 🎯 UC Berkeley Researchers Propose a Novel Technique Called Chain of Hindsight (CoH) that Can Enable LLMs to Learn from Any Form of Feedback, Improving Model Performance
  - Quick read: https://www.marktechpost.com/2023/03/04/uc-berkeley-researchers-propose-a-novel-technique-called-chain-of-hindsight-coh-that-can-enable-llms-to-learn-from-any-form-of-feedback-improving-model-performance/
  - Paper: https://arxiv.org/pdf/2302.02676.pdf
  - GitHub: https://github.com/lhao499/CoH
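The core idea behind Chain of Hindsight is to turn comparative feedback into plain text the model can be fine-tuned on: worse and better responses are concatenated behind natural-language feedback phrases, so the model learns what "good" versus "bad" looks like in hindsight. A minimal sketch of that data construction, assuming an illustrative prompt/pair format and feedback phrases (not the paper's exact templates):

```python
# Minimal sketch of Chain-of-Hindsight training-data construction.
# The "Bad:" / "Good:" feedback phrases and the pair format here are
# illustrative assumptions, not the paper's exact templates.

def build_coh_example(prompt: str, preferred: str, rejected: str) -> str:
    """Turn one preference pair into a hindsight-annotated sequence.

    During fine-tuning, the loss is applied to the response tokens, so the
    model learns to associate each feedback phrase with the corresponding
    response quality and can be steered at inference time by conditioning
    on a positive phrase.
    """
    return (
        f"{prompt}\n"
        f"Bad: {rejected}\n"
        f"Good: {preferred}"
    )

example = build_coh_example(
    prompt="Summarize: The cat sat on the mat.",
    preferred="A cat sat on a mat.",
    rejected="Cats are mammals.",
)
print(example)
```

At inference, prompting with the positive phrase (here, "Good:") then elicits the higher-quality behavior the model associated with it during training.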
alignment-handbook
- Recipes to align LLMs with AI feedback
What on-demand GPU service would you recommend for fine-tuning 7B models?
I'd like to run some fine-tuning experiments on 7B models. Specifically, I'm interested in using https://github.com/huggingface/alignment-handbook to run the Zephyr-7B recipes on custom datasets. I don't have a viable GPU locally.
- Zephyr 7B β Released
What are some alternatives?
opening-up-chatgpt.github.io - Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
WebGLM - WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)
safe-rlhf - Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
argilla - Argilla is a collaboration platform for AI engineers and domain experts who require high-quality outputs, full data ownership, and overall efficiency.
Cornucopia-LLaMA-Fin-Chinese - Cornucopia (聚宝盆): an open-source, commercially usable series of Chinese financial LLMs, with an efficient, lightweight training framework for vertical-domain LLMs (pretraining, SFT, RLHF, quantization, etc.)
LLMSurvey - The official GitHub page for the survey paper "A Survey of Large Language Models".
peft - 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.