Finetune_LLMs vs synthetic-data-generator
| | Finetune_LLMs | synthetic-data-generator |
|---|---|---|
| Mentions | 2 | 2 |
| Stars | 438 | 21 |
| Growth | - | - |
| Activity | 8.5 | 3.1 |
| Latest commit | about 1 month ago | 7 days ago |
| Language | Python | Python |
| License | GNU Affero General Public License v3.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
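The Growth and Activity metrics above are described only in words. The sketch below shows one plausible way to compute them; it is an assumption, not the site's actual method. Growth is taken as the percentage change in stars versus one month earlier; the recency weighting (a 30-day half-life) and the percentile-to-0-10 mapping are hypothetical choices picked so that a score of 9.0 means "top 10% of tracked projects", matching the example in the legend. The function names are illustrative.

```python
from typing import Dict, Iterable, Optional


def mom_growth(stars_now: int, stars_month_ago: int) -> Optional[float]:
    """Month-over-month star growth in percent (assumed definition)."""
    if stars_month_ago == 0:
        return None  # no baseline, so growth is undefined
    return 100.0 * (stars_now - stars_month_ago) / stars_month_ago


def recency_weighted_commits(commit_ages_days: Iterable[float],
                             half_life_days: float = 30.0) -> float:
    """Sum of commit weights; a commit loses half its weight every half_life_days.

    Hypothetical weighting: recent commits count more than older ones.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_scores(raw: Dict[str, float]) -> Dict[str, float]:
    """Map raw scores onto a 0-10 scale by percentile rank among tracked projects.

    A score of 9.0 means the project's raw score is higher than that of 90%
    of the tracked projects, i.e. it sits in the top 10%.
    """
    n = len(raw)
    return {
        name: round(10.0 * sum(1 for v in raw.values() if v < score) / n, 1)
        for name, score in raw.items()
    }


if __name__ == "__main__":
    print(mom_growth(110, 100))                    # 10.0
    print(recency_weighted_commits([0, 30, 60]))   # 1.0 + 0.5 + 0.25 = 1.75
    tracked = {f"project{i}": float(i) for i in range(10)}
    print(activity_scores(tracked)["project9"])    # 9.0
```

Ranking by percentile rather than by raw commit counts keeps the scale relative, which is consistent with the legend's description of Activity as "a relative number".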
Finetune_LLMs
- Prepare Dataset
Regarding this: if you have the resources (at least Colab Pro), you would be much better off training GPT-J (aka GPT-J-6B). Not only is it 4x larger than the largest GPT-2, but its architecture, AFAIK, is based on GPT-3. You can use this repo as a good example for GPT-J fine-tuning.
- [D] Fine-tuning GPT-J: lessons learned
And this: https://github.com/mallorbc/Finetune_GPTNEO_GPTJ6B
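The "4x larger" claim and the advice to have extra compute can be checked with back-of-envelope arithmetic. A minimal sketch, assuming rounded nominal parameter counts (6.0B for GPT-J-6B, 1.5B for GPT-2 XL, the largest GPT-2) and the commonly cited rule of thumb of about 16 bytes per parameter for full fine-tuning with mixed-precision Adam. Activations and framework overhead are ignored, so real requirements are higher.

```python
# Rounded, nominal parameter counts (assumption: exact counts differ slightly).
GPTJ_PARAMS = 6.0e9      # GPT-J-6B
GPT2_XL_PARAMS = 1.5e9   # GPT-2 XL, the largest GPT-2


def weights_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights; 2 bytes/param for fp16 or bf16."""
    return n_params * bytes_per_param / 1e9


def full_finetune_gb(n_params: float, bytes_per_param: int = 16) -> float:
    """Rule-of-thumb memory for full fine-tuning with mixed-precision Adam.

    16 bytes/param = fp16 weights (2) + fp16 gradients (2)
    + fp32 master weights (4) + Adam first and second moments (4 + 4).
    Activations, buffers and framework overhead are NOT included.
    """
    return n_params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(GPTJ_PARAMS / GPT2_XL_PARAMS)    # 4.0
    print(weights_gb(GPTJ_PARAMS))         # 12.0 (GB)
    print(full_finetune_gb(GPTJ_PARAMS))   # 96.0 (GB)
```

The gap between roughly 12 GB to load the model and roughly 96 GB to fully fine-tune it is the kind of problem that DeepSpeed's ZeRO optimizations are designed to address.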
synthetic-data-generator
- FLaNK AI - 01 April 2024
- Create synthetic datasets with this OpenAI script
🌐 GitHub Repo: https://github.com/quentinlintz/synthetic-data-generator
What are some alternatives?
DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
myGPTReader - A community-driven way to read and chat with AI bots - powered by chatGPT.
mesh-transformer-jax - Model parallel transformers in JAX and Haiku
megabots - 🤖 State-of-the-art, production ready LLM apps made mega-easy, so you don't have to build them from scratch 🤯 Create a bot, now 🫵
code-llama-for-vscode - Use Code Llama with Visual Studio Code and the Continue extension. A local LLM alternative to GitHub Copilot.
gpt4free - The official gpt4free repository | various collection of powerful language models
AnglE - Angle-optimized Text Embeddings | 🔥 SOTA on STS and MTEB Leaderboard
VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech in the Wild
GoLLIE - Guideline following Large Language Model for Information Extraction
langgraph
replicate-llama2-sms-chatbot
speedb - A RocksDB compliant high performance scalable embedded key-value store