Top 23 Python text-generation Projects
- textgenrnn: Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.
- Cornucopia-LLaMA-Fin-Chinese: Cornucopia (聚宝盆), an open-source, commercially usable family of Chinese financial large language models, plus an efficient, lightweight training framework for vertical-domain LLMs (pretraining, SFT, RLHF, quantization, and more).
Try this:
1. (Possibly unnecessary.) Uninstall textgenrnn: pip3 uninstall textgenrnn.
2. Reinstall it straight from GitHub with one of these commands:
   * pip3 install git+git://github.com/minimaxir/textgenrnn.git
   * pip3 install git+https://github.com/minimaxir/textgenrnn.git
   (Try the first one; if it raises an error, try the second.)
The discussion of the "multi_gpu_model not found" error is here: https://github.com/minimaxir/textgenrnn/issues/222.
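Once the install succeeds, a minimal training run looks roughly like this (a sketch following the textgenrnn README; the corpus file name is just a placeholder):

```python
from textgenrnn import textgenrnn

# Create a model initialized from the bundled pretrained weights.
textgen = textgenrnn()

# Fine-tune on a plain-text file, one training example per line.
# 'my_corpus.txt' is a placeholder for your own dataset.
textgen.train_from_file('my_corpus.txt', num_epochs=5)

# Sample a few generated lines; lower temperature gives more conservative output.
textgen.generate(5, temperature=0.5)
```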
Project mention: Show HN: WhatsApp-Llama: A clone of yourself from your WhatsApp conversations | news.ycombinator.com | 2023-09-09

Tap the contact's name in WhatsApp (I think it only works on a phone) and at the bottom of that screen there's Export Chat.
For fine-tuning GPT-2 I think I used this on Google Colab. (My friend ran it on his GPU; it should be doable on most modern-ish GPUs.)
https://github.com/minimaxir/gpt-2-simple
I tried doing something with this a few months ago, though, and it was a bit of a hassle to get running (it needed a specific Python version for some dependencies...); I forget the details, sorry!
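For reference, the basic fine-tuning flow with gpt-2-simple looks roughly like this (a sketch based on its README; the dataset path and step count are placeholders):

```python
import gpt_2_simple as gpt2

# Download the smallest pretrained GPT-2 checkpoint (124M parameters).
gpt2.download_gpt2(model_name="124M")

# Start a TensorFlow session for training.
sess = gpt2.start_tf_sess()

# Fine-tune on a plain-text file; 'chat_export.txt' is a placeholder dataset.
gpt2.finetune(sess,
              "chat_export.txt",
              model_name="124M",
              steps=1000)

# Generate samples from the fine-tuned model.
gpt2.generate(sess)
```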
I think of guardrails as another dimension of human preferences: whether you are training a model to answer questions better or to avoid saying horrifying stuff, you are teaching the model a preference. So I think it's a straightforward RLHF problem, just seen from a different perspective.
Project mention: Microsoft: Large-scale pretrained models for goal-directed dialog | news.ycombinator.com | 2023-06-05
With the currently popular GPTQ, 3-bit quantization hurts performance much more than 4-bit, but there are also AWQ (https://github.com/mit-han-lab/llm-awq) and SqueezeLLM (https://github.com/SqueezeAILab/SqueezeLLM), which manage 3-bit without as much of a performance drop. I hope to see them used more commonly.
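To see why each bit matters so much at these widths, here is a toy round-to-nearest quantizer (plain uniform quantization, not GPTQ/AWQ/SqueezeLLM) comparing reconstruction error at 4 and 3 bits on random weights:

```python
import numpy as np

def quantize_dequantize(w, bits):
    """Uniform round-to-nearest quantization of a weight tensor."""
    levels = 2 ** bits - 1
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / levels
    q = np.round((w - lo) / scale)   # integer codes in [0, levels]
    return q * scale + lo            # dequantized weights

rng = np.random.default_rng(0)
w = rng.normal(size=100_000).astype(np.float32)  # stand-in for a weight matrix

for bits in (4, 3):
    err = np.abs(w - quantize_dequantize(w, bits)).mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")
# Dropping from 4 to 3 bits halves the number of levels and roughly
# doubles the quantization error, which is why smarter schemes are needed.
```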
Project mention: Cornucopia-LLaMA-Fin-Chinese: NEW Textual - star count: 263 | /r/algoprojects | 2023-07-31
Project mention: I Built a Modular Python Library for Designing and Training Diffusion Models from Scratch | /r/SideProject | 2023-09-06

Last week, I released a project I've been working on for months: Modular Diffusion. It's a modular Python library for designing and training your own Diffusion Models in just a few lines of code. I also wrote a documentation page. The project has already gotten some great community feedback, and I'm hoping you guys like it too!
Project mention: A LLM trained to follow annotation guidelines, for information extraction tasks | news.ycombinator.com | 2023-10-30
Leaderboard: https://github.com/allenai/CommonGen-Eval?tab=readme-ov-file...
Project mention: A Defacto Guide on Building Generative AI Apps with the Google PaLM API | /r/learnmachinelearning | 2023-09-12

⚠️ Alternating Message Authors: the API strictly expects alternating authors for chat-based messages. In llmx, I implement a simple check for consecutive messages and merge them with a newline character.
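A minimal version of that merge step might look like this (an illustrative sketch, not llmx's actual code; the message dictionary format is assumed):

```python
def merge_consecutive_authors(messages):
    """Merge consecutive messages from the same author into one,
    joining their contents with a newline so authors strictly alternate."""
    merged = []
    for msg in messages:
        if merged and merged[-1]["author"] == msg["author"]:
            merged[-1]["content"] += "\n" + msg["content"]
        else:
            merged.append({"author": msg["author"], "content": msg["content"]})
    return merged

chat = [
    {"author": "user", "content": "Hi!"},
    {"author": "user", "content": "Can you summarize this article?"},
    {"author": "model", "content": "Sure, paste it here."},
]
print(merge_consecutive_authors(chat))
# [{'author': 'user', 'content': 'Hi!\nCan you summarize this article?'},
#  {'author': 'model', 'content': 'Sure, paste it here.'}]
```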
Python text-generation related posts
- Show HN: WhatsApp-Llama: A clone of yourself from your WhatsApp conversations
- Modern alternative to textgenrnn?
- Is there any nano-gpt/pico-gpt like implementation available for stable-diffusion models?
- indistinguishable
- Just a thought
- training gpt on your own sources - how does it work? gpt2 v gpt3? and how much does it cost?
- I (re)trained an AI using the 36 lessons of Vivec, the entirety of C0DA, the communist manifesto and the top posts of /r/copypasta and asked it the most important/unanswered lore questions. What are the lore implications of these insights?
-
Index
What are some of the best open-source text-generation projects in Python? This list will help you:
# | Project | Stars
---|---|---
1 | MOSS | 11,825 |
2 | GPT2-Chinese | 7,360 |
3 | textgenrnn | 4,943 |
4 | gpt-2-simple | 3,366 |
5 | DialoGPT | 2,315 |
6 | RL4LMs | 2,094 |
7 | GODEL | 835 |
8 | SqueezeLLM | 571 |
9 | Cornucopia-LLaMA-Fin-Chinese | 536 |
10 | commit-autosuggestions | 383 |
11 | minimal-text-diffusion | 263 |
12 | modular-diffusion | 256 |
13 | MAGIC | 245 |
14 | GoLLIE | 214 |
15 | KVQuant | 194 |
16 | genius | 175 |
17 | mutate | 149 |
18 | ctrl-sum | 145 |
19 | pistoBot | 139 |
20 | ctc-gen-eval | 93 |
21 | CommonGen-Eval | 79 |
22 | llmx | 68 |
23 | namekrea | 49 |