Top 23 Natural Language Processing Open-Source Projects

transformers

173 124,557 10.0 Python

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Project mention: AI enthusiasm #6 - Finetune any LLM you want💡 | dev.to | 2024-04-16

Most of this tutorial is based on Hugging Face course about Transformers and on Niels Rogge's Transformers tutorials: make sure to check their work and give them a star on GitHub, if you please ❤️
funNLP

0 63,684 3.7 Python

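Chinese and English sensitive words, language detection, domestic and international mobile/phone number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese personal name databases, Chinese abbreviation library, character decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, simulating Chinese pronunciation with English, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, segmentation of continuous English text, various Chinese word vectors, company name collection, classical poetry corpus, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, historical figures lexicon, poetry lexicon, medical lexicon, food and drink lexicon, legal lexicon, automotive lexicon, animal lexicon, Chinese chat corpus, Chinese rumor data, Baidu Chinese question-answering dataset, collection of sentence similarity matching algorithms, BERT resources, text generation and summarization tools, cocoNLP information extraction tool, regex matching for Chinese domestic phone numbers, Tsinghua University XLORE: Chinese-English cross-lingual encyclopedic knowledge graph, Tsinghua University artificial intelligence technology report series, natural language generation, the "NLU is too hard" series, automatic couplet data and bot, username blacklist, criminal charge and legal terms with classification model, WeChat official account corpus, cs224n deep learning for natural language processing course, Chinese handwritten character recognition, Chinese natural language processing corpora/datasets, variable naming tool, word segmentation corpus + code, English task-oriented dialogue dataset, ASR speech datasets + deep-learning-based Chinese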
bert

49 36,945 0.0 Python

TensorFlow code and pre-trained models for BERT

Project mention: OpenAI – Application for US trademark "GPT" has failed | news.ycombinator.com | 2024-02-15

task-specific parameters, and is trained on the downstream tasks by simply fine-tuning all pre-trained parameters.
[0] https://arxiv.org/abs/1810.04805
Made-With-ML

51 35,610 6.8 Jupyter Notebook

Learn how to design, develop, deploy and iterate on production-grade ML applications.

Project mention: [D] How do you keep up to date on Machine Learning? | /r/learnmachinelearning | 2023-08-13

Made With ML
Jieba

6 32,323 0.0 Python

Jieba Chinese word segmentation

Project mention: [OC] How Many Chinese Characters You Need to Learn to Read Chinese! | /r/dataisbeautiful | 2023-06-14

jieba to do Chinese word segmentation
HanLP

3 32,214 5.6 Python

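Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification, clustering, pinyin and Traditional/Simplified conversion, natural language processing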
spaCy

106 28,660 9.3 Python

💫 Industrial-strength Natural Language Processing (NLP) in Python

Project mention: Step by step guide to create customized chatbot by using spaCy (Python NLP library) | dev.to | 2024-03-23

Hi Community, In this article, I will demonstrate below steps to create your own chatbot by using spaCy (spaCy is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython):
applied-ml

13 25,853 4.3

📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
NLP-progress

17 22,290 3.2 Python

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
d2l-en

6 21,564 8.7 Python

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
datasets

15 18,345 9.5 Python

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Project mention: 🐍🐍 23 issues to grow yourself as an exceptional open-source Python expert 🧑‍💻 🥇 | dev.to | 2023-10-19
rasa

16 17,919 9.6 Python

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Project mention: 🔥🚀 Top 10 Open-Source Must-Have Tools for Crafting Your Own Chatbot 🤖💬 | dev.to | 2023-11-06

Support Rasa on GitHub ⭐
Ciphey

27 16,920 2.9 Python

⚡ Automatically decrypt encryptions without knowing the key or cipher, decode encodings, and crack hashes ⚡

Project mention: CyberChef from GCHQ: The Cyber Swiss Army Knife | news.ycombinator.com | 2024-02-01

I also discovered Ciphey. Neat little tool indeed, but it's being deprecated. It's mentioned in this issue[1] and being replaced with Ares[2]. Neither could decipher this strange encryption[3] I used it on :(
[1] https://github.com/Ciphey/Ciphey/issues/764
[2] https://github.com/bee-san/Ares
[3] "dEFLWWFKQWxRQW16RnkvbTZML0lsdz09" original text is "hacker"
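
As a hedged illustration of why the string in [3] resists automatic tools, the sketch below peels off its encoding layers using only Python's standard `base64` module. It shows that the outer layer is plain Base64 wrapping a second Base64 string, and that the innermost payload is 16 opaque bytes. Reading those 16 bytes as one block of a keyed cipher is an assumption on our part, not something the quoted comment states; without the key, decoding alone cannot recover "hacker".

```python
import base64

# The string quoted in footnote [3] above.
token = "dEFLWWFKQWxRQW16RnkvbTZML0lsdz09"

# Layer 1: the outer Base64 decodes to another ASCII Base64 string.
inner = base64.b64decode(token).decode("ascii")

# Layer 2: the inner Base64 decodes to raw bytes that are not readable text.
raw = base64.b64decode(inner)

print(inner)       # the intermediate Base64 string
print(len(raw))    # number of opaque payload bytes
print(raw.hex())   # hex view of the payload
```

The two decoding steps are the part a tool can automate; what remains afterwards is ciphertext, which is where key-less approaches stop.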
awesome-nlp

3 15,952 4.3

📚 A curated list of resources dedicated to Natural Language Processing (NLP)
gensim

18 15,212 7.5 Python

Topic Modelling for Humans

Project mention: Aggregating news from different sources | /r/learnprogramming | 2023-07-08
Awesome-pytorch-list

2 14,903 0.0

A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
ML-YouTube-Courses

309 14,286 7.2

📺 Discover the latest machine learning / AI courses on YouTube.

Project mention: A Curated List of Free ML/ DL YouTube Courses | news.ycombinator.com | 2024-01-28
DocsGPT

35 14,124 9.8 Python

GPT-powered chat for documentation, chat with your documents

Project mention: You can earn free shirt by contributing to DocsGPT | /r/hacktoberfest | 2023-10-03
nlp-tutorial

1 13,666 0.0 Jupyter Notebook

Natural Language Processing Tutorial for Deep Learning Researchers
flair

9 13,538 9.4 Python

A very simple framework for state-of-the-art Natural Language Processing (NLP)
NLTK

64 12,999 8.3 Python

NLTK Source

Project mention: Building a local AI smart Home Assistant | news.ycombinator.com | 2024-01-13

alternatively, could we not simply split by common characters such as newlines and periods, to split it within sentences? it would be fragile with special handling required for numbers with decimal points and probably various other edge cases, though.
there are also Python libraries meant for natural language parsing[0] that could do that task for us. I even see examples on stack overflow[1] that simply split text into sentences.
[0]: https://www.nltk.org/
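
The naive approach described in the comment above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not NLTK's method: the function name `naive_split_sentences` and the sample text are hypothetical. It splits at newlines and after `.`, `!` or `?` when whitespace follows, which keeps decimals such as "3.14" intact but still breaks on abbreviations such as "Dr.", the kind of edge case the commenter warns about.

```python
import re
from typing import List

# Split at runs of whitespace that follow ., ! or ?, or at newlines.
# Requiring whitespace after the period leaves "3.14" in one piece.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+|\n+")


def naive_split_sentences(text: str) -> List[str]:
    """Split text into rough sentences; abbreviations are not handled."""
    parts = _BOUNDARY.split(text)
    return [p.strip() for p in parts if p and p.strip()]


sample = "Pi is roughly 3.14. It is irrational!\nDr. Smith agrees."
for sentence in naive_split_sentences(sample):
    print(repr(sentence))
# 'Pi is roughly 3.14.'
# 'It is irrational!'
# 'Dr.'            <- wrongly split: the fragility mentioned above
# 'Smith agrees.'
```

A dedicated tokenizer from a library such as NLTK is the usual way to avoid maintaining such special cases by hand.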
MOSS

4 11,804 8.5 Python

An open-source tool-augmented conversational language model from Fudan University

Project mention: Has anyone tried fine tuning on a dataset of complex tasks that require tool use? | /r/LocalLLaMA | 2023-05-05
deep-learning-drizzle

1 11,738 0.0 HTML

Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-16.

Natural Language Processing related posts

Show HN: Next-token prediction in JavaScript – build fast LLMs from scratch
10 projects | news.ycombinator.com | 10 Apr 2024
AI enthusiasm #6 - Finetune any LLM you want💡
2 projects | dev.to | 16 Apr 2024
Schedule-Free Learning – A New Way to Train
3 projects | news.ycombinator.com | 6 Apr 2024
Step by step guide to create customized chatbot by using spaCy (Python NLP library)
1 project | dev.to | 23 Mar 2024
Gemma doesn't suck anymore – 8 bug fixes
3 projects | news.ycombinator.com | 11 Mar 2024
Show HN: GPT Fill‐in‐the‐Blanks: A Placeholder PowerPlay for PowerPoint
1 project | news.ycombinator.com | 27 Feb 2024
Ask HN: Grammarly Alternatives?
2 projects | news.ycombinator.com | 27 Feb 2024

Index

What are some of the best open-source Natural Language Processing projects? This list will help you:

#	Project	Stars
1	transformers	124,557
2	funNLP	63,684
3	bert	36,945
4	Made-With-ML	35,610
5	Jieba	32,323
6	HanLP	32,214
7	spaCy	28,660
8	applied-ml	25,853
9	NLP-progress	22,290
10	d2l-en	21,564
11	datasets	18,345
12	rasa	17,919
13	Ciphey	16,920
14	awesome-nlp	15,952
15	gensim	15,212
16	Awesome-pytorch-list	14,903
17	ML-YouTube-Courses	14,286
18	DocsGPT	14,124
19	nlp-tutorial	13,666
20	flair	13,538
21	NLTK	12,999
22	MOSS	11,804
23	deep-learning-drizzle	11,738