Python Natural Language Processing

Open-source Python projects categorized as Natural Language Processing

Top 23 Python Natural Language Processing Projects

  • GitHub repo transformers

    🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

    Project mention: Export and run other machine learning models | 2021-10-14

    txtai primarily has support for Hugging Face Transformers and ONNX models. This enables txtai to hook into the rich model framework available in Python, export this functionality via the API to other languages (JavaScript, Java, Go, Rust) and even export and natively load models with ONNX.

  • GitHub repo funNLP

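    Chinese and English sensitive words, language detection, Chinese and international mobile/phone number region and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese personal-name lexicons, Chinese abbreviation lexicon, character decomposition dictionary, word sentiment values, stopwords, reactionary-word list, violence/terrorism word list, traditional/simplified Chinese conversion, English-letter imitation of Chinese pronunciation, Wang Feng lyrics generator, occupation-name lexicon, synonym lexicon, antonym lexicon, negation-word lexicon, car brand lexicon, car parts lexicon, segmentation of run-together English text, various Chinese word vectors, comprehensive list of company names, classical poetry corpus, IT lexicon, finance lexicon, idiom (chengyu) lexicon, place-name lexicon, historical figures lexicon, poetry lexicon, medical lexicon, food lexicon, legal lexicon, automotive lexicon, animal lexicon, Chinese chat corpus, Chinese rumor data, Baidu Chinese question-answering dataset, collection of sentence-similarity matching algorithms, BERT resources, text generation & summarization tools, cocoNLP information extraction tool, regex matching for Chinese domestic phone numbers, Tsinghua University XLORE: Chinese-English cross-lingual encyclopedic knowledge graph, Tsinghua University AI technology report series, natural language generation, "NLU is too hard" series, automatic couplet data and bot, username blacklist, criminal-charge legal terms and classification model, WeChat official account corpus, cs224n deep learning for NLP course, Chinese handwritten character recognition, Chinese NLP corpora/datasets, variable-naming tool, word segmentation corpus + code, task-oriented dialogue English dataset, ASR speech datasets + deep-learning-based Chinese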


  • GitHub repo bert

    TensorFlow code and pre-trained models for BERT

    Project mention: Is huggingface pre-trained models on their site can be used for commercial use? | 2021-09-20

    But I can see that the LICENSE over on Google's bert repository comes with an Apache License, meaning we can generally use it - and amusingly if you scroll down to the 'date' and 'author', it says:

  • GitHub repo Jieba


    Project mention: Learn vocabulary effortlessly while browsing the web [FR,EN,DE,PT,ES] | 2021-03-23

    Since you're saying the main issue is segmentation, there are libraries to help out with that issue. jieba is fantastic if you have a Python backend, nodejieba (50k downloads/week) if it's more JS-side.
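
    Segmentation matters because written Chinese has no spaces between words. As a rough illustration of dictionary-based segmentation, the sketch below uses greedy forward maximum matching with a small hypothetical lexicon. This is not jieba's algorithm: jieba builds a DAG of dictionary candidates, picks the most probable path with dynamic programming, and falls back to an HMM for unknown words. The function name `max_match` and the lexicon are made up for this example.

```python
from __future__ import annotations


def max_match(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy forward maximum matching over a (hypothetical) lexicon."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it until it is in the
        # lexicon; a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy lexicon; a real segmenter ships a large dictionary.
lexicon = {"我们", "喜欢", "自然", "语言", "自然语言", "处理"}
print(max_match("我们喜欢自然语言处理", lexicon))
# ['我们', '喜欢', '自然语言', '处理']
```

    The example sentence 我们喜欢自然语言处理 means "we like natural language processing". Greedy matching picks 自然语言 ("natural language") over the shorter 自然 ("natural") because it always tries longer candidates first, which is also why it can go wrong on ambiguous strings.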

  • GitHub repo spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: I put together a tutorial and overview on how to use DeepSpeech to do Speech Recognition in Python | 2021-10-14

    It definitely could - with the real-time speech recognition example shown in the tutorial. But you'd likely need some sort of NLU running after the transcription is performed - to basically parse what was spoken into a command that you can use to run some business logic. There are some good open source libs for this too like
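
    To make the "NLU after transcription" step concrete, here is a deliberately simple rule-based sketch that maps a transcript to an intent name plus extracted slots. The intents (`set_timer`, `play_music`), the patterns, and the `parse_command` function are hypothetical; NLU libraries such as Rasa typically learn intent classification and entity extraction from training examples rather than relying on hand-written regexes.

```python
import re

# Hypothetical intents: each pairs an intent name with a regex whose named
# groups become the extracted slots.
INTENT_PATTERNS = [
    ("set_timer", re.compile(r"\bset (?:a )?timer for (?P<minutes>\d+) minutes?\b")),
    ("play_music", re.compile(r"\bplay (?P<song>.+)$")),
]


def parse_command(transcript: str):
    """Return (intent, slots) for the first matching pattern, else (None, {})."""
    text = transcript.lower().strip()
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.search(text)
        if match:
            return intent, match.groupdict()
    return None, {}


print(parse_command("Please set a timer for 10 minutes"))
# ('set_timer', {'minutes': '10'})
print(parse_command("play some jazz"))
# ('play_music', {'song': 'some jazz'})
```

    The returned intent name is what your business logic would dispatch on, and the slots carry the parameters (here, the number of minutes or the song title).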

  • GitHub repo NLP-progress

    Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

    Project mention: [P] NLP "tl;dr" Notes on Transformers | 2021-08-12

    It would also be cool to have some charts with parameter density and even overall effectiveness (a tl;dr version of SOTA-trackers, maybe?) if that doesn't prove too infeasible.

  • GitHub repo rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    Project mention: Building a chatbot - How should I approach this? | 2021-08-12

    Like u/Hungry_Check_9153 says, regarding your picture of how chatbots work, I recommend looking at Rasa, which is an open-source Python chatbot framework. To get an idea of the sheer scope of such a project, take a look at their GitHub. Building a chatbot using Rasa may be a good first step and offers plenty of experience writing and learning Python code.

  • GitHub repo gensim

    Topic Modelling for Humans

    Project mention: The unthinking application of this regex-efficiency check wasted our attention | 2021-09-30

  • GitHub repo d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 200 universities.

    Project mention: I created a way to learn machine learning through Jupyter | 2021-04-30

    There are actually some online books and courses built on Jupyter Notebook (the Dive into Deep Learning book, for example). However, yours is more detailed and could really help beginners.

  • GitHub repo flair

    A very simple framework for state-of-the-art Natural Language Processing (NLP)

    Project mention: Preparing data for training NER models | 2021-10-11

    Training most Named Entity Recognition (NER) models, Flair for example, usually requires formatting the data in the BIO (also written IOB) tagging scheme, with one token per line and each sentence separated by a blank line.
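
    As a minimal sketch of producing that format, the code below converts token-level entity spans into BIO tags and writes one `token tag` pair per line, with a blank line between sentences. It assumes entities are given as `(start, end, label)` token-index spans with an exclusive end; the helper names are hypothetical, and you should check which column order and label set your trainer expects.

```python
def to_bio(tokens, entities):
    """Tag tokens with BIO labels from (start, end, label) spans, end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # tokens inside the entity
    return tags


def to_column_format(sentences):
    """Render sentences as 'token tag' lines, with a blank line between sentences."""
    blocks = []
    for tokens, entities in sentences:
        tags = to_bio(tokens, entities)
        blocks.append("\n".join(f"{tok} {tag}" for tok, tag in zip(tokens, tags)))
    return "\n\n".join(blocks) + "\n"


# Hypothetical example sentences with token-index entity spans.
sentences = [
    (["George", "Washington", "went", "to", "Washington"], [(0, 2, "PER"), (4, 5, "LOC")]),
    (["Berlin", "is", "nice"], [(0, 1, "LOC")]),
]
print(to_column_format(sentences))
```

    For the first sentence this yields `George B-PER`, `Washington I-PER`, `went O`, `to O`, `Washington B-LOC`: only the first token of each entity gets a `B-` tag, which is what lets a model separate two adjacent entities of the same type.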

  • GitHub repo allennlp

    An open-source NLP research library, built on PyTorch.

    Project mention: Any allennlp users in this sub? | 2021-10-08

    … looks active

  • GitHub repo NLTK

    NLTK Source

    Project mention: Top 10 Python Libraries for Machine Learning | 2021-09-09

    Developed By: Team NLTK. Primary Purpose: Natural Language Processing.

  • GitHub repo datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

    Project mention: Datasets: A Community Library for Natural Language Processing | 2021-09-08

  • GitHub repo bert-as-service

    Mapping a variable-length sentence to a fixed-length vector using BERT model

    Project mention: Needed 100% to pass a safety quiz, need to wait a week to retake | 2021-01-12

    You joke but
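
    The idea of turning a variable-length sentence into a fixed-length vector can be illustrated with mean pooling: average the per-token vectors along each dimension, so every sentence yields a vector of the same size no matter how many tokens it has. The toy 3-dimensional vectors below stand in for BERT token embeddings (which are 768-dimensional for BERT-Base); this is a sketch of one pooling strategy, not bert-as-service's code or API.

```python
def mean_pool(token_vectors):
    """Average per-token vectors into a single fixed-length sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    sums = [0.0] * dim
    for vec in token_vectors:
        for j, value in enumerate(vec):
            sums[j] += value
    return [s / len(token_vectors) for s in sums]


# Toy stand-ins for contextual token embeddings of a 2-token and a 4-token sentence.
short_sentence = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
long_sentence = [[0.0, 0.0, 3.0], [1.0, 1.0, 3.0], [2.0, 2.0, 3.0], [5.0, 1.0, 3.0]]
print(mean_pool(short_sentence))  # [2.0, 3.0, 4.0]
print(mean_pool(long_sentence))   # [2.0, 1.0, 3.0]
```

    Both sentences end up as 3-dimensional vectors, so they can be compared directly, for example with cosine similarity. In a real model you would also exclude padding tokens from the average.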

  • GitHub repo Ciphey

    ⚡ Automatically decrypt encryptions without knowing the key or cipher, decode encodings, and crack hashes ⚡

    Project mention: Awesome Penetration Testing | 2021-10-06

    Ciphey - Automated decryption tool using artificial intelligence and natural language processing.
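
    One way to picture how natural language helps here: when the key is unknown, try every candidate decryption and keep the one that looks most like English. The sketch below does this for a Caesar cipher, scoring candidates by how many words appear in a tiny hypothetical word list. It only illustrates the idea; it is not how Ciphey itself is implemented, and a practical checker would use a full dictionary or letter-frequency statistics.

```python
import string

# Hypothetical tiny word list; a real checker would use a full dictionary.
COMMON_WORDS = {"the", "and", "is", "attack", "at", "dawn", "meet", "me"}


def shift(text, k):
    """Shift each ASCII letter back by k positions (Caesar decryption)."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") - k) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") - k) % 26 + ord("A")))
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return "".join(out)


def score(candidate):
    """Count how many whitespace-separated words are in the word list."""
    return sum(word in COMMON_WORDS for word in candidate.lower().split())


def crack_caesar(ciphertext):
    """Try all 26 keys and keep the most English-looking plaintext."""
    best_key = max(range(26), key=lambda k: score(shift(ciphertext, k)))
    return best_key, shift(ciphertext, best_key)


print(crack_caesar("dwwdfn dw gdzq"))  # (3, 'attack at dawn')
```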

  • GitHub repo Pattern

    Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

    Project mention: Discussion Thread | 2021-08-27

    if you're curious about the nitty gritty, the parsing module's documentation is well written and doesn't require a comp sci or linguistics degree to get the gist of what's happening.

  • GitHub repo TextBlob

    Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

    Project mention: Any way for Python to interpret words? | 2021-07-27

    Check out TextBlob — Spelling correction, splitting text into words and sentences, sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more
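
    Sentiment analysis in its simplest lexicon-based form looks up each word's polarity and averages the hits. The sketch below assumes a tiny hypothetical lexicon and a `polarity` helper of our own; it is not TextBlob's API. Practical lexicon-based analyzers use much larger lexicons and typically also account for negation and intensifiers such as "not" and "very".

```python
import re

# Hypothetical polarity lexicon with scores in [-1, 1].
LEXICON = {"good": 0.5, "great": 0.75, "bad": -0.5, "terrible": -1.0, "slow": -0.25}


def polarity(text):
    """Average the polarity of known words; 0.0 if no word is in the lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


print(polarity("The docs are great but the API is slow."))  # 0.25
```

    Here only "great" (0.75) and "slow" (-0.25) are in the lexicon, so the sentence averages to a mildly positive 0.25.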

  • GitHub repo machine_learning_examples

    A collection of machine learning examples and tutorials.

    Project mention: How to save an attention model for deployment/exposing to an API? | 2021-08-17

    I've been following a course on building an attention model for neural machine translation. This is the file inside the repo. I know that I'll have to use certain functions to turn the textual input into encodings and tokens, but those functions use certain instances of the model, which I don't know whether I should keep or not. If anyone could take a look and help me out here, it'd be really, really appreciated.

  • GitHub repo pytext

    A natural language modeling framework based on PyTorch

  • GitHub repo attention-is-all-you-need-pytorch

    A PyTorch implementation of the Transformer model in "Attention is All You Need".

    Project mention: Lack of activation in transformer feedforward layer? | 2021-05-20

    I'm curious as to why the second matrix multiplication is not followed by an activation unlike the first one. Is there any particular reason why a non-linearity would be trivial or even avoided in the second operation? For reference, variations of this can be witnessed in a number of different implementations, including BERT-pytorch and attention-is-all-you-need-pytorch.
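
    For reference, the position-wise feed-forward layer in "Attention is All You Need" is FFN(x) = max(0, xW1 + b1)W2 + b2: a ReLU sits between the two linear maps, and nothing follows the second one. A plausible reading is that the layer is a one-hidden-layer MLP whose output feeds a residual addition, so it should be free to take either sign; a ReLU there would force the update to be non-negative. The plain-Python sketch below uses toy sizes (d_model = 2, d_ff = 3 instead of the paper's 512 and 2048) and made-up weights.

```python
def linear(x, weights, bias):
    """Compute x @ weights + bias for one vector x (weights is in_dim x out_dim)."""
    return [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
            for j in range(len(bias))]


def relu(v):
    return [max(0.0, value) for value in v]


def feed_forward(x, w1, b1, w2, b2):
    """FFN(x) = max(0, x W1 + b1) W2 + b2, applied to a single position."""
    hidden = relu(linear(x, w1, b1))  # non-linearity after the first projection only
    return linear(hidden, w2, b2)     # second projection stays linear


# Made-up toy weights: d_model = 2, d_ff = 3.
w1 = [[1.0, -1.0, 0.5], [0.0, 2.0, -1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 0.0], [0.0, -1.0], [1.0, 1.0]]
b2 = [0.0, 0.0]
print(feed_forward([1.0, 1.0], w1, b1, w2, b2))  # [1.0, -1.0]
```

    The negative second output shows what the missing activation buys: the layer's contribution to the residual stream can push a feature down as well as up.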

  • GitHub repo Stanza

    Official Stanford NLP Python Library for Many Human Languages

  • GitHub repo nlp-recipes

    Natural Language Processing Best Practices & Examples

    Project mention: Building a Aspect based sentiment classification | 2021-09-06

    There is an NLP recipe from Microsoft on ABSA. Have you seen this?

  • GitHub repo pkuseg-python

    The pkuseg toolkit for multi-domain Chinese word segmentation

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-10-14.


What are some of the best open-source Natural Language Processing projects in Python? This list will help you:

Rank  Project                            Stars
1     transformers                       52,449
2     funNLP                             34,022
3     bert                               29,306
4     Jieba                              27,131
5     spaCy                              21,511
6     NLP-progress                       19,212
7     rasa                               12,815
8     gensim                             12,543
9     d2l-en                             11,202
10    flair                              10,849
11    allennlp                           10,545
12    NLTK                               10,163
13    datasets                           10,108
14    bert-as-service                    9,629
15    Ciphey                             8,439
16    Pattern                            8,064
17    TextBlob                           7,897
18    machine_learning_examples          6,352
19    pytext                             6,254
20    attention-is-all-you-need-pytorch  5,801
21    Stanza                             5,726
22    nlp-recipes                        5,694
23    pkuseg-python                      5,618