Python Natural Language Processing

Open-source Python projects categorized as Natural Language Processing

Top 23 Python Natural Language Processing Projects

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

    Project mention: BetterTransformer: PyTorch-native free-lunch speedups for Transformer-based models | reddit.com/r/MachineLearning | 2022-11-22

    To support BetterTransformer with the canonical Transformer models from the Transformers library, an integration was done with the open-source library Optimum as a one-liner.

  • funNLP

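    Chinese and English sensitive words, language detection, domestic and international mobile/phone number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, traditional/simplified Chinese conversion, English approximation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, segmentation of run-together English text, various Chinese word vectors, company name list, classical poetry corpus, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, historical figures lexicon, poetry lexicon, medical lexicon, food and drink lexicon, legal lexicon, automotive lexicon, animal lexicon, Chinese chat corpus, Chinese rumor data, Baidu Chinese Q&A dataset, collection of sentence similarity matching algorithms, BERT resources, text generation and summarization tools, cocoNLP information extraction tool, regex matching for Chinese domestic phone numbers, Tsinghua University XLORE: Chinese-English cross-lingual encyclopedic knowledge graph, Tsinghua University AI technology report series, natural language generation, "NLU is too hard" series, automatic couplet data and bot, username blacklist, criminal charge and legal terms with classification model, WeChat official account corpus, cs224n deep learning NLP course, Chinese handwritten character recognition, Chinese NLP corpora/datasets, variable naming tool, word segmentation corpus + code, task-oriented dialogue English dataset, ASR speech datasets + deep-learning-based Chinese…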

  • bert

    TensorFlow code and pre-trained models for BERT

    Project mention: [R] LiBai: a large-scale open-source model training toolbox | reddit.com/r/MachineLearning | 2022-11-09

    Found relevant code at https://github.com/google-research/bert + all code implementations here

  • Jieba

    Jieba Chinese word segmentation

    Project mention: Sentence parser for Mandarin? | reddit.com/r/ChineseLanguage | 2022-09-14

    Jieba: Chinese text segmenter

  • HanLP

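    Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing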

    Project mention: Hanlp - Natural language processing for the next decade | reddit.com/r/github_trends | 2022-05-28

  • spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: Has anyone here ever used the seaNMF model for short text topic modeling, and be willing to help me get started with it? | reddit.com/r/LanguageTechnology | 2022-11-24

    Tokenize with NLTK, spaCy or CoreNLP

  • NLP-progress

    Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

    Project mention: NLP research status | reddit.com/r/datascience | 2022-10-15

  • d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 400 universities from 60 countries including Stanford, MIT, Harvard, and Cambridge.

    Project mention: How to pre-train BERT on different objective tasks using HuggingFace | reddit.com/r/deeplearning | 2022-04-10

    There may be a BERT library for pre-training BERT models in Hugging Face, but I suggest that you train a BERT model in native PyTorch to understand the details. Li Mu's course is recommended for you.

  • rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    Project mention: Show HN: Flex – transpile natural language to a programming language | news.ycombinator.com | 2022-11-10

    At the moment it can recognise the type of statements in the training data set [1] and transpile them to Python, Java or C++ using the mappings defined here [2].

    This is very different from how Codex/Autopilot work as it is trained using an NLU framework [3] which is usually used for training chatbots.

    [1]: https://github.com/Flex-lang/transpiler/tree/master/transpil...

    [2]: https://github.com/Flex-lang/transpiler/tree/master/transpil...

    [3]: https://github.com/RasaHQ/rasa

  • datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

    Project mention: FauxPilot – an open-source GitHub Copilot server | news.ycombinator.com | 2022-08-02

    And then pass that my_code.json as the dataset name.

    [1] https://github.com/huggingface/datasets

  • gensim

    Topic Modelling for Humans

    Project mention: Topic modeling --- allow multiple topics per statement | reddit.com/r/LanguageTechnology | 2022-11-22

    Try LDA as implemented in gensim https://github.com/RaRe-Technologies/gensim

  • flair

    A very simple framework for state-of-the-art Natural Language Processing (NLP)

    Project mention: Flair: A simple framework for state-of-the-art Natural Language Processing | news.ycombinator.com | 2022-04-11

  • allennlp

    An open-source NLP research library, built on PyTorch.

    Project mention: How to solve ConfigurationError using HuggingFace Token Classifier | reddit.com/r/learnpython | 2022-10-08

    No clue. So what I did was google the error. Here's what I found: https://github.com/allenai/allennlp/issues/4319

  • NLTK

    NLTK Source

    Project mention: ModuleNotFoundError: No module named 'svgling' | reddit.com/r/learnpython | 2022-11-27

    When I then try to import svgling, I get the error ModuleNotFoundError: No module named 'svgling'. Purpose was to follow this very simple example on NLTK (https://www.nltk.org/).

  • Ciphey

    ⚡ Automatically decrypt encryptions without knowing the key or cipher, decode encodings, and crack hashes ⚡

    Project mention: In CTFs, you'll often get a string of text to decode. Is there a good way to recognize how to decode it? | reddit.com/r/HowToHack | 2022-10-23

    It can help you detect various encryption and encodings and even decrypt them. Ciphey

  • clip-as-service

    🏄 Embed/reason/rank images and sentences with CLIP models

    Project mention: Image Similarity Score using transfer learning | reddit.com/r/MLQuestions | 2022-08-31

  • ludwig

    Data-centric declarative deep learning framework

  • TextBlob

    Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

    Project mention: Which NLP Library should I use? | reddit.com/r/LanguageTechnology | 2022-07-12

    I think they reference the "other TextBlob" on their website and say that they use it. In case this is what you mean: https://github.com/sloria/TextBlob

  • Pattern

    Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

  • machine_learning_examples

    A collection of machine learning examples and tutorials.

  • attention-is-all-you-need-pytorch

    A PyTorch implementation of the Transformer model in "Attention is All You Need".

  • doccano

    Open source annotation tool for machine learning practitioners.

    Project mention: Text Corpus Tagging System | reddit.com/r/django | 2022-10-18

    doccano is built on a Django backend though I believe it uses extensive front end code for the annotation UI.

  • Stanza

    Official Stanford NLP Python Library for Many Human Languages

    Project mention: Off the shelf sentence parsers? | reddit.com/r/LanguageTechnology | 2022-08-26

    stanza has a constituency parser. There's a model compatible with the dev branch with an accuracy of 95.8 on PTB, using Roberta as a bottom layer, so it's pretty decent at this point. (The currently released model is not as accurate, but it's easy to get the better model to you.) There's also Tregex as a Java addon which can very easily search for a noun phrase highest up in the tree: NP !>> NP will search for a noun phrase which is not dominated by any higher up noun phrase.
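
The Tregex pattern quoted in the Stanza mention, NP !>> NP, can be illustrated with a small pure-Python sketch. This is not Stanza's or Tregex's API: the Tree class, the leaf helper, and the topmost function are hypothetical names invented for this illustration, and the example sentence is made up. The sketch only shows the idea of selecting noun phrases that are not dominated by any other noun phrase.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Tree:
    """Minimal constituency tree node (hypothetical, not Stanza's tree class)."""
    label: str
    children: List["Tree"] = field(default_factory=list)

    def words(self) -> List[str]:
        # A node without children is treated as a word (leaf).
        if not self.children:
            return [self.label]
        return [w for child in self.children for w in child.words()]


def leaf(tag: str, word: str) -> Tree:
    """Build a part-of-speech node that wraps a single word."""
    return Tree(tag, [Tree(word)])


def topmost(tree: Tree, label: str = "NP", _inside: bool = False) -> Iterator[Tree]:
    """Yield phrases with `label` that have no ancestor carrying the same label."""
    is_match = tree.label == label and bool(tree.children)
    if is_match and not _inside:
        yield tree
    for child in tree.children:
        # Everything below a matching node is dominated by it.
        yield from topmost(child, label, _inside or is_match)


# (S (NP (NP the cat) (PP on (NP the mat))) (VP slept))
sentence = Tree("S", [
    Tree("NP", [
        Tree("NP", [leaf("DT", "the"), leaf("NN", "cat")]),
        Tree("PP", [
            leaf("IN", "on"),
            Tree("NP", [leaf("DT", "the"), leaf("NN", "mat")]),
        ]),
    ]),
    Tree("VP", [leaf("VBD", "slept")]),
])

matches = [" ".join(node.words()) for node in topmost(sentence)]
print(matches)  # ['the cat on the mat']
```

Only the outer noun phrase is returned; the two nested ones are skipped because each has a noun phrase above it, which is the behavior the mention describes for NP !>> NP.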

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-11-27.

Index

What are some of the best open-source Natural Language Processing projects in Python? This list will help you:

Rank Project Stars
1 transformers 74,719
2 funNLP 45,214
3 bert 32,630
4 Jieba 29,620
5 HanLP 27,548
6 spaCy 24,644
7 NLP-progress 21,094
8 d2l-en 15,632
9 rasa 15,117
10 datasets 14,783
11 gensim 13,745
12 flair 12,235
13 allennlp 11,300
14 NLTK 11,263
15 Ciphey 11,003
16 clip-as-service 10,989
17 ludwig 8,637
18 TextBlob 8,370
19 Pattern 8,352
20 machine_learning_examples 7,087
21 attention-is-all-you-need-pytorch 7,005
22 doccano 6,979
23 Stanza 6,405