Python NLP

Open-source Python projects categorized as NLP

Top 23 Python NLP Projects

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

    Project mention: BetterTransformer: PyTorch-native free-lunch speedups for Transformer-based models | reddit.com/r/MachineLearning | 2022-11-22

    In order to support BetterTransformer with the canonical Transformer models from Transformers library, an integration was done with the open-source library Optimum as a one-liner:

  • bert

    TensorFlow code and pre-trained models for BERT

    Project mention: [R] LiBai: a large-scale open-source model training toolbox | reddit.com/r/MachineLearning | 2022-11-09

    Found relevant code at https://github.com/google-research/bert + all code implementations here

  • HanLP

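    Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified-traditional Chinese conversion, natural language processing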

    Project mention: Hanlp - Natural language processing for the next decade | reddit.com/r/github_trends | 2022-05-28
  • spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: Has anyone here ever used the seaNMF model for short text topic modeling, and be willing to help me get started with it? | reddit.com/r/LanguageTechnology | 2022-11-24

    Tokenize with NLTK, SpaCy or CoreNLP

  • jina

    🔮 The most advanced MLOps platform for multimodal AI on the cloud · Neural Search · Creative AI · Cloud Native

    Project mention: Have you used Jina for multi-modal applications? | dev.to | 2022-10-24

    How would you build a multi-modal application? I just noticed the release of Jina, an MLOps framework that empowers anyone to build cross-modal and multi-modal applications on the cloud. It uplifts a PoC into a production-ready service. Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. If you have tried it before, please let me know what you think of it. Thanks!

  • rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    Project mention: Show HN: Flex – transpile natural language to a programming language | news.ycombinator.com | 2022-11-10

    At the moment it can recognise the type of statements in the training data set [1] and transpile them to Python, Java or C++ using the mappings defined here [2].

    This is very different from how Codex/Autopilot work as it is trained using an NLU framework [3] which is usually used for training chatbots.

    [1]: https://github.com/Flex-lang/transpiler/tree/master/transpil...

    [2]: https://github.com/Flex-lang/transpiler/tree/master/transpil...

    [3]: https://github.com/RasaHQ/rasa

  • datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

    Project mention: FauxPilot – an open-source GitHub Copilot server | news.ycombinator.com | 2022-08-02

    And then pass that my_code.json as the dataset name.

    [1] https://github.com/huggingface/datasets

  • gensim

    Topic Modelling for Humans

    Project mention: Topic modeling --- allow multiple topics per statement | reddit.com/r/LanguageTechnology | 2022-11-22

    Try LDA as implemented in gensim: https://github.com/RaRe-Technologies/gensim

  • flair

    A very simple framework for state-of-the-art Natural Language Processing (NLP)

    Project mention: Flair: A simple framework for state-of-the-art Natural Language Processing | news.ycombinator.com | 2022-04-11
  • best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

    Project mention: Best-Of Machine Learning with Python | news.ycombinator.com | 2022-04-28
  • allennlp

    An open-source NLP research library, built on PyTorch.

    Project mention: How to solve ConfigurationError using HuggingFace Token Classifier | reddit.com/r/learnpython | 2022-10-08

    No clue. So what I did was google the error. Here's what I found: https://github.com/allenai/allennlp/issues/4319

  • NLTK

    NLTK Source

    Project mention: Estimation of text complexity | dev.to | 2022-11-24

    NLTK: for token processing

  • clip-as-service

    🏄 Embed/reason/rank images and sentences with CLIP models

    Project mention: Image Similarity Score using transfer learning | reddit.com/r/MLQuestions | 2022-08-31
  • PaddleHub

    Awesome pre-trained models toolkit based on PaddlePaddle. (400+ models including Image, Text, Audio, Video and Cross-Modal with Easy Inference & Serving)

    Project mention: [R] ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts + Gradio Demo | reddit.com/r/MachineLearning | 2022-10-29

    Hmm, is the code published? The thing on github just makes requests to a remote server.

  • TextBlob

    Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

    Project mention: Which NLP Library should I use? | reddit.com/r/LanguageTechnology | 2022-07-12

    I think they reference the "other TextBlob" on their website and say that they use it. In case this is what you mean: https://github.com/sloria/TextBlob

  • unilm

    Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

    Project mention: [Tutorial] How to Train LayoutLM on a Custom Dataset for Document Extraction with Hugging Face | reddit.com/r/LanguageTechnology | 2022-11-10

    We got excited for a second by this pr on the repo: https://github.com/microsoft/unilm/commit/a54f2d74f05125d4d8a7bc3406affffd1159bf81 which seemed like they might be changing it to MIT license but they reverted pretty quickly after: https://github.com/microsoft/unilm/commit/8444c6b6c4bfcc1cea60d41b85b29d84c0cbec32

  • attention-is-all-you-need-pytorch

    A PyTorch implementation of the Transformer model in "Attention is All You Need".

  • PaddleNLP

    👑 Easy-to-use and powerful NLP library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis and 🖼 Diffusion AIGC system etc.

    Project mention: The 10 Trending Python Repositories on GitHub (May 2022) | dev.to | 2022-06-23

    PaddleNLP

  • Stanza

    Official Stanford NLP Python Library for Many Human Languages

    Project mention: Off the shelf sentence parsers? | reddit.com/r/LanguageTechnology | 2022-08-26

    stanza has a constituency parser. There's a model compatible with the dev branch with an accuracy of 95.8 on PTB, using Roberta as a bottom layer, so it's pretty decent at this point. (The currently released model is not as accurate, but it's easy to get the better model to you.) There's also Tregex as a Java addon which can very easily search for a noun phrase highest up in the tree: NP !>> NP will search for a noun phrase which is not dominated by any higher up noun phrase.
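To make the meaning of the Tregex pattern `NP !>> NP` concrete, here is a minimal pure-Python sketch of the same idea. It does not call Stanza or Tregex: the tuple-based tree format, the helper names `topmost` and `words`, and the example parse are all assumptions made up for illustration.

```python
# Illustrative only: a tiny stand-in for the Tregex pattern "NP !>> NP".
# The tree format and helper names here are hypothetical, not Stanza's API.

def topmost(tree, label, dominated=False):
    """Yield subtrees labelled `label` that have no ancestor with that label.

    A tree is a (label, children) tuple; leaves are plain strings.
    """
    if isinstance(tree, str):  # a leaf token carries no label
        return
    node_label, children = tree
    if node_label == label and not dominated:
        yield tree
    # Every descendant of a `label` node is dominated by it.
    below = dominated or node_label == label
    for child in children:
        yield from topmost(child, label, below)


def words(tree):
    """Return the leaf tokens of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in words(child)]


# Hand-written (hypothetical) parse of "The cat on the mat sleeps".
parse = (
    "S",
    [
        ("NP", [
            ("NP", [("DT", ["The"]), ("NN", ["cat"])]),
            ("PP", [("IN", ["on"]),
                    ("NP", [("DT", ["the"]), ("NN", ["mat"])])]),
        ]),
        ("VP", [("VBZ", ["sleeps"])]),
    ],
)

matches = [" ".join(words(t)) for t in topmost(parse, "NP")]
print(matches)
```

Only the outermost noun phrase is returned; the two nested NPs are skipped because a higher NP dominates them, which is what `!>>` ("is not dominated by") expresses.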

  • haystack

    🔍 Haystack is an open source NLP framework that leverages pre-trained Transformer models. It enables developers to quickly implement production-ready semantic search, question answering, summarization and document ranking for a wide range of NLP applications.

    Project mention: New free tool that uses fine-tuned BERT model to surface answers from research papers | reddit.com/r/LanguageTechnology | 2022-10-28

    Some cool tools like HayStack that would be useful in putting some of these together.

  • mycroft-core

    Mycroft Core, the Mycroft Artificial Intelligence platform.

    Project mention: (Rhetorical question) How the hell do NTs do it all? | reddit.com/r/AutisticWithADHD | 2022-11-23

    The project is called Mycroft, and while they have their own smart speakers for sale, they also provide software so that you can make your own.

  • GPT2-Chinese

    Chinese version of GPT2 training code, using BERT tokenizer.

  • ERNIE

    Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.

    Project mention: ERNIE - ViLG 2.0 by Baidu | reddit.com/r/singularity | 2022-10-31

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-11-24.

Index

What are some of the best open-source NLP projects in Python? This list will help you:

Rank Project Stars
1 transformers 74,719
2 bert 32,586
3 HanLP 27,490
4 spaCy 24,644
5 jina 16,690
6 rasa 15,093
7 datasets 14,783
8 gensim 13,718
9 flair 12,235
10 best-of-ml-python 11,952
11 allennlp 11,300
12 NLTK 11,239
13 clip-as-service 10,989
14 PaddleHub 10,665
15 TextBlob 8,363
16 unilm 7,282
17 attention-is-all-you-need-pytorch 7,005
18 PaddleNLP 6,598
19 Stanza 6,391
20 haystack 6,084
21 mycroft-core 6,052
22 GPT2-Chinese 5,411
23 ERNIE 5,319