Python NLP

Open-source Python projects categorized as NLP

Top 23 Python NLP Projects

  • GitHub repo transformers

    🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

    Project mention: [D] How do pretrained tokenizers work? | reddit.com/r/MachineLearning | 2021-11-26

    I have been using the pretrained tokenizers available from the huggingface/transformers library. And they have been working well for my use case.

  • GitHub repo bert

    TensorFlow code and pre-trained models for BERT

    Project mention: Is huggingface pre-trained models on their site can be used for commercial use? | reddit.com/r/LanguageTechnology | 2021-09-20

    But I can see that the LICENSE over on Google's bert repository comes with an Apache License, meaning we can generally use it - and amusingly if you scroll down to the 'date' and 'author', it says:

  • GitHub repo spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: We created a step-by-step guide about how spaCy v3's configuration and project systems can help you enhance your Natural Language Processing workflows! | reddit.com/r/Python | 2021-11-17

    spaCy on GitHub https://github.com/explosion/spaCy

  • GitHub repo rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    Project mention: How to Create the Perfect README for Your Open Source Project | dev.to | 2021-11-02

    This example is sourced from RasaHQ

  • GitHub repo gensim

    Topic Modelling for Humans

    Project mention: Gensim – a Python library for topic modelling, document indexing | news.ycombinator.com | 2021-11-25

  • GitHub repo jina

    Cloud-native neural search framework for 𝙖𝙣𝙮 kind of data

    Project mention: Open source tools to track github repository stats? | reddit.com/r/opensource | 2021-10-24

    I use this tool everyday to track growth for Jina (an open-source neural search framework)

  • GitHub repo datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

    Project mention: Hugging Face Introduces ‘Datasets’: A Lightweight Community Library For Natural Language Processing (NLP) | reddit.com/r/artificial | 2021-11-08

    Code for https://arxiv.org/abs/2109.02846 found: https://github.com/huggingface/datasets

  • GitHub repo flair

    A very simple framework for state-of-the-art Natural Language Processing (NLP)

    Project mention: How to create a dataset for training NER models when you only have entity data | reddit.com/r/LanguageTechnology | 2021-10-18

    We have a list of entities in text files, one per line. We intend to train the flair model to detect these entities in text, but NER models require the entities to be labeled in a paragraph in BIO format.
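
The labeling step described in this post can be sketched in plain Python. This is a minimal illustration, not flair's own API: the function name `bio_tag`, the entity dictionary, and the example sentence are all hypothetical, and the sketch assumes the text is already split into tokens and that entities match token sequences exactly.

```python
def bio_tag(tokens, entities):
    """Label tokens with BIO tags by exact matching against known entity phrases.

    `entities` maps a tuple of tokens (the phrase) to a label,
    e.g. ("New", "York"): "LOC". Both names are illustrative assumptions.
    """
    tags = ["O"] * len(tokens)
    # Try longer phrases first so a longer entity wins over its own prefix.
    phrases = sorted((p for p in entities if p), key=len, reverse=True)
    i = 0
    while i < len(tokens):
        for phrase in phrases:
            n = len(phrase)
            if tuple(tokens[i:i + n]) == phrase:
                label = entities[phrase]
                tags[i] = "B-" + label          # first token of the entity
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label      # remaining tokens of the entity
                i += n
                break
        else:
            i += 1                              # no entity starts here
    return tags


tokens = "Angela Merkel visited New York yesterday".split()
entities = {("Angela", "Merkel"): "PER", ("New", "York"): "LOC"}
print(list(zip(tokens, bio_tag(tokens, entities))))
# tags: B-PER, I-PER, O, B-LOC, I-LOC, O
```

Exact matching like this will miss inflected or misspelled mentions and cannot disambiguate by context, so the generated labels would normally be spot-checked before training.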

  • GitHub repo allennlp

    An open-source NLP research library, built on PyTorch.

    Project mention: Cedille, the largest French language model, open source with a freely accessible playground | reddit.com/r/GPT3 | 2021-11-12

  • GitHub repo NLTK

    NLTK Source

    Project mention: Count words within strings | reddit.com/r/tableau | 2021-10-27

  • GitHub repo bert-as-service

    Mapping a variable-length sentence to a fixed-length vector using BERT model

    Project mention: Needed 100% to pass a safety quiz, need to wait a week to retake | reddit.com/r/mildlyinfuriating | 2021-01-12

    You joke but

  • GitHub repo TextBlob

    Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

    Project mention: Any way for Python to interpret words? | reddit.com/r/learnpython | 2021-07-27

    Check out TextBlob — Spelling correction, splitting text into words and sentences, sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more

  • GitHub repo PaddleHub

    Awesome pre-trained models toolkit based on PaddlePaddle (300+ models including Image, Text, Audio and Video with Easy Inference & Serving deployment).

    Project mention: [P] PaddleHub: An awesome and easy-to-use pre-trained models toolkit | reddit.com/r/MachineLearning | 2021-06-10

    Code: https://github.com/PaddlePaddle/PaddleHub

  • GitHub repo attention-is-all-you-need-pytorch

    A PyTorch implementation of the Transformer model in "Attention is All You Need".

    Project mention: Lack of activation in transformer feedforward layer? | reddit.com/r/learnmachinelearning | 2021-05-20

    I'm curious as to why the second matrix multiplication is not followed by an activation unlike the first one. Is there any particular reason why a non-linearity would be trivial or even avoided in the second operation? For reference, variations of this can be witnessed in a number of different implementations, including BERT-pytorch and attention-is-all-you-need-pytorch.
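
For readers unfamiliar with the layer this question refers to, the position-wise feedforward sublayer from "Attention is All You Need" computes FFN(x) = max(0, x·W1 + b1)·W2 + b2. The sketch below writes that out in plain Python for a single vector. It is not taken from either repository mentioned in the post, and the helper names and toy weights are illustrative assumptions.

```python
def matvec(x, W, b):
    """Compute x @ W + b for a vector x, a weight matrix W (one row per input), and a bias b."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]


def relu(v):
    """Element-wise max(0, a)."""
    return [max(0.0, a) for a in v]


def feed_forward(x, W1, b1, W2, b2):
    hidden = relu(matvec(x, W1, b1))  # first projection, followed by the non-linearity
    return matvec(hidden, W2, b2)     # second projection, left linear


# Toy weights chosen only to make the arithmetic easy to follow.
x = [1.0, -2.0]
W1 = [[1.0, 0.0, -1.0],
      [0.0, 1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
W2 = [[2.0, -1.0],
      [0.0, 0.0],
      [0.0, 0.0]]
b2 = [0.5, 0.5]
print(feed_forward(x, W1, b1, W2, b2))  # [2.5, -0.5]
```

The example output contains a negative component, which a ReLU after the second projection would have clipped to zero. That illustrates one consequence of leaving the second projection linear, not a full answer to the question.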

  • GitHub repo best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

    Project mention: Awesome list of ML | reddit.com/r/programming | 2021-09-16

  • GitHub repo Stanza

    Official Stanford NLP Python Library for Many Human Languages

    Project mention: Spacy vs NLTK for Spanish Language Statistical Tasks | reddit.com/r/LanguageTechnology | 2021-11-12

  • GitHub repo nlp-recipes

    Natural Language Processing Best Practices & Examples

    Project mention: Is there any utility software/bot that produces descriptor tags for a Reddit image post using the comments? | reddit.com/r/redditdev | 2021-11-07

    I found this (https://github.com/microsoft/nlp-recipes) resource and it has a list of pre-built or easily customizable NLP models that I'm going to try out.

  • GitHub repo mycroft-core

    Mycroft Core, the Mycroft Artificial Intelligence platform.

    Project mention: Privacy-friendly voice assistant? | reddit.com/r/homeautomation | 2021-11-18

  • GitHub repo flashtext

    Extract Keywords from sentence or Replace keywords in sentences.

    Project mention: How can I speed up thousands of re.subs()? | reddit.com/r/learnpython | 2021-11-12

    For the text part not requiring regex, https://github.com/vi3k6i5/flashtext might help
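
The general idea behind this suggestion, replacing many literal keywords in one pass instead of running one `re.sub()` per keyword, can be sketched with the standard library alone. This is not flashtext's algorithm or API: it swaps in a single compiled alternation pattern plus a dictionary lookup. The function name `build_replacer` and the sample mapping are hypothetical, and the sketch assumes every keyword starts and ends with a word character.

```python
import re


def build_replacer(mapping):
    """Return a function that replaces every key of `mapping` with its value in one pass."""
    if not mapping:
        return lambda text: text
    # Longest keywords first so a longer keyword is preferred over its own prefix.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape keeps the keywords literal; \b restricts matches to whole words.
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    return lambda text: pattern.sub(lambda m: mapping[m.group(0)], text)


replace = build_replacer({"NYC": "New York", "py": "Python"})
print(replace("py is popular in NYC, not pypy"))
# Python is popular in New York, not pypy
```

With thousands of keywords, a very large alternation can itself become slow, which is the situation the linked library is aimed at.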

  • GitHub repo BERT-pytorch

    Google AI 2018 BERT pytorch implementation

    Project mention: Lack of activation in transformer feedforward layer? | reddit.com/r/learnmachinelearning | 2021-05-20

    I'm curious as to why the second matrix multiplication is not followed by an activation unlike the first one. Is there any particular reason why a non-linearity would be trivial or even avoided in the second operation? For reference, variations of this can be witnessed in a number of different implementations, including BERT-pytorch and attention-is-all-you-need-pytorch.

  • GitHub repo ERNIE

    Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.

    Project mention: [R] Baidu’s Knowledge-Enhanced ERNIE 3.0 Pretraining Framework Delivers SOTA NLP Results, Surpasses Human Performance on the SuperGLUE Benchmark | reddit.com/r/MachineLearning | 2021-07-16

  • GitHub repo GPT2-Chinese

    Chinese version of GPT2 training code, using BERT tokenizer.

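    Project mention: Could mainland China gradually require all residents and businesses to periodically study Xi's speeches and news commentary, and submit summaries of their thoughts? | reddit.com/r/China_irl | 2021-02-15

  • GitHub repo unilm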

    Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

    Project mention: WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing achieves SOTA performance on the SUPERB benchmark | reddit.com/r/speechtech | 2021-11-10

    Trained on 94k hours: 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli!!!! Code and models https://github.com/microsoft/unilm/tree/master/wavlm

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-11-26.

Index

What are some of the best open-source NLP projects in Python? This list will help you:

#   Project                             Stars
1   transformers                       54,578
2   bert                               29,731
3   spaCy                              21,827
4   rasa                               13,063
5   gensim                             12,670
6   jina                               12,306
7   datasets                           11,380
8   flair                              10,994
9   allennlp                           10,639
10  NLTK                               10,269
11  bert-as-service                     9,747
12  TextBlob                            7,951
13  PaddleHub                           7,211
14  attention-is-all-you-need-pytorch   5,980
15  best-of-ml-python                   5,921
16  Stanza                              5,831
17  nlp-recipes                         5,754
18  mycroft-core                        5,456
19  flashtext                           4,997
20  BERT-pytorch                        4,608
21  ERNIE                               4,606
22  GPT2-Chinese                        4,522
23  unilm                               3,779