Python NLP

Open-source Python projects categorized as NLP

Top 23 Python NLP Projects

  • GitHub repo transformers

    🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

    Project mention: Export and run other machine learning models | dev.to | 2021-10-14

    txtai primarily has support for Hugging Face Transformers and ONNX models. This enables txtai to hook into the rich model framework available in Python, export this functionality via the API to other languages (JavaScript, Java, Go, Rust) and even export and natively load models with ONNX.

  • GitHub repo bert

    TensorFlow code and pre-trained models for BERT

    Project mention: Is huggingface pre-trained models on their site can be used for commercial use? | reddit.com/r/LanguageTechnology | 2021-09-20

    But I can see that the LICENSE over on Google's bert repository comes with an Apache License, meaning we can generally use it - and amusingly if you scroll down to the 'date' and 'author', it says:

  • GitHub repo spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: I put together a tutorial and overview on how to use DeepSpeech to do Speech Recognition in Python | reddit.com/r/Python | 2021-10-14

    It definitely could - with the real-time speech recognition example shown in the tutorial. But you'd likely need some sort of NLU running after the transcription is performed - to basically parse what was spoken into a command that you can use to run some business logic. There are some good open source libs for this too like https://spacy.io/

  • GitHub repo rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    Project mention: Building a chatbot - How should I approach this? | reddit.com/r/learnpython | 2021-08-12

    As u/Hungry_Check_9153 says about your picture of how chatbots work, I recommend looking at Rasa, an open-source Python chatbot framework. To get an idea of the sheer scope of such a project, take a look at their GitHub. Building a chatbot with Rasa may be a good first step and offers plenty of experience writing and learning Python code.

  • GitHub repo gensim

    Topic Modelling for Humans

    Project mention: The unthinking application of this regex-efficiency check wasted our attention | news.ycombinator.com | 2021-09-30
  • GitHub repo jina

    Cloud-native neural search framework for 𝙖𝙣𝙮 kind of data

    Project mention: DAE think it's really cool that we have open-source software tools to build our own search engines? | reddit.com/r/DoesAnybodyElse | 2021-10-11

    if you'd like to skip the tutorial and dive head-on: https://github.com/jina-ai/jina

  • GitHub repo flair

    A very simple framework for state-of-the-art Natural Language Processing (NLP)

    Project mention: Preparing data for training NER models | reddit.com/r/LanguageTechnology | 2021-10-11

    Training most Named Entity Recognition (NER) models, for example Flair, usually requires formatting the data in the BIO tagging scheme, with each sentence separated by a blank line.
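The tagging scheme mentioned above (usually written BIO or IOB) marks the first token of an entity with `B-`, the following tokens of the same entity with `I-`, and everything else with `O`. The sketch below is one way to produce such a file from token-level spans. The function names, the sample sentences, and the `token tag` column layout are assumptions for illustration; check which column order and separator your training library expects.

```python
def to_bio(tokens, entities):
    """Return one BIO tag per token.

    `entities` is a list of (start, end, label) token spans, end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags


def to_conll(sentences):
    """Render sentences as 'token tag' lines, with a blank line between sentences."""
    blocks = []
    for tokens, entities in sentences:
        tags = to_bio(tokens, entities)
        blocks.append("\n".join(f"{tok} {tag}" for tok, tag in zip(tokens, tags)))
    return "\n\n".join(blocks) + "\n"


# Hypothetical sample data, not taken from any real corpus.
sentences = [
    (["George", "Washington", "went", "to", "Washington", "."],
     [(0, 2, "PER"), (4, 5, "LOC")]),
    (["Berlin", "is", "large", "."],
     [(0, 1, "LOC")]),
]

print(to_conll(sentences))
```

Keeping the spans at token level avoids character-offset alignment problems; if your annotations use character offsets, they need to be mapped to token indices first.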

  • GitHub repo allennlp

    An open-source NLP research library, built on PyTorch.

    Project mention: Any allennlp users in this sub? | reddit.com/r/LanguageTechnology | 2021-10-08

    https://github.com/allenai/allennlp/discussions looks active

  • GitHub repo NLTK

    NLTK Source

    Project mention: Top 10 Python Libraries for Machine Learning | dev.to | 2021-09-09

    Website: https://www.nltk.org/ | GitHub repository: https://github.com/nltk/nltk | Developed by: Team NLTK | Primary purpose: Natural Language Processing

  • GitHub repo datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

    Project mention: Datasets: A Community Library for Natural Language Processing | news.ycombinator.com | 2021-09-08
  • GitHub repo bert-as-service

    Mapping a variable-length sentence to a fixed-length vector using BERT model

    Project mention: Needed 100% to pass a safety quiz, need to wait a week to retake | reddit.com/r/mildlyinfuriating | 2021-01-12

    You joke but

  • GitHub repo TextBlob

    Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

    Project mention: Any way for Python to interpret words? | reddit.com/r/learnpython | 2021-07-27

    Check out TextBlob — Spelling correction, splitting text into words and sentences, sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more

  • GitHub repo PaddleHub

    Awesome pre-trained models toolkit based on PaddlePaddle.(300+ models including Image, Text, Audio and Video with Easy Inference & Serving deployment)

    Project mention: [P] PaddleHub: An awesome and easy-to-use pre-trained models toolkit | reddit.com/r/MachineLearning | 2021-06-10

    Code: https://github.com/PaddlePaddle/PaddleHub

  • GitHub repo best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

    Project mention: Awesome list of ML | reddit.com/r/programming | 2021-09-16
  • GitHub repo attention-is-all-you-need-pytorch

    A PyTorch implementation of the Transformer model in "Attention is All You Need".

    Project mention: Lack of activation in transformer feedforward layer? | reddit.com/r/learnmachinelearning | 2021-05-20

    I'm curious as to why the second matrix multiplication is not followed by an activation unlike the first one. Is there any particular reason why a non-linearity would be trivial or even avoided in the second operation? For reference, variations of this can be witnessed in a number of different implementations, including BERT-pytorch and attention-is-all-you-need-pytorch.
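For context, the position-wise feed-forward block in "Attention is All You Need" is defined as FFN(x) = max(0, xW1 + b1)W2 + b2, so the second linear map is indeed left without an activation. The sketch below is a minimal pure-Python rendering of that formula with hypothetical toy weights; it is not code from either repository mentioned above.

```python
def linear(x, weights, bias):
    """Affine map: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def feed_forward(x, w1, b1, w2, b2):
    """FFN(x) = max(0, x W1 + b1) W2 + b2, with no activation after W2."""
    hidden = relu(linear(x, w1, b1))   # expand to d_ff and apply ReLU
    return linear(hidden, w2, b2)      # project back to d_model, left linear


# Hypothetical toy weights: d_model = 2, d_ff = 3.
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[-1.0, 2.0, 0.5], [0.5, 1.0, -1.0]]
b2 = [0.0, 0.0]

print(feed_forward([1.0, -2.0], w1, b1, w2, b2))  # [-1.0, 0.5]
```

The toy output contains a negative value, which a ReLU after the second layer would clip to zero. In the original architecture this output is added to the residual stream and layer-normalized, so an unconstrained sign is part of what the sublayer passes on.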

  • GitHub repo Stanza

    Official Stanford NLP Python Library for Many Human Languages

  • GitHub repo nlp-recipes

    Natural Language Processing Best Practices & Examples

    Project mention: Building a Aspect based sentiment classification | reddit.com/r/LanguageTechnology | 2021-09-06

    There is an NLP recipe from Microsoft on ABSA. Have you seen this? https://github.com/microsoft/nlp-recipes/blob/master/examples/sentiment_analysis/absa/absa.ipynb

  • GitHub repo mycroft-core

    Mycroft Core, the Mycroft Artificial Intelligence platform.

    Project mention: Why I’m okay with MOASS not happening this week. | reddit.com/r/GMEJungle | 2021-10-09

    Secondly, it might be easier to connect over Bluetooth to a phone that they can read instead. This would provide a few things: first, the phone can handle the speech-to-text (or, perhaps more likely, hand it off to something like Amazon Alexa, Siri, or, my favorite, Mycroft), and they all already have built-in displays. As far as a display on a mask goes, they're heavy and rigid, and you'd have to figure out some way of breathing around it. (Take a look at the rubber filter masks for things like particulates.)

  • GitHub repo flashtext

    Extract Keywords from sentence or Replace keywords in sentences.

    Project mention: My first NLP pipeline using SpaCy: detect news headlines with company acquisitions | reddit.com/r/Python | 2021-10-08

    Spacy for parsing the Headlines, remove stop words etc. might be ok but I think the problem is quite narrow so a set of fixed regex searches might work quite well. If regex is too slow, try: https://github.com/vi3k6i5/flashtext
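To illustrate why a dedicated keyword matcher can be an alternative to a long list of regex searches, here is a sketch of the general idea: store the keywords in a word-level trie and scan the text once, preferring the longest match. The `KeywordTrie` class, its methods, and the sample headline are hypothetical; this is not flashtext's code or API.

```python
import re


class KeywordTrie:
    """Word-level trie that finds a fixed set of keywords in a single pass."""

    _END = object()  # sentinel key marking the end of a keyword

    def __init__(self, keywords):
        self.root = {}
        for keyword in keywords:
            node = self.root
            for word in self._tokenize(keyword):
                node = node.setdefault(word, {})
            node[self._END] = keyword

    @staticmethod
    def _tokenize(text):
        # Lowercase and keep only word characters, dropping punctuation.
        return re.findall(r"\w+", text.lower())

    def extract(self, text):
        """Return matched keywords in order of appearance (longest match wins)."""
        words = self._tokenize(text)
        found, i = [], 0
        while i < len(words):
            node, j = self.root, i
            match, match_end = None, i
            while j < len(words) and words[j] in node:
                node = node[words[j]]
                j += 1
                if self._END in node:
                    match, match_end = node[self._END], j
            if match is not None:
                found.append(match)
                i = match_end      # continue after the matched keyword
            else:
                i += 1
        return found


# Hypothetical keywords and headline.
trie = KeywordTrie(["acquires", "to acquire", "merger"])
print(trie.extract("Acme to acquire Initech in surprise merger"))
# ['to acquire', 'merger']
```

Adding more keywords grows the trie but does not add another pass over the text, which is the property that makes this approach attractive when the keyword list is large.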

  • GitHub repo ERNIE

    Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.

    Project mention: [R] Baidu’s Knowledge-Enhanced ERNIE 3.0 Pretraining Framework Delivers SOTA NLP Results, Surpasses Human Performance on the SuperGLUE Benchmark | reddit.com/r/MachineLearning | 2021-07-16
  • GitHub repo BERT-pytorch

    Google AI 2018 BERT pytorch implementation

    Project mention: Lack of activation in transformer feedforward layer? | reddit.com/r/learnmachinelearning | 2021-05-20

    I'm curious as to why the second matrix multiplication is not followed by an activation unlike the first one. Is there any particular reason why a non-linearity would be trivial or even avoided in the second operation? For reference, variations of this can be witnessed in a number of different implementations, including BERT-pytorch and attention-is-all-you-need-pytorch.

  • GitHub repo GPT2-Chinese

    Chinese version of GPT2 training code, using BERT tokenizer.

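    Project mention: Could the mainland gradually require all residents and businesses to periodically study Xi's speeches and news commentary, and submit written summaries of their thoughts? | reddit.com/r/China_irl | 2021-02-15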
  • GitHub repo unilm

    UniLM AI - Large-scale Self-supervised Pre-training across Tasks, Languages, and Modalities

    Project mention: Microsoft AI Unveils ‘TrOCR’, An End-To-End Transformer-Based OCR Model For Text Recognition With Pre-Trained Models | reddit.com/r/ArtificialInteligence | 2021-10-02


NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-10-14.

Index

What are some of the best open-source NLP projects in Python? This list will help you:

 #  Project                             Stars
 1  transformers                       52,449
 2  bert                               29,306
 3  spaCy                              21,511
 4  rasa                               12,815
 5  gensim                             12,543
 6  jina                               11,559
 7  flair                              10,849
 8  allennlp                           10,545
 9  NLTK                               10,163
10  datasets                           10,108
11  bert-as-service                     9,629
12  TextBlob                            7,897
13  PaddleHub                           7,030
14  best-of-ml-python                   5,815
15  attention-is-all-you-need-pytorch   5,801
16  Stanza                              5,726
17  nlp-recipes                         5,694
18  mycroft-core                        5,393
19  flashtext                           4,949
20  ERNIE                               4,531
21  BERT-pytorch                        4,520
22  GPT2-Chinese                        4,358
23  unilm                               3,245