NLP

Open-source projects categorized as NLP

Top 23 NLP Open-Source Projects

  • GitHub repo transformers

    🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

    Project mention: [D] How do pretrained tokenizers work? | reddit.com/r/MachineLearning | 2021-11-26

    I have been using the pretrained tokenizers available from the huggingface/transformers library. And they have been working well for my use case.

  • GitHub repo bert

    TensorFlow code and pre-trained models for BERT

    Project mention: Is huggingface pre-trained models on their site can be used for commercial use? | reddit.com/r/LanguageTechnology | 2021-09-20

    But I can see that the LICENSE over on Google's bert repository comes with an Apache License, meaning we can generally use it - and amusingly if you scroll down to the 'date' and 'author', it says:

  • GitHub repo spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: We created a step-by-step guide about how spaCy v3's configuration and project systems can help you enhance your Natural Language Processing workflows! | reddit.com/r/Python | 2021-11-17

    spaCy on GitHub https://github.com/explosion/spaCy

  • GitHub repo rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    Project mention: How to Create the Perfect README for Your Open Source Project | dev.to | 2021-11-02

    This example is sourced from RasaHQ

  • GitHub repo gensim

    Topic Modelling for Humans

    Project mention: Gensim – a Python library for topic modelling, document indexing | news.ycombinator.com | 2021-11-25

  • GitHub repo Awesome-pytorch-list

    A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.

    Project mention: Similar open source long library list to TF like Pytorch "ECOSYSTEM TOOLS" | reddit.com/r/tensorflow | 2021-11-19

    I got the following as a recommendation from elsewhere - https://github.com/jtoy/awesome-tensorflow and there is one for pt as well https://github.com/bharathgs/Awesome-pytorch-list . Thx for the help :D

  • GitHub repo jina

    Cloud-native neural search framework for 𝙖𝙣𝙮 kind of data

    Project mention: Open source tools to track github repository stats? | reddit.com/r/opensource | 2021-10-24

    I use this tool everyday to track growth for Jina (an open-source neural search framework)

  • GitHub repo datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

    Project mention: Hugging Face Introduces ‘Datasets’: A Lightweight Community Library For Natural Language Processing (NLP) | reddit.com/r/artificial | 2021-11-08

    Code for https://arxiv.org/abs/2109.02846 found: https://github.com/huggingface/datasets

  • GitHub repo flair

    A very simple framework for state-of-the-art Natural Language Processing (NLP)

    Project mention: How to create a dataset for training NER models when you only have entity data | reddit.com/r/LanguageTechnology | 2021-10-18

    We have a list of entities in text files separated with a new line. We intend to train the flair model to detect these entities in text, but NER models require the entity to be labeled in a paragraph with BIO format.
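
    The labeling problem described in this post can be sketched in a few lines. The example below is a minimal illustration, not flair's own API: it assumes whitespace tokenization and exact phrase matching, and the helper name `bio_tag` and the sample entities are hypothetical. It projects a plain entity list onto a sentence to produce BIO (begin/inside/outside) tags.

```python
def bio_tag(tokens, entities):
    """Label tokens with B-/I-/O tags by matching known entity phrases.

    `entities` maps a label (for example "LOC") to a list of entity strings.
    Longer phrases are matched first so "New York City" wins over "New York".
    Matching is exact and case-sensitive (an assumption of this sketch).
    """
    tags = ["O"] * len(tokens)
    phrases = sorted(
        ((name.split(), label) for label, names in entities.items() for name in names),
        key=lambda item: len(item[0]),
        reverse=True,
    )
    for words, label in phrases:
        n = len(words)
        if n == 0:
            continue
        for start in range(len(tokens) - n + 1):
            span_is_free = all(tag == "O" for tag in tags[start:start + n])
            if tokens[start:start + n] == words and span_is_free:
                tags[start] = f"B-{label}"
                for i in range(start + 1, start + n):
                    tags[i] = f"I-{label}"
    return list(zip(tokens, tags))


# Hypothetical sample data for illustration only.
sentence = "Alice moved to New York City last year"
known_entities = {"PER": ["Alice"], "LOC": ["New York", "New York City"]}

for token, tag in bio_tag(sentence.split(), known_entities):
    print(f"{token} {tag}")
```

    Each printed line holds one token and its tag, which is the general shape of a column-style NER training file. Real text would also need punctuation-aware tokenization, since a token such as "City," would not match under plain whitespace splitting.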

  • GitHub repo allennlp

    An open-source NLP research library, built on PyTorch.

    Project mention: Cedille, the largest French language model, open source with a freely accessible playground | reddit.com/r/GPT3 | 2021-11-12

  • GitHub repo NLTK

    NLTK Source

    Project mention: Count words within strings | reddit.com/r/tableau | 2021-10-27

  • GitHub repo nlp_compromise

    modest natural-language processing

    Project mention: Reasons Why JavaScript is Awesome | dev.to | 2021-10-04

    Named Entity Extraction identifies entities like names, locations, or phone numbers inside a given text. Compromise is a JavaScript package that we can use that allows us to not only extract entities in a text but also identify what types of entities they are. Here is a sample program that allows you to enter a text file into the input field, and it would extract and identify any recognizable entities in that text.

  • GitHub repo bert-as-service

    Mapping a variable-length sentence to a fixed-length vector using BERT model

    Project mention: Needed 100% to pass a safety quiz, need to wait a week to retake | reddit.com/r/mildlyinfuriating | 2021-01-12

    You joke but

  • GitHub repo CoreNLP

    Stanford CoreNLP: A Java suite of core NLP tools.

    Project mention: A comparison of libraries for named entity recognition | dev.to | 2021-09-27

    If you need NER, there’s no need to implement it yourself. There are several popular libraries that can do this for you nowadays. Five of these libraries, Stanford CoreNLP, NLTK, OpenNLP, SpaCy, and GATE, were already mentioned in the title.

  • GitHub repo TextBlob

    Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

    Project mention: Any way for Python to interpret words? | reddit.com/r/learnpython | 2021-07-27

    Check out TextBlob — Spelling correction, splitting text into words and sentences, sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more

  • GitHub repo PaddleHub

    Awesome pre-trained models toolkit based on PaddlePaddle (300+ models including Image, Text, Audio and Video, with easy inference and serving deployment).

    Project mention: [P] PaddleHub: An awesome and easy-to-use pre-trained models toolkit | reddit.com/r/MachineLearning | 2021-06-10

    Code: https://github.com/PaddlePaddle/PaddleHub

  • GitHub repo attention-is-all-you-need-pytorch

    A PyTorch implementation of the Transformer model in "Attention is All You Need".

    Project mention: Lack of activation in transformer feedforward layer? | reddit.com/r/learnmachinelearning | 2021-05-20

    I'm curious as to why the second matrix multiplication is not followed by an activation unlike the first one. Is there any particular reason why a non-linearity would be trivial or even avoided in the second operation? For reference, variations of this can be witnessed in a number of different implementations, including BERT-pytorch and attention-is-all-you-need-pytorch.
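
    The layer the question refers to is the position-wise feed-forward network from "Attention Is All You Need", FFN(x) = max(0, xW1 + b1)W2 + b2. The sketch below is a dependency-free illustration with made-up toy weights, not code from either repository mentioned. It shows the structure: an activation after the first projection and none after the second.

```python
def affine(weights, vector, bias):
    """Compute weights @ vector + bias for a row-major list-of-lists matrix."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def relu(vector):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in vector]


def feed_forward(x, w1, b1, w2, b2):
    """Position-wise FFN: ReLU after the first projection only."""
    hidden = relu(affine(w1, x, b1))  # expand d_model -> d_ff, then non-linearity
    return affine(w2, hidden, b2)     # project d_ff -> d_model, left linear


# Toy weights (hypothetical values) with d_model = 2 and d_ff = 3.
x = [1.0, -2.0]
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 2.0, 3.0], [-1.0, 0.0, 1.0]]
b2 = [0.5, -0.5]

print(feed_forward(x, w1, b1, w2, b2))  # [1.5, -1.5]
```

    In this toy run the second output component is negative. A ReLU after the second projection would clip it to zero, so the sublayer could only ever add non-negative values to its input. That is one commonly offered intuition for leaving the projection linear, not a statement of the original authors' reasoning.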

  • GitHub repo best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

    Project mention: Awesome list of ML | reddit.com/r/programming | 2021-09-16

  • GitHub repo Stanza

    Official Stanford NLP Python Library for Many Human Languages

    Project mention: Spacy vs NLTK for Spanish Language Statistical Tasks | reddit.com/r/LanguageTechnology | 2021-11-12

  • GitHub repo nlp-recipes

    Natural Language Processing Best Practices & Examples

    Project mention: Is there any utility software/bot that produces descriptor tags for a Reddit image post using the comments? | reddit.com/r/redditdev | 2021-11-07

    I found this (https://github.com/microsoft/nlp-recipes) resource and it has a list of pre-built or easily customizable NLP models that I'm going to try out.

  • GitHub repo mycroft-core

    Mycroft Core, the Mycroft Artificial Intelligence platform.

    Project mention: Privacy-friendly voice assistant? | reddit.com/r/homeautomation | 2021-11-18

  • GitHub repo Smile

    Statistical Machine Intelligence & Learning Engine

    Project mention: Python VS Scala | reddit.com/r/scala | 2021-07-02

    Actually, it does. Scala has Spark for data science and some ML libs like Smile.

  • GitHub repo tokenizers

    💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

    Project mention: Portability of Rust in 2021 | reddit.com/r/rust | 2021-09-10

    In sum I would like the idea to go with Rust as I more or less got to rewrite the whole thing anyway, but I am a bit skeptical if I will be able to interface with everything that might come up at some point. Or probably end up in a wrapper hell if I got to use more C++ libraries. On the other hand there are definitely a few Rust projects out there that might come in handy (for example https://github.com/huggingface/tokenizers). And the build process is pretty awful right now (CMake it is but with lots of hacks).

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-11-26.

Index

What are some of the best open-source NLP projects? This list will help you:

Rank  Project                            Stars
1     transformers                       54,578
2     bert                               29,731
3     spaCy                              21,785
4     rasa                               13,063
5     gensim                             12,670
6     Awesome-pytorch-list               12,426
7     jina                               12,306
8     datasets                           11,380
9     flair                              10,994
10    allennlp                           10,639
11    NLTK                               10,260
12    nlp_compromise                     10,066
13    bert-as-service                    9,728
14    CoreNLP                            8,223
15    TextBlob                           7,951
16    PaddleHub                          7,211
17    attention-is-all-you-need-pytorch  5,980
18    best-of-ml-python                  5,921
19    Stanza                             5,831
20    nlp-recipes                        5,754
21    mycroft-core                       5,449
22    Smile                              5,395
23    tokenizers                         5,026