Top 23 NLP Open-Source Projects
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch, TensorFlow, and JAX.
Project mention: [D] How do pretrained tokenizers work? | reddit.com/r/MachineLearning | 2021-11-26
I have been using the pretrained tokenizers available from the huggingface/transformers library. And they have been working well for my use case.
TensorFlow code and pre-trained models for BERT
Project mention: Is huggingface pre-trained models on their site can be used for commercial use? | reddit.com/r/LanguageTechnology | 2021-09-20
But I can see that the LICENSE over on Google's bert repository comes with an Apache License, meaning we can generally use it - and amusingly if you scroll down to the 'date' and 'author', it says:
💫 Industrial-strength Natural Language Processing (NLP) in Python
Project mention: We created a step-by-step guide about how spaCy v3's configuration and project systems can help you enhance your Natural Language Processing workflows! | reddit.com/r/Python | 2021-11-17
spaCy on GitHub https://github.com/explosion/spaCy
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Project mention: How to Create the Perfect README for Your Open Source Project | dev.to | 2021-11-02
This example is sourced from RasaHQ
Topic Modelling for Humans
Project mention: Gensim – a Python library for topic modelling, document indexing | news.ycombinator.com | 2021-11-25
A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
Project mention: Similar open source long library list to TF like Pytorch "ECOSYSTEM TOOLS" | reddit.com/r/tensorflow | 2021-11-19
I got the following as a recommendation from elsewhere: https://github.com/jtoy/awesome-tensorflow, and there is one for PyTorch as well: https://github.com/bharathgs/Awesome-pytorch-list . Thx for the help :D
Cloud-native neural search framework for 𝙖𝙣𝙮 kind of data
Project mention: Open source tools to track github repository stats? | reddit.com/r/opensource | 2021-10-24
I use this tool every day to track growth for Jina (an open-source neural search framework)
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Project mention: Hugging Face Introduces ‘Datasets’: A Lightweight Community Library For Natural Language Processing (NLP) | reddit.com/r/artificial | 2021-11-08
Code for https://arxiv.org/abs/2109.02846 found: https://github.com/huggingface/datasets
A very simple framework for state-of-the-art Natural Language Processing (NLP)
Project mention: How to create a dataset for training NER models when you only have entity data | reddit.com/r/LanguageTechnology | 2021-10-18
We have a list of entities in text files, one entity per line. We intend to train the flair model to detect these entities in text, but NER models require the entities to be labeled in a paragraph in BIO format.
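The tagging scheme the post refers to is usually written BIO (or IOB): the first token of an entity gets `B-<label>`, later tokens of the same entity get `I-<label>`, and everything else gets `O`. A minimal sketch of producing such labels from plain entity lists is shown here, assuming whitespace-tokenized text and exact, case-insensitive matching. The function name `bio_tag` and the example labels are hypothetical, and this is not flair's own API.

```python
def bio_tag(tokens, entities):
    """Label tokens with BIO tags by exact, case-insensitive phrase matching.

    `entities` maps a label (e.g. "ORG") to a list of entity strings,
    such as the lines read from one entity file per label.
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]

    # Flatten to (phrase tokens, label) pairs and try longer phrases first,
    # so that "New York City" wins over "New York".
    phrases = [
        (name.lower().split(), label)
        for label, names in entities.items()
        for name in names
        if name.strip()
    ]
    phrases.sort(key=lambda p: len(p[0]), reverse=True)

    for words, label in phrases:
        n = len(words)
        for i in range(len(tokens) - n + 1):
            # Only label spans that match and do not overlap an earlier match.
            if lowered[i:i + n] == words and all(t == "O" for t in tags[i:i + n]):
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
    return tags


# Hypothetical example data, for illustration only.
tokens = "Alice flew to New York City with Acme Corp".split()
entities = {"LOC": ["New York", "New York City"], "ORG": ["Acme Corp"]}
for token, tag in zip(tokens, bio_tag(tokens, entities)):
    print(token, tag)
```

Dictionary matching like this produces noisy labels (it cannot tell an entity from an ordinary word with the same spelling), so the output is best treated as a starting point for manual review.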
An open-source NLP research library, built on PyTorch.
Project mention: Cedille, the largest French language model, open source with a freely accessible playground | reddit.com/r/GPT3 | 2021-11-12
NLTK Source
Project mention: Count words within strings | reddit.com/r/tableau | 2021-10-27
Mapping a variable-length sentence to a fixed-length vector using BERT model
Project mention: Needed 100% to pass a safety quiz, need to wait a week to retake | reddit.com/r/mildlyinfuriating | 2021-01-12
You joke but
Stanford CoreNLP: A Java suite of core NLP tools.
Project mention: A comparison of libraries for named entity recognition | dev.to | 2021-09-27
If you need NER, there’s no need to implement it yourself. There are several popular libraries that can do this for you nowadays. Five of these libraries, Stanford CoreNLP, NLTK, OpenNLP, SpaCy, and GATE, were already mentioned in the title.
Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
Project mention: Any way for Python to interpret words? | reddit.com/r/learnpython | 2021-07-27
Check out TextBlob — Spelling correction, splitting text into words and sentences, sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more
Awesome pre-trained models toolkit based on PaddlePaddle. (300+ models including Image, Text, Audio and Video with Easy Inference & Serving deployment)
Project mention: [P] PaddleHub: An awesome and easy-to-use pre-trained models toolkit | reddit.com/r/MachineLearning | 2021-06-10
A PyTorch implementation of the Transformer model in "Attention is All You Need".
Project mention: Lack of activation in transformer feedforward layer? | reddit.com/r/learnmachinelearning | 2021-05-20
I'm curious as to why the second matrix multiplication is not followed by an activation unlike the first one. Is there any particular reason why a non-linearity would be trivial or even avoided in the second operation? For reference, variations of this can be witnessed in a number of different implementations, including BERT-pytorch and attention-is-all-you-need-pytorch.
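For readers unfamiliar with the layer being discussed, the position-wise feed-forward block in "Attention Is All You Need" is FFN(x) = max(0, xW1 + b1)W2 + b2: a ReLU follows the first linear map and nothing follows the second. The sketch below shows that structure only, in plain Python for a single token vector. The helper names and toy weights are hypothetical and are not taken from either repository mentioned in the post.

```python
def relu(vector):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in vector]


def linear(x, weight, bias):
    """Compute x @ weight + bias for one vector x.

    `weight` is a list of rows with shape (len(x), len(bias)).
    """
    return [
        sum(x[i] * weight[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]


def feed_forward(x, w1, b1, w2, b2):
    """Position-wise feed-forward block: linear, ReLU, linear."""
    hidden = relu(linear(x, w1, b1))  # first projection, followed by the non-linearity
    return linear(hidden, w2, b2)     # second projection, no activation afterwards


# Toy weights chosen by hand for illustration (model dim 2, hidden dim 3).
x = [1.0, -2.0]
w1 = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[2.0, -1.0], [0.0, 0.0], [0.0, 0.0]]
b2 = [0.5, -0.5]
print(feed_forward(x, w1, b1, w2, b2))  # prints [2.5, -1.5]
```

Because the second projection is left linear, the block's output can take negative values, as the second component in the example does.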
🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
Project mention: Awesome list of ML | reddit.com/r/programming | 2021-09-16
Official Stanford NLP Python Library for Many Human Languages
Project mention: Spacy vs NLTK for Spanish Language Statistical Tasks | reddit.com/r/LanguageTechnology | 2021-11-12
Natural Language Processing Best Practices & Examples
Project mention: Is there any utility software/bot that produces descriptor tags for a Reddit image post using the comments? | reddit.com/r/redditdev | 2021-11-07
I found this (https://github.com/microsoft/nlp-recipes) resource and it has a list of pre-built or easily customizable NLP models that I'm going to try out.
Mycroft Core, the Mycroft Artificial Intelligence platform.
Project mention: Privacy-friendly voice assistant? | reddit.com/r/homeautomation | 2021-11-18
Statistical Machine Intelligence & Learning Engine
Project mention: Python VS Scala | reddit.com/r/scala | 2021-07-02
Actually, it does. Scala has Spark for data science and some ML libs like Smile.
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Project mention: Portability of Rust in 2021 | reddit.com/r/rust | 2021-09-10
In sum I would like the idea to go with Rust as I more or less got to rewrite the whole thing anyway, but I am a bit skeptical if I will be able to interface with everything that might come up at some point. Or probably end up in a wrapper hell if I got to use more C++ libraries. On the other hand there are definitely a few Rust projects out there that might come in handy (for example https://github.com/huggingface/tokenizers). And the build process is pretty awful right now (CMake it is but with lots of hacks).
NLP related posts
[D] How do pretrained tokenizers work?
2 projects | reddit.com/r/MachineLearning | 26 Nov 2021
Software Engineering Best Practices + Useful Resources🚀
1 project | dev.to | 24 Nov 2021
[D] For those of you working as NLP Engineers in Industry, what should you learn to get up to par?
1 project | reddit.com/r/MachineLearning | 23 Nov 2021
Txtai 3.7 released – streaming and parallel machine learning workflows
1 project | news.ycombinator.com | 22 Nov 2021
txtai 3.7 released - streaming and parallel machine learning workflows
1 project | reddit.com/r/programming | 22 Nov 2021
Label your text data automatically with zeroshot_labels
1 project | news.ycombinator.com | 22 Nov 2021
[P] DataProfiler - Scaleable Sensitive Data Detection & Analysis on Structured & Unstructured Files
1 project | reddit.com/r/MachineLearning | 22 Nov 2021
What are some of the best open-source NLP projects? This list will help you: