Top 23 Python NLP Projects
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Project mention: BetterTransformer: PyTorch-native free-lunch speedups for Transformer-based models | reddit.com/r/MachineLearning | 2022-11-22
To support BetterTransformer in the canonical Transformer models from the Transformers library, it was integrated into the open-source Optimum library, so it can be enabled with a one-liner.
BERT: TensorFlow code and pre-trained models for BERT.
Project mention: [R] LiBai: a large-scale open-source model training toolbox | reddit.com/r/MachineLearning | 2022-11-09
Found relevant code at https://github.com/google-research/bert
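HanLP: Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing.
Project mention: Hanlp - Natural language processing for the next decade | reddit.com/r/github_trends | 2022-05-28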
spaCy: 💫 Industrial-strength Natural Language Processing (NLP) in Python
Project mention: Has anyone here ever used the seaNMF model for short text topic modeling, and be willing to help me get started with it? | reddit.com/r/LanguageTechnology | 2022-11-24
Tokenize with NLTK, SpaCy or CoreNLP
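Whatever topic model is used afterwards (seaNMF, LDA, NMF), each short text first has to become a list of tokens. The sketch below is a minimal, dependency-free stand-in, assuming plain regex splitting is good enough for a first experiment; it is not how NLTK, spaCy, or CoreNLP tokenize, and those libraries handle cases this ignores (abbreviations, URLs, hyphenation, languages written without spaces). The `tokenize` function and `TOKEN_RE` pattern are hypothetical names for illustration.

```python
import re

# Hypothetical, crude stand-in for a real tokenizer: runs of word characters
# (optionally joined by one apostrophe, as in "don't") become one token, and
# every other non-space character becomes a token of its own.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word and punctuation tokens."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


if __name__ == "__main__":
    print(tokenize("Short texts don't carry much context, do they?"))
```

For short texts, it usually pays to swap in one of the real tokenizers mentioned above once the pipeline works end to end.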
Jina: 🔮 The most advanced MLOps platform for multimodal AI on the cloud · Neural Search · Creative AI · Cloud Native
Project mention: Have you used Jina for multi-modal applications? | dev.to | 2022-10-24
How will you build a multi-modal application? I just noticed the release of Jina, an MLOps framework that empowers anyone to build cross-modal and multi-modal applications on the cloud. It uplifts a PoC into a production-ready service. Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. If you have tried it before, please let me know what you think of it. Thanks!
Rasa: 💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more. Create chatbots and voice assistants.
Project mention: Show HN: Flex – transpile natural language to a programming language | news.ycombinator.com | 2022-11-10
At the moment it can recognise the type of statements in the training data set and transpile them to Python, Java or C++ using the mappings defined here.
This is very different from how Codex/Copilot work, as it is trained using an NLU framework that is usually used for training chatbots.
Datasets: 🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
gensim: Topic Modelling for Humans
Project mention: Topic modeling --- allow multiple topics per statement | reddit.com/r/LanguageTechnology | 2022-11-22
Try LDA as implemented in gensim: https://github.com/RaRe-Technologies/gensim
Flair: A very simple framework for state-of-the-art Natural Language Processing (NLP)
Project mention: Flair: A simple framework for state-of-the-art Natural Language Processing | news.ycombinator.com | 2022-04-11
best-of-ml-python: 🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
Project mention: Best-Of Machine Learning with Python | news.ycombinator.com | 2022-04-28
AllenNLP: An open-source NLP research library, built on PyTorch.
Project mention: How to solve ConfigurationError using HuggingFace Token Classifier | reddit.com/r/learnpython | 2022-10-08
No clue. So what I did was google the error. Here's what I found: https://github.com/allenai/allennlp/issues/4319
NLTK Source
Project mention: Estimation of text complexity | dev.to | 2022-11-24
NLTK: for token processing
clip-as-service: 🏄 Embed/reason/rank images and sentences with CLIP models
Project mention: Image Similarity Score using transfer learning | reddit.com/r/MLQuestions | 2022-08-31
PaddleHub: Awesome pre-trained models toolkit based on PaddlePaddle (400+ models including Image, Text, Audio, Video and Cross-Modal with Easy Inference & Serving)
Project mention: [R] ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts + Gradio Demo | reddit.com/r/MachineLearning | 2022-10-29
Hmm, is the code published? The thing on github just makes requests to a remote server.
TextBlob: Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
Project mention: Which NLP Library should I use? | reddit.com/r/LanguageTechnology | 2022-07-12
I think they reference the "other TextBlob" on their website and say that they use it. In case this is what you mean: https://github.com/sloria/TextBlob
unilm: Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Project mention: [Tutorial] How to Train LayoutLM on a Custom Dataset for Document Extraction with Hugging Face | reddit.com/r/LanguageTechnology | 2022-11-10
We got excited for a second by this PR on the repo: https://github.com/microsoft/unilm/commit/a54f2d74f05125d4d8a7bc3406affffd1159bf81, which seemed like they might be changing it to an MIT license, but they reverted it pretty quickly after: https://github.com/microsoft/unilm/commit/8444c6b6c4bfcc1cea60d41b85b29d84c0cbec32
attention-is-all-you-need-pytorch: A PyTorch implementation of the Transformer model in "Attention is All You Need".
PaddleNLP: 👑 Easy-to-use and powerful NLP library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis and 🖼 Diffusion AIGC system, etc.
Project mention: The 10 Trending Python Repositories on GitHub (May 2022) | dev.to | 2022-06-23
Stanza: Official Stanford NLP Python Library for Many Human Languages
Project mention: Off the shelf sentence parsers? | reddit.com/r/LanguageTechnology | 2022-08-26
Stanza has a constituency parser. There's a model compatible with the dev branch with an accuracy of 95.8 on PTB, using RoBERTa as a bottom layer, so it's pretty decent at this point. (The currently released model is not as accurate, but it's easy to get the better model to you.) There's also Tregex as a Java add-on, which can very easily search for the noun phrases highest up in the tree: NP !>> NP will search for a noun phrase that is not dominated by any higher noun phrase.
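The Tregex pattern NP !>> NP matches every NP node that has no NP ancestor, i.e. the maximal noun phrases. Tregex itself is a Java tool; the Python sketch below only re-implements the semantics of that one pattern over a Penn Treebank-style bracketed tree string, the kind of output constituency parsers commonly produce. It assumes every opening bracket is followed by a label, and the names `Tree`, `parse`, and `maximal_nps` are hypothetical helpers, not part of Stanza's or Tregex's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tree:
    """A labelled constituency-tree node; preterminals also carry a word."""
    label: str
    children: list[Tree] = field(default_factory=list)
    word: Optional[str] = None

    def leaves(self) -> list[str]:
        """Return the words under this node, left to right."""
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.leaves()]


def parse(text: str) -> Tree:
    """Parse a bracketed string such as "(NP (DT the) (NN cat))" into a Tree.

    Assumes every opening bracket is immediately followed by a label.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        node = Tree(tokens[pos + 1])
        pos += 2  # skip "(" and the label
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read_node())
            else:
                node.word = tokens[pos]  # a word directly under a POS tag
                pos += 1
        pos += 1  # skip ")"
        return node

    return read_node()


def maximal_nps(node: Tree, inside_np: bool = False) -> list[Tree]:
    """Collect NP nodes with no NP ancestor (the semantics of NP !>> NP)."""
    found = []
    if node.label == "NP" and not inside_np:
        found.append(node)
    for child in node.children:
        found.extend(maximal_nps(child, inside_np or node.label == "NP"))
    return found


if __name__ == "__main__":
    tree = parse(
        "(ROOT (S (NP (NP (DT the) (NN cat)) (PP (IN on) (NP (DT the) (NN mat))))"
        " (VP (VBD slept))))"
    )
    print([" ".join(phrase.leaves()) for phrase in maximal_nps(tree)])
```

Running it prints ['the cat on the mat']: the inner NPs "the cat" and "the mat" are skipped because the outer NP dominates them. Passing `inside_np` down the recursion is what encodes "not dominated by any NP": once any ancestor is an NP, nothing below it can match.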
🔍 Haystack is an open source NLP framework that leverages pre-trained Transformer models. It enables developers to quickly implement production-ready semantic search, question answering, summarization and document ranking for a wide range of NLP applications.
Project mention: New free tool that uses fine-tuned BERT model to surface answers from research papers | reddit.com/r/LanguageTechnology | 2022-10-28
Some cool tools, like Haystack, would be useful for putting some of these together.
Mycroft Core, the Mycroft Artificial Intelligence platform.
Project mention: (Rhetorical question) How the hell do NTs do it all? | reddit.com/r/AutisticWithADHD | 2022-11-23
The project is called Mycroft, and while they have their own smart speakers for sale, they also provide software so that you can make your own.
GPT2-Chinese: Chinese version of GPT2 training code, using BERT tokenizer.
ERNIE: Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.
Project mention: ERNIE - ViLG 2.0 by Baidu | reddit.com/r/singularity | 2022-10-31
Python NLP related posts
How to Prompt Design? Share resources
2 projects | reddit.com/r/GPT3 | 25 Nov 2022
Hacker News top posts: Nov 24, 2022
3 projects | reddit.com/r/hackerdigest | 24 Nov 2022
Querying for similarity of indexed documents.
1 project | reddit.com/r/elasticsearch | 24 Nov 2022
GitHub - Acreom/quickadd: Parse natural language time and date expressions in python
1 project | reddit.com/r/Python | 24 Nov 2022
Has anyone here ever used the seaNMF model for short text topic modeling, and be willing to help me get started with it?
4 projects | reddit.com/r/LanguageTechnology | 24 Nov 2022
(Rhetorical question) How the hell do NTs do it all?
1 project | reddit.com/r/AutisticWithADHD | 23 Nov 2022
Someone has to say it: Voice assistants are not doing it for big tech
2 projects | news.ycombinator.com | 23 Nov 2022
What are some of the best open-source NLP projects in Python? The list above will help you.