Python nlp-machine-learning

Open-source Python projects categorized as nlp-machine-learning

Top 23 Python nlp-machine-learning Projects

  • NeMo

    NeMo: a toolkit for conversational AI

    Project mention: [P] Making a TTS voice, HK-47 from Kotor using Tortoise (Ideally WaveRNN) | /r/MachineLearning | 2023-07-06

    I haven't tested WaveRNN, but of the open-source options I know, FastPitch is the best. It's also easy to use; here is the tutorial for voice cloning.

  • OpenPrompt

    An Open-Source Framework for Prompt-Learning.

  • tika-python

    Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

  • contextualized-topic-models

    A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021.

    Project mention: [Project]Topic modelling of tweets from the same user | /r/MachineLearning | 2023-04-14

    In our experiments, CTM works well with tweets: (I'm one of the authors)

  • skweak

    skweak: A software toolkit for weak supervision applied to NLP tasks

    Project mention: Entity Extraction with Predefined List | /r/LanguageTechnology | 2023-01-07

    Thanks for pointing me in the right direction. Seems like there’s a few other approaches with weak supervision:

  • babyai

    BabyAI platform. A testbed for training agents to understand and execute language commands.

    Project mention: RL Environment with varying levels of difficulty | /r/reinforcementlearning | 2022-11-05

    Try BabyAI:

  • searchGPT

    Grounded search engine (i.e. with source reference) based on LLM / ChatGPT / OpenAI API. It supports web search, file content search etc.

    Project mention: Interesting open source project. Adds bing search to gpt-3.5. maybe a future extension? | /r/Oobabooga | 2023-03-21

  • LLM-Finetuning-Hub

    Repository that contains LLM fine-tuning and deployment scripts along with our research findings.

    Project mention: FLaNK Stack Weekly for 12 September 2023 | | 2023-09-12

  • LemmInflect

    A Python module for English lemmatization and inflection.

  • Writing-Styles-Classification-Using-Stylometric-Analysis

    ✍️ An intelligent system that takes a document and classifies different writing styles within the document using stylometric techniques.

    Project mention: Models for spam detection on short messages with both text and numerical inputs | /r/LanguageTechnology | 2022-12-05

    Or look into stylistic features and add them to an XGBoost classifier to enhance the results. (I used those combined with the BERT last hidden state for fake news classification and got the best results so far; the repo and the list of stylistic features may give you some inspiration.)

  • NLP-Guide

    Natural Language Processing (NLP). Covering topics such as Tokenization, Part Of Speech tagging (POS), Machine translation, Named Entity Recognition (NER), Classification, and Sentiment analysis.

  • rakun2

    RaKUn 2.0 - A fast keyword detection algorithm

    Project mention: Very fast graph-based keyword extraction | /r/LanguageTechnology | 2022-10-30

  • MLH-Quizzet

    This is a smart Quiz Generator that generates a dynamic quiz from any uploaded text/PDF document using NLP. This can be used for self-analysis, question paper generation, and evaluation, thus reducing human effort.

  • BERT-Transformer-Pytorch

    Basic implementation of BERT and Transformer in PyTorch in one short Python file (also includes "predict next word" GPT task)

  • YTRecap

    Summarize any YouTube video in seconds ✍️

    Project mention: YTRecap (looking for collaborators/contributors) | /r/OpenAI | 2023-04-09


  • texta

    Terminology EXtraction and Text Analytics (TEXTA) Toolkit

  • gmail-assist

    Get control of your overflowing inbox using GPT-3 to classify your emails by importance.

    Project mention: GPT-3 has finally wrangled my gmail inbox. | /r/ArtificialInteligence | 2023-03-22

    I've shared my script here if anyone wants to use it:

  • modsysML

    Model management toolkit for continuous model improvement. Evaluate and compare LLM outputs, test quality, catch regressions and automate.

    Project mention: Prompt engineering framework to test the quality of AI providers and automate data analytics | /r/programming | 2023-07-11

  • Text-Summarization-using-NLP

    Text summarization using NLP: fetches a BBC News article and summarizes its text, and also supports summarizing custom articles.

  • twitter-stock-sentiment

    Use Twitter to get live and trending stock sentiment!

  • google-local-results-ai-server

    Server code for serving BERT-based models for text classification. It is designed by SerpApi for heavy-load prototyping and production tasks, specifically for the implementation of the google-local-results-ai-parser gem.

    Project mention: Show HN: Open-Source Server Code for Deploying Text Classification Models | | 2023-06-28

  • NLP-Model-for-Corpus-Similarity

    An NLP model I developed to determine the similarity or relation between two documents/Wikipedia articles. Inspired by the cosine similarity algorithm and built from WordNet.
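
    For readers unfamiliar with the cosine similarity idea mentioned in this description, here is a minimal sketch of how two documents can be compared by the angle between their word-count vectors. It is not the project's code: it uses plain word counts instead of WordNet, and the names `bag_of_words` and `cosine_similarity` and the sample sentences are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its word tokens (hypothetical helper)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Illustrative inputs only; any two strings work.
doc1 = "The cat sat on the mat."
doc2 = "A cat lay on the mat."
score = cosine_similarity(bag_of_words(doc1), bag_of_words(doc2))
print(f"similarity: {score:.3f}")
```

    A score near 1 means the two texts use largely the same words in similar proportions, and 0 means they share no words. A WordNet-based approach like the one described would presumably also credit related words, which this sketch does not attempt.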

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-09-12.

What are some of the best open-source nlp-machine-learning projects in Python? This list will help you:

Rank Project Stars
1 NeMo 7,995
2 OpenPrompt 3,777
3 tika-python 1,344
4 contextualized-topic-models 1,106
5 skweak 890
6 babyai 627
7 dr-doc-search 592
8 searchGPT 521
9 LLM-Finetuning-Hub 368
10 LemmInflect 228
11 Writing-Styles-Classification-Using-Stylometric-Analysis 84
12 NLP-Guide 58
13 rakun2 55
14 MLH-Quizzet 51
15 BERT-Transformer-Pytorch 35
16 YTRecap 34
17 texta 32
18 gmail-assist 32
19 modsysML 32
20 Text-Summarization-using-NLP 29
21 twitter-stock-sentiment 11
22 google-local-results-ai-server 11
23 NLP-Model-for-Corpus-Similarity 9