Python NLP

Open-source Python projects categorized as NLP

Top 23 Python NLP Projects

  1. transformers

    🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

    Project mention: None of the top 10 projects in GitHub is actually a software project 🤯 | dev.to | 2025-05-10

    We see an addition to the AI community with AutoGPT. Along with TensorFlow they represent the AI community in the software category, which is getting relevant (2 out of 8). We can expect in the future to have new AI projects in the top 25 such as Transformers or Ollama (currently top 34 and 36, respectively).

  3. ailearning

    AiLearning: Data Analysis + Machine Learning in Practice + Linear Algebra + PyTorch + NLTK + TF2

    Project mention: Top Github repositories for 10+ programming languages | dev.to | 2024-07-16

    Ai learning

  4. bert

    TensorFlow code and pre-trained models for BERT

    Project mention: Complete Large Language Model (LLM) Learning Roadmap | dev.to | 2025-04-11

    Resource: BERT Paper

  5. HanLP

    Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification

  6. spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: 15,000 lines of verified cryptography now in Python | news.ycombinator.com | 2025-04-18

    Geez honestly

    This seems to be the issue https://github.com/explosion/spaCy/issues/13658#issuecomment...

    And you depend on opinionated libraries that break with newer versions. Why? Well because f you that's why! Because our library is not just a tool, it's a lifestyle

    Though it seems that Pydantic 1x does support 3.13 https://docs.pydantic.dev/1.10/changelog/#v11020-2025-01-07

  7. storm

    An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.

    Project mention: Code Explanation: "STORM: Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking" | dev.to | 2025-03-08

    Note: this explanation only covers the knowledge_storm in the storm repo because it aligns with my interests.

  8. unilm

    Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

    Project mention: A Picture Is Worth 170 Tokens: How Does GPT-4o Encode Images? | news.ycombinator.com | 2024-06-07

    Has anyone tried Kosmos [0] ? I came across it the other day and it looked shiny and interesting, but I haven't had a chance to put it to the test much yet.

    [0] - https://github.com/microsoft/unilm/tree/master/kosmos-2.5

  10. haystack

    AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

    Project mention: Building a Prompt-Based Crypto Trading Platform with RAG and Reddit Sentiment Analysis using Haystack | dev.to | 2025-04-28

    Haystack forms the backbone of our RAG system. It provides pipelines for processing documents, embedding text, and retrieving relevant information.

  11. rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    Project mention: What is Rasa? A Beginner’s Guide to Conversational AI | dev.to | 2024-12-31

    Rasa GitHub Repository

  12. datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

    Project mention: 20 Open Source Tools I Recommend to Build, Share, and Run AI Projects | dev.to | 2024-11-13

    Datasets library repository for accessing and sharing datasets with the community.

  13. best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

    Project mention: A ranked list of machine learning Python libraries. Updated weekly | news.ycombinator.com | 2025-01-31

  14. Chinese-LLaMA-Alpaca

    Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment

  15. gensim

    Topic Modelling for Humans

  16. flair

    A very simple framework for state-of-the-art Natural Language Processing (NLP)

    Project mention: WhisperNER: Unified Open Named Entity and Speech Recognition | news.ycombinator.com | 2024-11-21

    only the last string is a LOC named entity. Of course you can change definitions from the standard ones if you like, but then you should be careful not to compare with tools that use the original standard definition of NER such as flairNLP [1].

    [1] https://github.com/flairNLP/flair?tab=readme-ov-file

  17. NLTK

    NLTK Source

    Project mention: Mastering the Art of Conversational AI: Insights and Implementations with Python | dev.to | 2025-02-12

    We can use NLTK, a powerful library for Python that provides easy-to-use interfaces to over 50 corpora and lexical resources.

  18. PaddleHub

    Awesome pre-trained models toolkit based on PaddlePaddle. (400+ models including Image, Text, Audio, Video and Cross-Modal with Easy Inference & Serving)

  19. PaddleNLP

    Easy-to-use and powerful LLM and SLM library with awesome model zoo.

  20. txtai

    💡 All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows

    Project mention: Chunking your data for RAG | dev.to | 2025-02-11

  21. text-generation-inference

    Large Language Model Text Generation Inference

    Project mention: Complete Large Language Model (LLM) Learning Roadmap | dev.to | 2025-04-11

    Resource: TGI (Text Generation Inference)

  22. petals

    🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

    Project mention: Serving AI from the Basement – 192GB of VRAM Setup | news.ycombinator.com | 2024-09-08

  23. TextBlob

    Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

  24. attention-is-all-you-need-pytorch

    A PyTorch implementation of the Transformer model in "Attention is All You Need".

  25. modelscope

    ModelScope: bring the notion of Model-as-a-Service to life.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python NLP related posts

  • How to Install Foundation-Sec 8B by Cisco: The Ultimate Cybersecurity AI Model

    1 project | dev.to | 6 May 2025
  • How to Install Qwen2.5-Omni 3B Locally

    1 project | dev.to | 3 May 2025
  • Making Sure AI Agents Play Nice: A Look at How We Evaluate Them

    6 projects | dev.to | 1 May 2025
  • Are LLMs Random?

    1 project | news.ycombinator.com | 30 Apr 2025
  • Building a Prompt-Based Crypto Trading Platform with RAG and Reddit Sentiment Analysis using Haystack

    1 project | dev.to | 28 Apr 2025
  • Llama 4 Smells Bad

    4 projects | news.ycombinator.com | 24 Apr 2025
  • Show HN: A Medical Research Agent Built with BioMCP and Haystack

    3 projects | news.ycombinator.com | 21 Apr 2025

Index

What are some of the best open-source NLP projects in Python? This list will help you:

# Project Stars
1 transformers 144,375
2 ailearning 40,779
3 bert 39,124
4 HanLP 35,016
5 spaCy 31,576
6 storm 24,288
7 unilm 21,230
8 haystack 20,709
9 rasa 20,134
10 datasets 20,125
11 best-of-ml-python 20,054
12 Chinese-LLaMA-Alpaca 18,816
13 gensim 16,017
14 flair 14,167
15 NLTK 14,041
16 PaddleHub 12,868
17 PaddleNLP 12,589
18 txtai 10,935
19 text-generation-inference 10,128
20 petals 9,619
21 TextBlob 9,337
22 attention-is-all-you-need-pytorch 9,130
23 modelscope 7,867

