NLP

Top 23 NLP Open-Source Projects

  1. transformers

    🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

    Project mention: None of the top 10 projects in GitHub is actually a software project 🤯 | dev.to | 2025-05-10

    We see an addition to the AI community with AutoGPT. Along with TensorFlow, it represents the AI community in the software category, which is becoming more relevant (2 out of 8). We can expect new AI projects such as Transformers or Ollama (currently ranked 34th and 36th, respectively) to enter the top 25 in the future.

  3. ragflow

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

    Project mention: 7 AI Open Source Libraries To Build RAG, Agents & AI Search | dev.to | 2024-11-14

    ⭐️ RAG Flow on GitHub

  4. ailearning

    AiLearning: Data Analysis + Machine Learning in Practice + Linear Algebra + PyTorch + NLTK + TF2

    Project mention: Top Github repositories for 10+ programming languages | dev.to | 2024-07-16

    Ai learning

  5. bert

    TensorFlow code and pre-trained models for BERT

    Project mention: Complete Large Language Model (LLM) Learning Roadmap | dev.to | 2025-04-11

    Resource: BERT Paper

  6. AI-For-Beginners

    12 Weeks, 24 Lessons, AI for All!

    Project mention: The Top 9️⃣ Repositories to learn Python programming + Resources (Extra) 🤯 | dev.to | 2024-11-06

    ⭐️ AI For Beginners on GitHub.

  7. HanLP

    Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification

  8. spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: 15,000 lines of verified cryptography now in Python | news.ycombinator.com | 2025-04-18

    Geez honestly

    This seems to be the issue https://github.com/explosion/spaCy/issues/13658#issuecomment...

    And you depend on opinionated libraries that break with newer versions. Why? Well, because f you, that's why! Because our library is not just a tool, it's a lifestyle.

    Though it seems that Pydantic 1.x does support Python 3.13: https://docs.pydantic.dev/1.10/changelog/#v11020-2025-01-07

  10. storm

    An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.

    Project mention: Code Explanation: "STORM: Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking" | dev.to | 2025-03-08

    Note: this explanation only covers the knowledge_storm in the storm repo because it aligns with my interests.

  11. 500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code

    500 AI Machine learning Deep learning Computer vision NLP Projects with code

    Project mention: Top Github repositories for 10+ programming languages | dev.to | 2024-07-16

    500 AI machine learning NLP programming projects

  12. unilm

    Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

    Project mention: A Picture Is Worth 170 Tokens: How Does GPT-4o Encode Images? | news.ycombinator.com | 2024-06-07

    Has anyone tried Kosmos [0]? I came across it the other day and it looked shiny and interesting, but I haven't had a chance to put it to the test much yet.

    [0] - https://github.com/microsoft/unilm/tree/master/kosmos-2.5

  13. haystack

    AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

    Project mention: Building a Prompt-Based Crypto Trading Platform with RAG and Reddit Sentiment Analysis using Haystack | dev.to | 2025-04-28

    Haystack forms the backbone of our RAG system. It provides pipelines for processing documents, embedding text, and retrieving relevant information.

  14. rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    Project mention: What is Rasa? A Beginner’s Guide to Conversational AI | dev.to | 2024-12-31

    Rasa GitHub Repository

  15. datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

    Project mention: 20 Open Source Tools I Recommend to Build, Share, and Run AI Projects | dev.to | 2024-11-13

    Datasets library repository for accessing and sharing datasets with the community.

  16. best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

    Project mention: A ranked list of machine learning Python libraries. Updated weekly | news.ycombinator.com | 2025-01-31

  17. Chinese-LLaMA-Alpaca

    Chinese LLaMA & Alpaca large language models (LLMs) + local CPU/GPU training and deployment

  18. awesome-nlp

    📖 A curated list of resources dedicated to Natural Language Processing (NLP)

  19. ML-YouTube-Courses

    📺 Discover the latest machine learning / AI courses on YouTube.

    Project mention: Top Github repositories for 10+ programming languages | dev.to | 2024-07-16

    Machine Learning YouTube courses

  20. FinGPT

    FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.

    Project mention: About FinGPT: Open-Source Financial Large Language Models | news.ycombinator.com | 2024-08-28

  21. gensim

    Topic Modelling for Humans

  22. Awesome-pytorch-list

    A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.

  23. nlp-tutorial

    Natural Language Processing Tutorial for Deep Learning Researchers

  24. DeepLearningExamples

    State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

  25. flair

    A very simple framework for state-of-the-art Natural Language Processing (NLP)

    Project mention: WhisperNER: Unified Open Named Entity and Speech Recognition | news.ycombinator.com | 2024-11-21

    only the last string is a LOC named entity. Of course you can change definitions from the standard ones if you like, but then you should be careful not to compare with tools that use the original standard definition of NER such as flairNLP [1].

    [1] https://github.com/flairNLP/flair?tab=readme-ov-file

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

NLP related posts

  • VerifAI – open-source generative search with verification

    1 project | news.ycombinator.com | 8 May 2025
  • How to Install Foundation-Sec 8B by Cisco: The Ultimate Cybersecurity AI Model

    1 project | dev.to | 6 May 2025
  • Routr: Fast local replacement for DuckDuckGo bangs

    5 projects | news.ycombinator.com | 4 May 2025
  • How to Install Qwen2.5-Omni 3B Locally

    1 project | dev.to | 3 May 2025
  • Making Sure AI Agents Play Nice: A Look at How We Evaluate Them

    6 projects | dev.to | 1 May 2025
  • Are LLMs Random?

    1 project | news.ycombinator.com | 30 Apr 2025
  • Building a Prompt-Based Crypto Trading Platform with RAG and Reddit Sentiment Analysis using Haystack

    1 project | dev.to | 28 Apr 2025

Index

What are some of the best open-source NLP projects? This list will help you:

# Project Stars
1 transformers 144,375
2 ragflow 52,039
3 ailearning 40,779
4 bert 39,124
5 AI-For-Beginners 37,414
6 HanLP 35,016
7 spaCy 31,537
8 storm 24,288
9 500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code 23,257
10 unilm 21,230
11 haystack 20,709
12 rasa 20,134
13 datasets 20,090
14 best-of-ml-python 20,028
15 Chinese-LLaMA-Alpaca 18,816
16 awesome-nlp 17,147
17 ML-YouTube-Courses 16,489
18 FinGPT 16,110
19 gensim 16,017
20 Awesome-pytorch-list 15,807
21 nlp-tutorial 14,437
22 DeepLearningExamples 14,253
23 flair 14,160
