NLP

Top 23 NLP Open-Source Projects

  1. transformers

    🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

    Project mention: The $100 ChatGPT: Why Karpathy's nanochat Represents the Next Big Thing | dev.to | 2026-05-04

    Hugging Face Transformers: 500,000+ lines

  2. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  3. LlamaFactory

    Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

    Project mention: Llama-Factory: Unified, Efficient Fine-Tuning for 100 Open LLMs | news.ycombinator.com | 2025-09-18
  4. AI-For-Beginners

    12 Weeks, 24 Lessons, AI for All!

    Project mention: AI for Beginners | news.ycombinator.com | 2025-11-15
  5. ailearning

    AiLearning: Data Analysis + Machine Learning in Practice + Linear Algebra + PyTorch + NLTK + TF2

  6. BettaFish

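    Weiyu (微舆): a multi-agent public opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the true picture of public opinion, predicts future trends, and supports decision-making. Implemented from scratch, with no dependency on any framework.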

    Project mention: BettaFish – Public Opinion Sentiment Analysis Model | news.ycombinator.com | 2025-11-03
  7. langextract

    A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.

    Project mention: All Data and AI Weekly #203: 18-Aug-2025 | dev.to | 2025-08-18

    langextract: A tool for extracting language information.

  8. HanLP

    Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification

  9. 500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code

    500 AI Machine learning Deep learning Computer vision NLP Projects with code

  10. spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: The Sovereign Redactor — A Precision-Guided Privacy Airlock | dev.to | 2026-05-14

    We use spaCy’s en_core_web_lg (Large) model as the underlying NLP engine. This gives the Redactor the linguistic context to understand that "Gatsby" in a book title should stay, but "Gatsby" mentioned as a person's name in a private letter might need to go.

  11. storm

    An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.

  12. haystack

    Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.

    Project mention: Show HN: Haystack – Review pull requests like you wrote them yourself | news.ycombinator.com | 2025-09-11

    I immediately thought this was an update by Deepset and their Haystack framework. https://haystack.deepset.ai/

    Just FYI.

  13. best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

  14. unilm

    Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

    Project mention: How Attention Sinks Keep Language Models Stable | news.ycombinator.com | 2025-08-08

    I found a fairly large improvement in my toy transformer model where I added a "global" token akin to the CLS token in ViT.

    Another approach I've seen is the "Diff transformer" from MS Research (https://github.com/microsoft/unilm/tree/master/Diff-Transfor...).

  15. datasets

    🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

    Project mention: GSoC 2026 Predictions: 30 NEW AI/ML/Security Organizations You Should Start Contributing to NOW! | dev.to | 2026-02-06
  16. rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    Project mention: Eliza Reanimated Published in IEEE Annals of the History of Computing | news.ycombinator.com | 2025-06-20

    Right before LLMs broke into the scene we had a few techniques I was aware of:

    * Personality Forge uses a rules-based scripting approach [0]. This is basically ELIZA extended to take advantage of modern processing power.

    * Rasa [1] used traditional NLP/NLU techniques and small-model ML to match intents and parse user requests. This is the same kind of tooling that Google/Alexa historically used, just without the voice layer and with more effort to keep the context in mind.

    Rasa is actually open source [2], so you can poke around the internals to see how it's implemented. It doesn't look like it's changed architecture substantially since the pre-LLM days. Rhasspy [3] (also open source) uses similar techniques but in the voice assistant space rather than as a full chatbot.

    [0] https://www.personalityforge.com/developers/how-to-build-cha...

    [1] https://web.archive.org/web/20200801000000*/https://rasa.com... (old link because Rasa's marketing today is ambiguous about whether they're adding LLMs now).

    [2] https://github.com/RasaHQ/rasa

    [3] https://rhasspy.readthedocs.io/en/latest/

  17. FinGPT

    FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.

  18. Chinese-LLaMA-Alpaca

    Chinese LLaMA & Alpaca large language models + local CPU/GPU training and deployment

  19. awesome-nlp

    📖 A curated list of resources dedicated to Natural Language Processing (NLP)

  20. ML-YouTube-Courses

    📺 Discover the latest machine learning / AI courses on YouTube.

  21. Awesome-pytorch-list

    A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.

  22. gensim

    Topic Modelling for Humans

    Project mention: How to Analyze 47 Million Hacker News Posts: A Data Scientist's Dream Dataset Just Got Better | dev.to | 2026-03-18

    For more advanced topic modeling, consider using tools like scikit-learn's LatentDirichletAllocation or Gensim.

  23. memvid

    Memory layer for AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer. Give your agents instant retrieval and long-term memory.

    Project mention: Show HN: I accidentally built "SQLite for AI memory" (Memvid) | news.ycombinator.com | 2026-01-05
  24. nlp-tutorial

    Natural Language Processing Tutorial for Deep Learning Researchers

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

NLP related posts

  • 📄 Paper: RORA-VLM: Robust Retrieval Augmentation for Vision Language Models

    1 project | dev.to | 28 May 2026
  • The Sovereign Redactor — A Precision-Guided Privacy Airlock

    3 projects | dev.to | 14 May 2026
  • When Is "Next Friday"?

    1 project | news.ycombinator.com | 11 May 2026
  • Show HN: AutoML Agents

    1 project | news.ycombinator.com | 5 May 2026
  • I found a bug that made my LLM look 14x better than it was — here's what I learned

    1 project | dev.to | 25 Apr 2026
  • Smile v6.0 Was Released

    1 project | news.ycombinator.com | 22 Apr 2026
  • Submitted fix to Hugging Face and was mocked, but my responses need more insight

    1 project | news.ycombinator.com | 9 Apr 2026

Index

What are some of the best open-source NLP projects? This list will help you:

# Project Stars
1 transformers 161,343
2 LlamaFactory 71,870
3 AI-For-Beginners 47,979
4 ailearning 42,255
5 BettaFish 41,211
6 langextract 36,808
7 HanLP 36,346
8 500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code 34,165
9 spaCy 33,632
10 storm 28,323
11 haystack 25,466
12 best-of-ml-python 23,620
13 unilm 22,141
14 datasets 21,586
15 rasa 21,197
16 FinGPT 20,392
17 Chinese-LLaMA-Alpaca 18,949
18 awesome-nlp 18,684
19 ML-YouTube-Courses 17,147
20 Awesome-pytorch-list 16,517
21 gensim 16,430
22 memvid 15,621
23 nlp-tutorial 14,896
