Top 23 NLP Open-Source Projects
-
transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Project mention: The $100 ChatGPT: Why Karpathy's nanochat Represents the Next Big Thing | dev.to | 2026-05-04
Hugging Face Transformers: 500,000+ lines
-
LlamaFactory
Project mention: Llama-Factory: Unified, Efficient Fine-Tuning for 100 Open LLMs | news.ycombinator.com | 2025-09-18
-
BettaFish
Project mention: BettaFish – Public Opinion Sentiment Analysis Model | news.ycombinator.com | 2025-11-03
-
langextract
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
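"Source grounding" means every extracted value can be traced back to the exact characters it came from. langextract's own API is not shown here; the helper below is a hypothetical sketch of the general idea, assuming extractions arrive as `(label, text)` pairs and that grounding is done by exact substring search.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GroundedExtraction:
    """An extracted string plus the character span it was found at (if any)."""
    label: str
    text: str
    start: Optional[int]  # None when the text does not occur verbatim in the source
    end: Optional[int]


def ground_extractions(source: str, extractions: List[Tuple[str, str]]) -> List[GroundedExtraction]:
    """Map each (label, text) pair to an exact character span in source.

    Searching forward from the previous match keeps repeated strings
    (e.g. a name mentioned twice) aligned with successive occurrences.
    """
    grounded = []
    cursor = 0
    for label, text in extractions:
        idx = source.find(text, cursor)
        if idx == -1:
            idx = source.find(text)  # retry from the start before giving up
        if idx == -1:
            grounded.append(GroundedExtraction(label, text, None, None))
            continue
        end = idx + len(text)
        grounded.append(GroundedExtraction(label, text, idx, end))
        cursor = max(cursor, end)
    return grounded
```

Exact substring matching is the simplest possible grounding: it returns no span when the model paraphrases or normalizes the text, so a real system needs a fuzzier alignment step.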
-
HanLP
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification
-
500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code
500 AI Machine learning Deep learning Computer vision NLP Projects with code
-
spaCy
We use spaCy’s en_core_web_lg (Large) model as the underlying NLP engine. This gives the Redactor the linguistic context to understand that "Gatsby" in a book title should stay, but "Gatsby" mentioned as a person's name in a private letter might need to go.
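The keep-or-redact decision can be expressed as a filter on entity labels. A minimal sketch, assuming the entity spans have already been produced by the NLP engine (in spaCy, each entity in `doc.ents` exposes `start_char`, `end_char`, and `label_`) and assuming the model tags the book title as `WORK_OF_ART` and the person as `PERSON`. The `redact` function, its default label set, and the mask string are hypothetical, not the Redactor's actual code.

```python
from typing import FrozenSet, Iterable, Tuple

# (start_char, end_char, label) triples, the shape you would build from
# [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents] in spaCy.
Span = Tuple[int, int, str]


def redact(text: str, entities: Iterable[Span],
           redact_labels: FrozenSet[str] = frozenset({"PERSON"}),
           mask: str = "[REDACTED]") -> str:
    """Replace entity spans whose label is in redact_labels; keep everything else."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        # Keep non-sensitive labels; skip spans overlapping an already-redacted span.
        if label not in redact_labels or start < cursor:
            continue
        pieces.append(text[cursor:start])
        pieces.append(mask)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

Keeping spaCy out of the function makes the masking step easy to test with hand-written spans; in practice you would pass the triples built from `doc.ents`.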
-
storm
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
-
haystack
Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.
Project mention: Show HN: Haystack – Review pull requests like you wrote them yourself | news.ycombinator.com | 2025-09-11
I immediately thought this was an update by Deepset and their Haystack framework. https://haystack.deepset.ai/
Just FYI.
-
unilm
Project mention: How Attention Sinks Keep Language Models Stable | news.ycombinator.com | 2025-08-08
I found a fairly large improvement in my toy transformer model where I added a "global" token akin to the CLS token in ViT.
Another approach I've seen is the "Diff transformer" from MS Research (https://github.com/microsoft/unilm/tree/master/Diff-Transfor...).
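The intuition behind a global or sink token fits in a few lines: each row of softmax weights in causal attention must sum to 1, so a query with no relevant key still has to spend its attention somewhere. A minimal pure-Python sketch, assuming single-head scaled dot-product attention and a hand-picked sink key (in a trained model it would be learned). The function names are illustrative; this is neither the commenter's model nor the Diff Transformer.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention_weights(queries: List[Vector], keys: List[Vector]) -> List[List[float]]:
    """Row i holds query i's scaled dot-product softmax weights over keys 0..i."""
    d = len(queries[0])
    rows = []
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys[: i + 1]]
        rows.append(softmax(scores))
    return rows


def prepend_global_token(queries: List[Vector], keys: List[Vector],
                         sink_query: Vector, sink_key: Vector) -> Tuple[List[Vector], List[Vector]]:
    """Put a global token at position 0; under the causal mask every later query can see it."""
    return [sink_query] + queries, [sink_key] + keys


if __name__ == "__main__":
    # Two ordinary tokens whose keys are orthogonal to their queries (nothing to attend to).
    queries = [[1.0, 0.0], [1.0, 0.0]]
    keys = [[0.0, 1.0], [0.0, 1.0]]
    print(causal_attention_weights(queries, keys))
    # A sink key aligned with the queries soaks up most of the attention mass.
    q2, k2 = prepend_global_token(queries, keys, [0.0, 0.0], [4.0, 0.0])
    print(causal_attention_weights(q2, k2))
```

Without the sink, the second token spreads its weight evenly over keys it has no reason to prefer; with it, most of each row lands on position 0, leaving the ordinary tokens' weights small.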
-
datasets
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
Project mention: GSoC 2026 Predictions: 30 NEW AI/ML/Security Organizations You Should Start Contributing to NOW! | dev.to | 2026-02-06
-
rasa
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Project mention: Eliza Reanimated Published in IEEE Annals of the History of Computing | news.ycombinator.com | 2025-06-20
Right before LLMs broke into the scene we had a few techniques I was aware of:
* Personality Forge uses a rules-based scripting approach [0]. This is basically ELIZA extended to take advantage of modern processing power.
* Rasa [1] used traditional NLP/NLU techniques and small-model ML to match intents and parse user requests. This is the same kind of tooling that Google/Alexa historically used, just without the voice layer and with more effort to keep the context in mind.
Rasa is actually open source [2], so you can poke around the internals to see how it's implemented. It doesn't look like it's changed architecture substantially since the pre-LLM days. Rhasspy [3] (also open source) uses similar techniques but in the voice assistant space rather than as a full chatbot.
[0] https://www.personalityforge.com/developers/how-to-build-cha...
[1] https://web.archive.org/web/20200801000000*/https://rasa.com... (old link because Rasa's marketing today is ambiguous about whether they're adding LLMs now).
[2] https://github.com/RasaHQ/rasa
[3] https://rhasspy.readthedocs.io/en/latest/
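To make "matching intents" concrete, here is a toy nearest-example intent matcher: bag-of-words cosine similarity against a few example utterances per intent, with a confidence threshold that routes weak matches to a fallback. This is a deliberate simplification, not Rasa's implementation (Rasa's NLU pipeline uses configurable featurizers and trained classifiers); the intents, examples, and threshold are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical training data: a handful of example utterances per intent.
TRAINING: Dict[str, List[str]] = {
    "greet": ["hello there", "hi", "good morning"],
    "check_balance": ["what is my balance", "how much money do i have", "show my account balance"],
    "goodbye": ["bye", "see you later", "goodbye for now"],
}


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def classify(text: str, threshold: float = 0.3) -> Tuple[str, float]:
    """Return (intent, score) of the closest example, or ("fallback", score) below the threshold."""
    query = bag_of_words(text)
    best_intent, best_score = "fallback", 0.0
    for intent, examples in TRAINING.items():
        for example in examples:
            score = cosine(query, bag_of_words(example))
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < threshold:
        return "fallback", best_score
    return best_intent, best_score
```

The fallback branch is the part rules-only bots like ELIZA lack: instead of forcing a scripted reply, a low-confidence match can trigger a clarifying question.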
-
FinGPT
FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.
-
Awesome-pytorch-list
A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
-
gensim
Project mention: How to Analyze 47 Million Hacker News Posts: A Data Scientist's Dream Dataset Just Got Better | dev.to | 2026-03-18
For more advanced topic modeling, consider using tools like scikit-learn's LatentDirichletAllocation or Gensim.
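What LDA actually does is easier to see without a library in the way. A minimal sketch of LDA fit by collapsed Gibbs sampling in pure Python. Note the swapped-in inference method: scikit-learn's `LatentDirichletAllocation` and Gensim's `LdaModel` use variational Bayes, not Gibbs sampling, so this illustrates the model rather than either library's implementation. The `alpha`, `beta`, and iteration defaults are illustrative.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[str]], num_topics: int, iterations: int = 200,
              alpha: float = 0.1, beta: float = 0.01,
              seed: int = 0) -> Tuple[List[List[int]], List[str], List[List[int]]]:
    """Fit LDA by collapsed Gibbs sampling; return (doc-topic counts, vocab, topic-word counts)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]          # n[d][k]
    topic_word = [[0] * V for _ in range(num_topics)]     # n[k][w]
    topic_total = [0] * num_topics                        # n[k]
    assignments: List[List[int]] = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z = []
        for w in doc:
            k = rng.randrange(num_topics)
            z.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Take the token out of the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(z = t | all other assignments), up to a normalizing constant.
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                           / (topic_total[t] + V * beta) for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return doc_topic, vocab, topic_word


def top_words(topic_word: List[List[int]], vocab: List[str], n: int = 3) -> List[List[str]]:
    """Most frequent words per topic under the final sample."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in topic_word]
```

Usage: `lda_gibbs(docs, num_topics=2)` on tokenized posts, then `top_words(topic_word, vocab)` to label each topic. For 47 million posts you would want the libraries' optimized variational implementations instead.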
-
memvid
Memory layer for AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer. Give your agents instant retrieval and long-term memory.
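The "single-file" idea is that an agent's entire memory lives in one portable file rather than a separate vector database service. A minimal sketch using Python's built-in `sqlite3` as that file and bag-of-words cosine similarity for retrieval. This is not memvid's storage format or API; `MemoryFile`, `remember`, and `recall` are hypothetical names.

```python
import math
import re
import sqlite3
from collections import Counter
from typing import List, Tuple


def _tokens(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


class MemoryFile:
    """Hypothetical single-file agent memory backed by one SQLite database."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a filename such as "agent.db" to persist everything in one file.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS memories (id INTEGER PRIMARY KEY, text TEXT NOT NULL)")

    def remember(self, text: str) -> int:
        cur = self.conn.execute("INSERT INTO memories (text) VALUES (?)", (text,))
        self.conn.commit()
        return cur.lastrowid

    def recall(self, query: str, k: int = 3) -> List[Tuple[float, str]]:
        """Rank stored memories by bag-of-words cosine similarity to the query."""
        q = _tokens(query)
        q_norm = math.sqrt(sum(v * v for v in q.values()))
        scored = []
        for (text,) in self.conn.execute("SELECT text FROM memories"):
            m = _tokens(text)
            dot = sum(q[w] * m[w] for w in q.keys() & m.keys())
            norm = q_norm * math.sqrt(sum(v * v for v in m.values()))
            if dot and norm:
                scored.append((dot / norm, text))
        scored.sort(reverse=True)
        return scored[:k]
```

A full scan is fine for a sketch; a production memory layer would replace the keyword scoring with embeddings and an index.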
Project mention: Show HN: I accidentally built "SQLite for AI memory" (Memvid) | news.ycombinator.com | 2026-01-05
-
NLP discussion
NLP-related posts
-
📄Paper: RORA-VLM: Robust Retrieval Augmentation for Vision Language Models
-
The Sovereign Redactor — A Precision-Guided Privacy Airlock
-
When Is "Next Friday"?
-
Show HN: AutoML Agents
-
I found a bug that made my LLM look 14x better than it was — here's what I learned
-
Smile v6.0 Was Released
-
Submitted fix to Hugging Face and was mocked, but my responses need more insight
-
A note from our sponsor - SaaSHub
www.saashub.com | 7 Jun 2026
Index
What are some of the best open-source NLP projects? This list will help you:
| # | Project | GitHub stars |
|---|---|---|
| 1 | transformers | 161,343 |
| 2 | LlamaFactory | 71,870 |
| 3 | AI-For-Beginners | 47,979 |
| 4 | ailearning | 42,255 |
| 5 | BettaFish | 41,211 |
| 6 | langextract | 36,808 |
| 7 | HanLP | 36,346 |
| 8 | 500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code | 34,165 |
| 9 | spaCy | 33,632 |
| 10 | storm | 28,323 |
| 11 | haystack | 25,466 |
| 12 | best-of-ml-python | 23,620 |
| 13 | unilm | 22,141 |
| 14 | datasets | 21,586 |
| 15 | rasa | 21,197 |
| 16 | FinGPT | 20,392 |
| 17 | Chinese-LLaMA-Alpaca | 18,949 |
| 18 | awesome-nlp | 18,684 |
| 19 | ML-YouTube-Courses | 17,147 |
| 20 | Awesome-pytorch-list | 16,517 |
| 21 | gensim | 16,430 |
| 22 | memvid | 15,621 |
| 23 | nlp-tutorial | 14,896 |