Python Natural Language Processing

Open-source Python projects categorized as Natural Language Processing

Top 23 Python Natural Language Processing Projects

  1. transformers

    🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

    Project mention: QwQ-32B: Embracing the Power of Reinforcement Learning | news.ycombinator.com | 2025-03-05

    Huggingface's transformers library supports something similar to this. You set a minimum length, and until that length is reached, the end of sequence token has no chance of being output.

    https://github.com/huggingface/transformers/blob/51ed61e2f05...

    S1 does something similar to put a lower limit on its reasoning output. End of thinking is represented with the <|im_start|> token, followed by the word 'answer'. IIRC the code dynamically adds/removes <|im_start|> to the list of suppressed tokens.

    Both of these approaches set the probability to zero, not something small like you were suggesting.
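
    The minimum-length behaviour described in this comment can be sketched in plain Python. This is not the transformers implementation: `suppress_eos_until`, `EOS_ID`, and the toy logits are hypothetical names chosen for illustration. The sketch shows why the end-of-sequence probability comes out as exactly zero rather than merely small: the logit is set to negative infinity before the softmax.

```python
import math

EOS_ID = 2  # hypothetical end-of-sequence token id for this toy vocabulary


def suppress_eos_until(logits, generated_len, min_length, eos_id=EOS_ID):
    """Return a copy of logits with the EOS logit masked while the output is too short."""
    masked = list(logits)
    if generated_len < min_length:
        masked[eos_id] = -math.inf  # exp(-inf) == 0.0, so EOS gets probability zero
    return masked


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# EOS (index 2) would otherwise be the most likely token.
toy_logits = [1.0, 0.5, 3.0, -1.0]

# Below the minimum length: EOS is masked out entirely.
probs_short = softmax(suppress_eos_until(toy_logits, generated_len=3, min_length=10))
# Minimum length reached: EOS competes normally again.
probs_long = softmax(suppress_eos_until(toy_logits, generated_len=10, min_length=10))
```

    In the library itself, a minimum length is requested through generation settings such as `min_length` or `min_new_tokens` rather than by editing logits by hand.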

  3. funNLP

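    Chinese and English sensitive words, language detection, domestic and international mobile/phone number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English approximation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, segmentation of continuous English text, various Chinese word vectors, company name collection, classical poetry database, IT lexicon, finance lexicon, idiom (chengyu) lexicon, place name lexicon, historical figures lexicon, poetry lexicon, medical lexicon, food and drink lexicon, legal lexicon, automotive lexicon, animal lexicon, Chinese chat corpus, Chinese rumor data, Baidu Chinese Q&A dataset, collection of sentence similarity matching algorithms, BERT resources, text generation and summarization tools, cocoNLP information extraction tool, regex matching for Chinese domestic phone numbers, Tsinghua University XLORE: Chinese-English cross-lingual encyclopedic knowledge graph, Tsinghua University AI technology report series, natural language generation, the "NLU is too hard" series, automatic couplet data and bot, username blacklist, criminal charge and legal terms with classification model, WeChat official account corpus, cs224n deep learning for NLP course, Chinese handwritten character recognition, Chinese NLP corpora/datasets, variable naming tool, word segmentation corpus + code, task-oriented dialogue English dataset, ASR speech dataset + deep-learning-based Chinese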

  4. bert

    TensorFlow code and pre-trained models for BERT

    Project mention: A Novel Approach for Text Encryption Using Tokenizers in Ruby | dev.to | 2025-02-06

    Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805

  5. HanLP

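    Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and Simplified/Traditional conversion, natural language processing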

  6. Jieba

    Jieba Chinese word segmentation

    Project mention: Show HN: Mandarin Word Segmenter with Translation | news.ycombinator.com | 2025-02-04

    Thanks for the kind words!

    I'm using Jieba[0] because it hits a nice balance of fast and accurate. But I'm initializing it with a custom dictionary (~800k entries), and have added several layers of heuristic post-segmentation. For example, Jieba tends to split up chengyu into two words, but I've decided they should be displayed as a single word, since chengyu are typically a single entry in dictionaries.

    [0] https://github.com/fxsjy/jieba
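
    The post-segmentation step described in this comment can be sketched as a merge pass over the segmenter's output. This is a guess at the general idea, not the commenter's code: `merge_known_phrases`, the sample tokens, and the one-entry chengyu set are all hypothetical. In practice the tokens would come from a segmenter call such as `jieba.lcut(text)`, optionally after loading a custom dictionary with `jieba.load_userdict(...)`.

```python
def merge_known_phrases(tokens, phrases, max_parts=2):
    """Greedily merge runs of adjacent tokens whose concatenation is a known phrase."""
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate run first, down to two tokens.
        for width in range(max_parts, 1, -1):
            candidate = "".join(tokens[i:i + width])
            if i + width <= len(tokens) and candidate in phrases:
                merged.append(candidate)
                i += width
                break
        else:
            # No merge applied at this position: keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical segmenter output in which a four-character chengyu was split in two.
tokens = ["他", "一心", "一意", "学习"]
chengyu = {"一心一意"}  # stand-in for a real chengyu dictionary

result = merge_known_phrases(tokens, chengyu)
```

    A set lookup keeps each merge check constant-time, which matters when the dictionary has hundreds of thousands of entries as described above.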

  7. spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: SpaCy – Industrial-Strength Natural Language Processing in Python | news.ycombinator.com | 2025-02-09
  8. crewAI

    Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.

    Project mention: CrewAI – open-source framework for LLM agents | news.ycombinator.com | 2025-02-20
  10. d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

  11. NLP-progress

    Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

  12. datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

    Project mention: 20 Open Source Tools I Recommend to Build, Share, and Run AI Projects | dev.to | 2024-11-13

    Datasets library repository for accessing and sharing datasets with the community.

  13. rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    Project mention: What is Rasa? A Beginner’s Guide to Conversational AI | dev.to | 2024-12-31

    Rasa GitHub Repository

  14. Ciphey

    ⚡ Automatically decrypt encryptions without knowing the key or cipher, decode encodings, and crack hashes ⚡

  15. Qwen

    The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

    Project mention: Running Qwen, Nearly as Powerful as DeepSeek, on a MacBook Pro | dev.to | 2025-02-05

    Qwen (Qwen GitHub Repository) has been gaining attention recently as a powerful open-source large language model (LLM). I decided to give it a spin on my MacBook Pro using Ollama, a platform designed for running local LLMs. While Qwen2.5-Max boasts the highest performance, my setup could only handle the smaller Qwen2.5 (32B) model. Here's what I found!

  16. gensim

    Topic Modelling for Humans

  17. flair

    A very simple framework for state-of-the-art Natural Language Processing (NLP)

    Project mention: WhisperNER: Unified Open Named Entity and Speech Recognition | news.ycombinator.com | 2024-11-21

    only the last string is a LOC named entity. Of course you can change definitions from the standard ones if you like, but then you should be careful not to compare with tools that use the original standard definition of NER such as flairNLP [1].

    [1] https://github.com/flairNLP/flair?tab=readme-ov-file

  18. NLTK

    NLTK Source

    Project mention: Mastering the Art of Conversational AI: Insights and Implementations with Python | dev.to | 2025-02-12

    We can use NLTK, a powerful library for Python that provides easy-to-use interfaces to over 50 corpora and lexical resources.

  19. MOSS

    An open-source tool-augmented conversational language model from Fudan University

  20. ludwig

    Low-code framework for building custom LLMs, neural networks, and other AI models

    Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | news.ycombinator.com | 2024-04-07

    This is a great project, a little bit similar to https://github.com/ludwig-ai/ludwig, but it includes testing capabilities and ablation.

    Questions regarding the LLM testing aspect: How extensive is the test coverage for LLM use cases, and what is the current state of this project area? Do you offer any guarantees, or is it considered an open-ended problem?

    Would love to see more progress toward this area!

  21. LLMSurvey

    The official GitHub page for the survey paper "A Survey of Large Language Models".

    Project mention: Ask HN: Textbook Regarding LLMs | news.ycombinator.com | 2024-03-23

    Here’s another one - it’s older but has some interesting charts and graphs.

    https://arxiv.org/abs/2303.18223

  22. camel

    🐫 CAMEL: Finding the Scaling Law of Agents. The first and the best multi-agent framework. https://www.camel-ai.org

    Project mention: Common Use Cases for CAMEL-AI | dev.to | 2025-03-18

    These use cases show off CAMEL-AI’s knack for teamwork and flexibility. Whether you’re automating, researching, or assisting, it’s got something for you. Ready to try it? Hit up the GitHub repo or chat with us on Discord. What’s your first project gonna be? Let’s make it happen!

  23. doccano

    Open source annotation tool for machine learning practitioners.

  24. TextBlob

    Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

  25. attention-is-all-you-need-pytorch

    A PyTorch implementation of the Transformer model in "Attention is All You Need".

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).


Python Natural Language Processing related posts

  • QwQ-32B: Embracing the Power of Reinforcement Learning

    3 projects | news.ycombinator.com | 5 Mar 2025
  • CrewAI – open-source framework for LLM agents

    1 project | news.ycombinator.com | 20 Feb 2025
  • Mastering the Art of Conversational AI: Insights and Implementations with Python

    1 project | dev.to | 12 Feb 2025
  • SpaCy – Industrial-Strength Natural Language Processing in Python

    1 project | news.ycombinator.com | 9 Feb 2025
  • A Novel Approach for Text Encryption Using Tokenizers in Ruby

    1 project | dev.to | 6 Feb 2025
  • Show HN: Mandarin Word Segmenter with Translation

    2 projects | news.ycombinator.com | 4 Feb 2025
  • Building an AI-powered Financial Behavior Analyzer with NodeJS, Python, SvelteKit, and TailwindCSS - Part 1: The AI Service

    1 project | dev.to | 2 Feb 2025

Index

What are some of the best open-source Natural Language Processing projects in Python? This list will help you:

 #  Project                             Stars
 1  transformers                        141,593
 2  funNLP                               71,798
 3  bert                                 38,864
 4  HanLP                                34,609
 5  Jieba                                33,835
 6  spaCy                                31,139
 7  crewAI                               28,679
 8  d2l-en                               25,261
 9  NLP-progress                         22,817
10  datasets                             19,793
11  rasa                                 19,683
12  Ciphey                               18,839
13  Qwen                                 17,509
14  gensim                               15,915
15  flair                                14,108
16  NLTK                                 13,910
17  MOSS                                 12,033
18  ludwig                               11,380
19  LLMSurvey                            11,209
20  camel                                10,760
21  doccano                               9,849
22  TextBlob                              9,285
23  attention-is-all-you-need-pytorch     9,067
