Python Natural Language Processing

Open-source Python projects categorized as Natural Language Processing

Top 23 Python Natural Language Processing Projects

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

    Project mention: HuggingFace Transformers: Qwen2 | news.ycombinator.com | 2024-01-11
  • funNLP

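    Chinese and English sensitive-word lists, language detection, Chinese and international mobile/landline number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese personal-name databases, Chinese abbreviation database, character-decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, segmentation of continuous English text, various Chinese word vectors, company name list, classical poetry database, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, historical figures lexicon, poetry lexicon, medical lexicon, food and drink lexicon, legal lexicon, automotive lexicon, animal lexicon, Chinese chat corpus, Chinese rumor data, Baidu Chinese question-answering dataset, collection of sentence similarity matching algorithms, BERT resources, text generation and summarization tools, cocoNLP information extraction tool, regular expressions for Chinese domestic phone numbers, Tsinghua University XLORE: Chinese-English cross-lingual encyclopedic knowledge graph, Tsinghua University AI technology report series, natural language generation, the "NLU is too hard" series, automatic couplet data and bot, username blacklist, criminal charge and legal terms with classification models, WeChat official account corpus, cs224n deep learning for NLP course, Chinese handwritten character recognition, Chinese NLP corpora/datasets, variable naming tool, word segmentation corpora + code, English task-oriented dialogue datasets, ASR speech datasets + deep-learning-based Chinese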

  • bert

    TensorFlow code and pre-trained models for BERT

    Project mention: OpenAI – Application for US trademark "GPT" has failed | news.ycombinator.com | 2024-02-15

    task-specific parameters, and is trained on the downstream tasks by simply fine-tuning all pre-trained parameters.

    [0] https://arxiv.org/abs/1810.04805

  • Jieba

    Jieba Chinese word segmentation

    Project mention: [OC] How Many Chinese Characters You Need to Learn to Read Chinese! | /r/dataisbeautiful | 2023-06-14

    jieba to do Chinese word segmentation

  • HanLP

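    Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and Simplified/Traditional conversion, natural language processing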

  • spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: Best AI SEO Tools for NLP Content Optimization | /r/aitoolsnews | 2023-12-09

    SpaCy: An open-source library providing tools for advanced NLP tasks like tokenization, entity recognition, and part-of-speech tagging.

  • NLP-progress

    Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

  • d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

    Project mention: which book to chose for deep learning :lan Goodfellow or francois chollet | /r/learnmachinelearning | 2023-04-07
  • datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

    Project mention: 🐍🐍 23 issues to grow yourself as an exceptional open-source Python expert 🧑‍💻 🥇 | dev.to | 2023-10-19
  • rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    Project mention: 🔥🚀 Top 10 Open-Source Must-Have Tools for Crafting Your Own Chatbot 🤖💬 | dev.to | 2023-11-06

    Support Rasa on GitHub ⭐

  • Ciphey

    ⚡ Automatically decrypt encryptions without knowing the key or cipher, decode encodings, and crack hashes ⚡

    Project mention: CyberChef from GCHQ: The Cyber Swiss Army Knife | news.ycombinator.com | 2024-02-01

    I also discovered Ciphey. Neat little tool indeed, but it's being deprecated. It's mentioned in this issue[1] and being replaced with Ares[2]. Neither could decipher this strange encryption[3] I used it on :(

    [1] https://github.com/Ciphey/Ciphey/issues/764

    [2] https://github.com/bee-san/Ares

    [3] "dEFLWWFKQWxRQW16RnkvbTZML0lsdz09" original text is "hacker"
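The string quoted in [3] uses only Base64 characters, so a cheap first check is to peel Base64 layers with Python's standard library. The sketch below is an illustration only and is not how Ciphey or Ares work internally; `peel_base64` and `max_layers` are hypothetical names introduced here.

```python
import base64

# String quoted in footnote [3] of the comment above.
blob = "dEFLWWFKQWxRQW16RnkvbTZML0lsdz09"


def peel_base64(data: bytes, max_layers: int = 5):
    """Strip Base64 layers until the payload is no longer valid Base64.

    Returns the remaining payload and the number of layers removed.
    """
    layers = 0
    while layers < max_layers:
        try:
            # validate=True rejects any byte outside the Base64 alphabet.
            decoded = base64.b64decode(data, validate=True)
        except ValueError:  # binascii.Error is a subclass of ValueError
            break
        data = decoded
        layers += 1
    return data, layers


payload, layers = peel_base64(blob.encode("ascii"))
print(layers, len(payload), payload.hex())
```

With this input the loop removes two Base64 layers and stops at 16 bytes that are not valid text. That is consistent with, though it does not prove, the output of a keyed cipher, which decoding alone cannot reverse.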

  • gensim

    Topic Modelling for Humans

    Project mention: Aggregating news from different sources | /r/learnprogramming | 2023-07-08
  • DocsGPT

    GPT-powered chat for documentation, chat with your documents

    Project mention: You can earn free shirt by contributing to DocsGPT | /r/hacktoberfest | 2023-10-03
  • flair

    A very simple framework for state-of-the-art Natural Language Processing (NLP)

  • NLTK

    NLTK Source

    Project mention: Building a local AI smart Home Assistant | news.ycombinator.com | 2024-01-13

    alternatively, could we not simply split by common characters such as newlines and periods, to split it within sentences? it would be fragile with special handling required for numbers with decimal points and probably various other edge cases, though.

    there are also Python libraries meant for natural language parsing[0] that could do that task for us. I even see examples on stack overflow[1] that simply split text into sentences.

    [0]: https://www.nltk.org/
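The fragility described in the comment above can be seen in a few lines. This is a minimal standard-library sketch, not NLTK's tokenizer; `naive_split` and `split_sentences` are hypothetical helper names, and the sample text is made up for illustration.

```python
import re


def naive_split(text):
    # Split on every period or newline, as the comment first suggests.
    return [s.strip() for s in re.split(r"[.\n]", text) if s.strip()]


def split_sentences(text):
    # Split only on whitespace that follows ., ! or ?, or on newlines.
    # A period inside "3.14" is not followed by whitespace, so it survives.
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [s.strip() for s in parts if s.strip()]


sample = "Pi is about 3.14. It is irrational.\nThe end."

print(naive_split(sample))
# ['Pi is about 3', '14', 'It is irrational', 'The end']
print(split_sentences(sample))
# ['Pi is about 3.14.', 'It is irrational.', 'The end.']
```

The second version still breaks on abbreviations such as "Dr. Smith", which is one of the edge cases a trained sentence tokenizer is meant to handle.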

  • MOSS

    An open-source tool-augmented conversational language model from Fudan University

    Project mention: Has anyone tried fine tuning on a dataset of complex tasks that require tool use? | /r/LocalLLaMA | 2023-05-05
  • ludwig

    Low-code framework for building custom LLMs, neural networks, and other AI models

  • Qwen

    The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

    Project mention: What the heck is so great about this model? | /r/SillyTavernAI | 2023-12-07

    Qwen: https://github.com/QwenLM/Qwen

  • TextBlob

    Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

    Project mention: Using EvaDB to build AI-enhanced apps | dev.to | 2024-01-10

    TextBlob is a Python toolkit for text processing. It offers some common NLP functionalities such as part-of-speech tagging and noun phrase extraction. We’ll use TextBlob in our project to perform some quick sentiment analysis on tweets.

  • doccano

    Open source annotation tool for machine learning practitioners.

    Project mention: You Can't Have a Free Software AI Stack | news.ycombinator.com | 2023-07-13

    Huh?

    I wrote my own system for classifying a stream of texts in Python, I might Open Source it one of these days but I have to get it to the point where it is modular enough that I can customize it to do the particular things I want without subjecting people to my whims... I use it every day and I'm not afraid to demo it because it is rock solid.

    My understanding is that my system would not be hard to adapt to work on images for certain kinds of tasks.

    Pytorch is open source, Huggingface is open source. CUDA isn't. This is

    https://labelstud.io/

    and for annotating text spans there are so many open source tools

    https://github.com/doccano/doccano

    I worked for a company a few years back that built annotation tools for projects we sold to customers but never quite got to a polished general purpose annotator. Today there are an overwhelming number of companies in this space and products I never heard of, many of which are cloud based or paid. Looks like a gold rush to me.

  • Pattern

    Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

  • attention-is-all-you-need-pytorch

    A PyTorch implementation of the Transformer model in "Attention is All You Need".

    Project mention: ElevenLabs Launches Voice Translation Tool to Break Down Language Barriers | news.ycombinator.com | 2023-10-10

    The transformer model was invented to attend to context over the entire sequence length. Look at how the original authors used the Transformer for NMT in the original Vaswani et al publication. https://github.com/jadore801120/attention-is-all-you-need-py...

  • machine_learning_examples

    A collection of machine learning examples and tutorials.

    Project mention: Doubt about numpy's eigen calculation | /r/learnmachinelearning | 2023-05-25

    Does that mean that the example I found on the internet is wrong (I think it comes from a DL Course, so I'd imagine it is not wrong)? or does it mean that I am comparing two different things? I guess this has to deal with right and left eigen vectors as u/JanneJM pointed out in her comment?

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-02-15.

Index

What are some of the best open-source Natural Language Processing projects in Python? This list will help you:

Rank Project Stars
1 transformers 120,631
2 funNLP 61,740
3 bert 36,454
4 Jieba 31,855
5 HanLP 31,583
6 spaCy 28,280
7 NLP-progress 22,167
8 d2l-en 20,882
9 datasets 18,064
10 rasa 17,624
11 Ciphey 15,352
12 gensim 15,010
13 DocsGPT 13,867
14 flair 13,423
15 NLTK 12,803
16 MOSS 11,736
17 ludwig 10,551
18 Qwen 9,195
19 TextBlob 8,847
20 doccano 8,744
21 Pattern 8,629
22 attention-is-all-you-need-pytorch 8,241
23 machine_learning_examples 7,964