Python Natural Language Processing

Open-source Python projects categorized as Natural Language Processing

Top 23 Python Natural Language Processing Projects

  • GitHub repo transformers

    🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

    Project mention: [D] How do pretrained tokenizers work? | reddit.com/r/MachineLearning | 2021-11-26

    I have been using the pretrained tokenizers available from the huggingface/transformers library. And they have been working well for my use case.

  • GitHub repo funNLP

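    Chinese and English sensitive words, language detection, domestic and international mobile/landline number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment scores, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English approximation of Chinese pronunciation, Wang Feng lyrics generator, job title lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, segmentation of unspaced English text, various Chinese word vectors, company name list, classical poetry corpus, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, historical figures lexicon, poetry lexicon, medical lexicon, food and drink lexicon, legal lexicon, automotive lexicon, animal lexicon, Chinese chat corpus, Chinese rumor data, Baidu Chinese Q&A dataset, collection of sentence similarity matching algorithms, BERT resources, text generation and summarization tools, cocoNLP information extraction tool, regex matching for Chinese domestic phone numbers, Tsinghua University XLORE: Chinese-English cross-lingual encyclopedic knowledge graph, Tsinghua University AI technology report series, natural language generation, the "NLU is too hard" series, automatic couplet data and bot, username blacklist, criminal charge and legal terms with classification model, WeChat official account corpus, cs224n deep learning NLP course, Chinese handwritten character recognition, Chinese NLP corpora/datasets, variable naming tool, word segmentation corpus + code, English task-oriented dialogue dataset, ASR speech dataset + deep-learning-based Chinese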

  • GitHub repo bert

    TensorFlow code and pre-trained models for BERT

    Project mention: Is huggingface pre-trained models on their site can be used for commercial use? | reddit.com/r/LanguageTechnology | 2021-09-20

    But I can see that the LICENSE over on Google's bert repository comes with an Apache License, meaning we can generally use it - and amusingly if you scroll down to the 'date' and 'author', it says:

  • GitHub repo Jieba

    Jieba Chinese word segmentation

    Project mention: Learn vocabulary effortlessly while browsing the web [FR,EN,DE,PT,ES] | reddit.com/r/languagelearning | 2021-03-23

    Since you're saying the main issue is segmentation, there are libraries to help out with that issue. jieba is fantastic if you have a Python backend, nodejieba (50k downloads/week) if it's more JS-side.
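
To make the segmentation problem concrete, here is a minimal sketch of dictionary-based forward maximum matching. This is a deliberately simple stand-in technique, not jieba's actual algorithm, and the `segment` function, `TOY_VOCAB`, and `max_word_len` are hypothetical names introduced for illustration only.

```python
def segment(text, vocabulary, max_word_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest substring found in the vocabulary;
    fall back to a single character when nothing matches.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy vocabulary (assumption); real dictionaries are far larger.
TOY_VOCAB = {"我", "喜欢", "自然", "语言", "自然语言", "处理"}

print(segment("我喜欢自然语言处理", TOY_VOCAB))
```

Greedy matching like this can pick the wrong split when words overlap, which is one reason a dedicated segmentation library is usually the better choice in practice.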

  • GitHub repo spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: Two Methods to Scan for PII in Data Warehouses | dev.to | 2021-11-29

    NLP libraries such as Stanford NER Detector and spaCy

  • GitHub repo NLP-progress

    Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

    Project mention: Upcoming App Announcement: Lemmatize, a Foreign Language Reader | reddit.com/r/languagelearning | 2021-11-11

    A standard step in Chinese text processing is word segmentation, which deals with this problem.

  • GitHub repo rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    Project mention: How to Create the Perfect README for Your Open Source Project | dev.to | 2021-11-02

    This example is sourced from RasaHQ

  • GitHub repo gensim

    Topic Modelling for Humans

    Project mention: Gensim – a Python library for topic modelling, document indexing | news.ycombinator.com | 2021-11-25
  • GitHub repo d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 300 universities from 55 countries including Stanford, MIT, Harvard, and Cambridge.

    Project mention: I created a way to learn machine learning through Jupyter | reddit.com/r/learnmachinelearning | 2021-04-30

    There are actually some online books and courses built on Jupyter Notebook ([Dive into Deep Learning Book](https://github.com/d2l-ai/d2l-en) for example). However, yours is more detailed and could really help beginners.

  • GitHub repo datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

    Project mention: Hugging Face Introduces ‘Datasets’: A Lightweight Community Library For Natural Language Processing (NLP) | reddit.com/r/artificial | 2021-11-08

    Code for https://arxiv.org/abs/2109.02846 found: https://github.com/huggingface/datasets

  • GitHub repo flair

    A very simple framework for state-of-the-art Natural Language Processing (NLP)

    Project mention: How to create a dataset for training NER models when you only have entity data | reddit.com/r/LanguageTechnology | 2021-10-18

    We have a list of entities in text files, one per line. We intend to train the flair model to detect these entities in text, but NER models require the entities to be labeled within a paragraph in BIO format.
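
One way to bootstrap such training data is to match the known entity strings against tokenized text and emit BIO (begin/inside/outside) tags. The sketch below is a plain-Python illustration under simple assumptions: whitespace tokenization, exact matching, and a single label. The `bio_tag` helper, the `ENT` label, and the sample sentence are all hypothetical and not part of flair.

```python
def bio_tag(tokens, entities, label="ENT"):
    """Tag tokens with B-/I-/O labels by exact-matching known entity phrases."""
    # Split each entity into tokens; try longer phrases first so they win.
    phrases = sorted((e.split() for e in entities), key=len, reverse=True)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for phrase in phrases:
            n = len(phrase)
            if n and tokens[i:i + n] == phrase:
                tags[i] = f"B-{label}"          # first token of the entity
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"      # remaining tokens
                i += n
                break
        else:
            i += 1                              # no entity starts here
    return tags


# Hypothetical sample data (assumption), standing in for the entity files.
tokens = "Acme Corp hired Jane Doe in Berlin".split()
entities = ["Acme Corp", "Jane Doe", "Berlin"]

for token, tag in zip(tokens, bio_tag(tokens, entities)):
    print(token, tag)
```

Exact matching misses inflected or misspelled mentions and cannot disambiguate by context, so output produced this way is best treated as noisy labels to be reviewed before training.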

  • GitHub repo allennlp

    An open-source NLP research library, built on PyTorch.

    Project mention: Cedille, the largest French language model, open source with a freely accessible playground | reddit.com/r/GPT3 | 2021-11-12
  • GitHub repo NLTK

    NLTK Source

    Project mention: Count words within strings | reddit.com/r/tableau | 2021-10-27
  • GitHub repo bert-as-service

    Mapping a variable-length sentence to a fixed-length vector using BERT model

    Project mention: Needed 100% to pass a safety quiz, need to wait a week to retake | reddit.com/r/mildlyinfuriating | 2021-01-12

    You joke but

  • GitHub repo Ciphey

    ⚡ Automatically decrypt encryptions without knowing the key or cipher, decode encodings, and crack hashes ⚡

    Project mention: Tips for Making a Popular Open-Source Project in 2021 [Ultimate Guide] | news.ycombinator.com | 2021-11-12
  • GitHub repo Pattern

    Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

    Project mention: Discussion Thread | reddit.com/r/neoliberal | 2021-08-27

    if you're curious about the nitty gritty, the parsing module's documentation is well written and doesn't require a comp sci or linguistics degree to get the gist of what's happening.

  • GitHub repo ludwig

    Data-centric declarative deep learning framework

    Project mention: Most Frequent 600 Coding Questions on LeetCode | reddit.com/r/cscareerquestions | 2021-10-26

    They list themselves all over the internet as an "open source contributor" to Uber, which as far as I can tell is based entirely on... reporting that there was an issue with a favicon. To me, it seems like they'll be cheating anybody who employs them based on this, ahem, "experience". And that feels like the tip of the iceberg.

  • GitHub repo TextBlob

    Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

    Project mention: Any way for Python to interpret words? | reddit.com/r/learnpython | 2021-07-27

    Check out TextBlob — Spelling correction, splitting text into words and sentences, sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more

  • GitHub repo machine_learning_examples

    A collection of machine learning examples and tutorials.

    Project mention: How to save an attention model for deployment/exposing to an API? | reddit.com/r/deeplearning | 2021-08-17

    I've been following a course teaching how to make an attention model for neural machine translation. This is the file inside the repo. I know that I'll have to use certain functions to process the textual input into encodings and tokens, but those functions use certain instances of the model, which I don't know if I should keep or not. If anyone can please take a look and help me out here, it'd be really really appreciated.

  • GitHub repo pytext

    A natural language modeling framework based on PyTorch

  • GitHub repo attention-is-all-you-need-pytorch

    A PyTorch implementation of the Transformer model in "Attention is All You Need".

    Project mention: Lack of activation in transformer feedforward layer? | reddit.com/r/learnmachinelearning | 2021-05-20

    I'm curious as to why the second matrix multiplication is not followed by an activation unlike the first one. Is there any particular reason why a non-linearity would be trivial or even avoided in the second operation? For reference, variations of this can be witnessed in a number of different implementations, including BERT-pytorch and attention-is-all-you-need-pytorch.
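
The layer structure the question describes can be sketched in plain Python. In "Attention is All You Need" the position-wise feed-forward network is FFN(x) = max(0, xW1 + b1)W2 + b2, so ReLU follows only the first projection. The helper names and toy weights below are hypothetical, chosen for illustration rather than taken from either repository.

```python
def linear(x, weights, bias):
    """Compute x @ weights + bias for a single input vector x."""
    return [
        sum(xi * w for xi, w in zip(x, column)) + b
        for column, b in zip(zip(*weights), bias)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def feed_forward(x, w1, b1, w2, b2):
    hidden = relu(linear(x, w1, b1))  # first projection, followed by ReLU
    return linear(hidden, w2, b2)     # second projection, no activation


# Hypothetical toy weights (assumption): d_model = 2, d_ff = 3.
W1 = [[1.0, -1.0, 0.5],
      [0.0, 2.0, -1.0]]
B1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 0.0],
      [0.0, 1.0],
      [-1.0, -1.0]]
B2 = [0.0, 0.0]

print(feed_forward([1.0, -1.0], W1, B1, W2, B2))  # → [-0.5, -1.5]
```

Because nothing is applied after the second projection, the output can take negative values, as the example shows; a trailing ReLU would clamp them to zero.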

  • GitHub repo Stanza

    Official Stanford NLP Python Library for Many Human Languages

    Project mention: Spacy vs NLTK for Spanish Language Statistical Tasks | reddit.com/r/LanguageTechnology | 2021-11-12
  • GitHub repo nlp-recipes

    Natural Language Processing Best Practices & Examples

    Project mention: Is there any utility software/bot that produces descriptor tags for a Reddit image post using the comments? | reddit.com/r/redditdev | 2021-11-07

    I found this (https://github.com/microsoft/nlp-recipes) resource and it has a list of pre-built or easily customizable NLP models that I'm going to try out.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-11-29.

Index

What are some of the best open-source Natural Language Processing projects in Python? This list will help you:

Rank Project Stars
1 transformers 54,896
2 funNLP 35,132
3 bert 29,796
4 Jieba 27,420
5 spaCy 21,827
6 NLP-progress 19,419
7 rasa 13,108
8 gensim 12,694
9 d2l-en 11,569
10 datasets 11,380
11 flair 10,994
12 allennlp 10,639
13 NLTK 10,269
14 bert-as-service 9,747
15 Ciphey 9,034
16 Pattern 8,096
17 ludwig 7,996
18 TextBlob 7,960
19 machine_learning_examples 6,439
20 pytext 6,267
21 attention-is-all-you-need-pytorch 5,980
22 Stanza 5,846
23 nlp-recipes 5,761