Top 23 Python Natural Language Processing Projects
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch, TensorFlow, and JAX.
Project mention: Export and run other machine learning models | dev.to | 2021-10-14
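Chinese and English sensitive words, language detection, Chinese and international mobile/landline number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese personal name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, traditional/simplified Chinese conversion, English imitating Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, splitting of run-together English text, various Chinese word vectors, comprehensive list of company names, classical poetry database, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, historical figures lexicon, poetry lexicon, medical lexicon, food and drink lexicon, legal lexicon, automobile lexicon, animal lexicon, Chinese chat corpus, Chinese rumor data, Baidu Chinese question-answering dataset, collection of sentence similarity matching algorithms, BERT resources, text generation and summarization tools, cocoNLP information extraction tool, regular expressions for Chinese domestic phone numbers, Tsinghua University XLORE: Chinese-English cross-lingual encyclopedic knowledge graph, Tsinghua University AI technology report series, natural language generation, the "NLU is too hard" series, automatic couplet data and bot, username blacklist, criminal charge and legal terms with classification model, WeChat official account corpus, cs224n deep learning NLP course, Chinese handwritten character recognition, Chinese NLP corpora/datasets, variable naming tool, word segmentation corpus + code, task-oriented dialogue English datasets, ASR speech datasets + deep-learning-based Chinese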
TensorFlow code and pre-trained models for BERT
Project mention: Is huggingface pre-trained models on their site can be used for commercial use? | reddit.com/r/LanguageTechnology | 2021-09-20
But I can see that the LICENSE over on Google's bert repository comes with an Apache License, meaning we can generally use it - and amusingly if you scroll down to the 'date' and 'author', it says:
Jieba Chinese word segmentation
Project mention: Learn vocabulary effortlessly while browsing the web [FR,EN,DE,PT,ES] | reddit.com/r/languagelearning | 2021-03-23
Since you're saying the main issue is segmentation, there are libraries to help out with that issue. jieba is fantastic if you have a Python backend, nodejieba (50k downloads/week) if it's more JS-side.
💫 Industrial-strength Natural Language Processing (NLP) in Python
Project mention: I put together a tutorial and overview on how to use DeepSpeech to do Speech Recognition in Python | reddit.com/r/Python | 2021-10-14
It definitely could - with the real-time speech recognition example shown in the tutorial. But you'd likely need some sort of NLU running after the transcription is performed - to basically parse what was spoken into a command that you can use to run some business logic. There are some good open source libs for this too like https://spacy.io/
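As a rough illustration of the "parse what was spoken into a command" step, here is a minimal rule-based sketch using only the standard library. The intent names, the regular expressions, and the `parse_command` helper are all hypothetical, invented for this example; a real NLU library would typically use trained models rather than hand-written patterns.

```python
import re

# Hypothetical intent patterns (assumption: commands look like "turn on the lights").
INTENT_PATTERNS = {
    "turn_on": re.compile(r"\b(?:turn|switch) on (?:the )?(?P<device>\w+)"),
    "turn_off": re.compile(r"\b(?:turn|switch) off (?:the )?(?P<device>\w+)"),
}


def parse_command(transcript):
    """Map a transcribed utterance to an (intent, slots) pair, or None if nothing matches."""
    text = transcript.lower().strip()
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            return intent, match.groupdict()
    return None


if __name__ == "__main__":
    print(parse_command("Please turn on the lights"))
    print(parse_command("What time is it"))
```

The returned intent and slots are what the business logic would dispatch on; an utterance that matches no pattern yields `None` so the caller can ask the user to rephrase.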
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
Project mention: [P] NLP "tl;dr" Notes on Transformers | reddit.com/r/MachineLearning | 2021-08-12
It would also be cool to have some charts with parameter density and even overall effectiveness (a tl;dr version of SOTA-trackers, maybe?) if that doesn't prove too infeasible.
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Project mention: Building a chatbot - How should I approach this? | reddit.com/r/learnpython | 2021-08-12
Like u/Hungry_Check_9153 says about your image of how chatbots work, I recommend looking at Rasa, which is an open source Python chatbot framework. To give yourself an idea of the sheer scope of such a project, take a look at their GitHub. Building a chatbot using Rasa may be a good first step and offers plenty of experience writing and learning Python code.
Topic Modelling for Humans
Project mention: The unthinking application of this regex-efficiency check wasted our attention | news.ycombinator.com | 2021-09-30
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 200 universities.
Project mention: I created a way to learn machine learning through Jupyter | reddit.com/r/learnmachinelearning | 2021-04-30
There are actually some online books and courses built on Jupyter Notebook ([Dive into Deep Learning](https://github.com/d2l-ai/d2l-en), for example). However, yours is more detailed and could really help beginners.
A very simple framework for state-of-the-art Natural Language Processing (NLP)
Project mention: Preparing data for training NER models | reddit.com/r/LanguageTechnology | 2021-10-11
Training most Named Entity Recognition (NER) models, for example Flair, usually requires formatting the data in the BIO (Beginning, Inside, Outside) tagging scheme, where each sentence is separated by a blank line.
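The layout described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not Flair's own tooling: the helper names `spans_to_bio` and `to_bio_lines`, the sample sentence, the `PER`/`LOC` labels, and the single-space column separator are all invented for the example, and the exact column layout a given loader expects may differ.

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end, label) token spans (end exclusive) into (token, BIO tag) pairs."""
    tags = ["O"] * len(tokens)  # every token starts as Outside
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the entity
    return list(zip(tokens, tags))


def to_bio_lines(sentences):
    """Render tagged sentences as 'token tag' lines with a blank line between sentences."""
    lines = []
    for sentence in sentences:
        for token, tag in sentence:
            lines.append(f"{token} {tag}")
        lines.append("")  # blank line marks the sentence boundary
    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    first = spans_to_bio(
        ["George", "Washington", "went", "to", "Washington"],
        [(0, 2, "PER"), (4, 5, "LOC")],
    )
    second = spans_to_bio(["Sam", "stayed"], [(0, 1, "PER")])
    print(to_bio_lines([first, second]), end="")
```

Multi-token entities get a `B-` tag on their first token and `I-` tags on the rest, which is what lets a model tell two adjacent entities of the same type apart.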
An open-source NLP research library, built on PyTorch.
Project mention: Any allennlp users in this sub? | reddit.com/r/LanguageTechnology | 2021-10-08
https://github.com/allenai/allennlp/discussions looks active
NLTK Source
Project mention: Top 10 Python Libraries for Machine Learning | dev.to | 2021-09-09
Website: https://www.nltk.org/
GitHub Repository: https://github.com/nltk/nltk
Developed By: Team NLTK
Primary Purpose: Natural Language Processing
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Project mention: Datasets: A Community Library for Natural Language Processing | news.ycombinator.com | 2021-09-08
Mapping a variable-length sentence to a fixed-length vector using BERT model
Project mention: Needed 100% to pass a safety quiz, need to wait a week to retake | reddit.com/r/mildlyinfuriating | 2021-01-12
You joke but
⚡ Automatically decrypt encryptions without knowing the key or cipher, decode encodings, and crack hashes ⚡
Project mention: Awesome Penetration Testing | dev.to | 2021-10-06
Ciphey - Automated decryption tool using artificial intelligence and natural language processing.
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
Project mention: Discussion Thread | reddit.com/r/neoliberal | 2021-08-27
if you're curious about the nitty gritty, the parsing module's documentation is well written and doesn't require a comp sci or linguistics degree to get the gist of what's happening.
Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
Project mention: Any way for Python to interpret words? | reddit.com/r/learnpython | 2021-07-27
Check out TextBlob — Spelling correction, splitting text into words and sentences, sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more
A collection of machine learning examples and tutorials.
Project mention: How to save an attention model for deployment/exposing to an API? | reddit.com/r/deeplearning | 2021-08-17
I've been following a course teaching how to make an attention model for neural machine translation. This is the file inside the repo. I know that I'll have to use certain functions to turn the textual input into encodings and tokens, but those functions use certain instances of the model, which I don't know if I should keep or not. If anyone can please take a look and help me out here, it'd be really appreciated.
A natural language modeling framework based on PyTorch
A PyTorch implementation of the Transformer model in "Attention is All You Need".
Project mention: Lack of activation in transformer feedforward layer? | reddit.com/r/learnmachinelearning | 2021-05-20
I'm curious as to why the second matrix multiplication is not followed by an activation unlike the first one. Is there any particular reason why a non-linearity would be trivial or even avoided in the second operation? For reference, variations of this can be witnessed in a number of different implementations, including BERT-pytorch and attention-is-all-you-need-pytorch.
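To make the structure in question concrete, here is a minimal pure-Python sketch of a position-wise feed-forward layer of the form FFN(x) = max(0, x·W1 + b1)·W2 + b2. It does not reproduce either repository's code: the helper names and the tiny hand-picked weights are assumptions chosen for illustration, and a real implementation would use tensor operations rather than lists.

```python
def relu(vector):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in vector]


def linear(vector, weights, bias):
    """Compute vector @ weights + bias, where weights has shape (len(vector), len(bias))."""
    return [
        sum(x * row[j] for x, row in zip(vector, weights)) + bias[j]
        for j in range(len(bias))
    ]


def feed_forward(x, w1, b1, w2, b2):
    """Two projections with a non-linearity only after the first one."""
    hidden = relu(linear(x, w1, b1))  # first matrix multiplication, then ReLU
    return linear(hidden, w2, b2)     # second matrix multiplication, left linear


if __name__ == "__main__":
    # Illustrative weights only (assumption): model width 2, hidden width 3.
    w1 = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]
    b1 = [0.0, 0.0, 0.0]
    w2 = [[2.0, -1.0], [1.0, 1.0], [0.0, 3.0]]
    b2 = [0.5, -0.5]
    print(feed_forward([1.0, -2.0], w1, b1, w2, b2))
```

With these weights the second output component is negative, a value that a ReLU applied after the second projection would have clipped to zero.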
Official Stanford NLP Python Library for Many Human Languages
Natural Language Processing Best Practices & Examples
Project mention: Building a Aspect based sentiment classification | reddit.com/r/LanguageTechnology | 2021-09-06
There is an NLP recipe from Microsoft on ABSA. Have you seen this? https://github.com/microsoft/nlp-recipes/blob/master/examples/sentiment_analysis/absa/absa.ipynb
The pkuseg toolkit for multi-domain Chinese word segmentation
What are some of the best open-source Natural Language Processing projects in Python? This list will help you: