Top 23 Natural Language Processing Open-Source Projects
-
Project mention: [D] The current and future state of AI/ML is shockingly demoralizing with little hope of redemption | reddit.com/r/MachineLearning | 2022-08-07
pip install git+https://github.com/huggingface/transformers
-
funNLP
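Chinese and English sensitive words, language detection, domestic and international mobile/phone number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English simulation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, segmentation of continuous English text, various Chinese word vectors, company name collection, classical poetry corpus, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, historical figures lexicon, poetry lexicon, medical lexicon, food and drink lexicon, legal lexicon, automotive lexicon, animal lexicon, Chinese chat corpus, Chinese rumor data, Baidu Chinese Q&A dataset, collection of sentence similarity matching algorithms, BERT resources, text generation and summarization tools, cocoNLP information extraction tool, regex matching for Chinese domestic phone numbers, Tsinghua University XLORE: Chinese-English cross-lingual encyclopedic knowledge graph, Tsinghua University AI technology report series, natural language generation, the "NLU is too hard" series, automatic couplet data and bot, username blacklist, criminal charge and legal terms with classification model, WeChat official account corpus, CS224n deep learning for NLP course, Chinese handwritten character recognition, Chinese NLP corpora/datasets, variable naming tool, word segmentation corpus + code, English task-oriented dialogue datasets, ASR speech datasets + deep-learning-based Chinese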
-
Project mention: Improved Content Understanding and Relevance with Large Language Models (SnooBERT ) | reddit.com/r/RedditEng | 2022-07-07
BERT stands for Bidirectional Encoder Representations from Transformers. It is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right contexts in all layers. It generates state-of-the-art numerical representations that are useful for common language understanding tasks. You can find more details in the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. BERT is used today for popular natural language tasks like question answering, text prediction, text generation, and summarization, and it powers applications like Google Search.
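To make "conditioning on both left and right contexts" concrete, here is a minimal sketch of how masked-language-model training inputs can be prepared, following the masking recipe described in the BERT paper (about 15% of tokens selected; of those, 80% replaced by `[MASK]`, 10% by a random token, 10% left unchanged). The function name `mask_tokens`, the toy vocabulary, and the per-token coin flip are illustrative assumptions, not code from the BERT repository.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels) for masked-language-model training.

    labels[i] holds the original token where a prediction is required,
    and None elsewhere. Simplification: each token is selected
    independently with probability mask_prob.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)                    # the model must recover this token
            roll = rng.random()
            if roll < 0.8:
                masked.append(MASK)               # 80%: replace with [MASK]
            elif roll < 0.9:
                masked.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                masked.append(tok)                # 10%: keep the token unchanged
        else:
            masked.append(tok)
            labels.append(None)                   # no prediction at this position
    return masked, labels

if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    toy_vocab = ["dog", "ran", "hat"]             # hypothetical vocabulary
    print(mask_tokens(sentence, toy_vocab, mask_prob=0.5))
```

Because a selected token can sit anywhere in the sentence, the model has to use the words on both sides of it to fill the gap, which is what distinguishes this objective from left-to-right language modeling.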
-
PS... +1 on Made With ML, the Hugging Face course is great, and I've heard a ton of good things about MLOps Zoomcamp
-
Project mention: Where can I download a database of Chinese word classifications (noun, verb, etc) | reddit.com/r/ChineseLanguage | 2022-03-28
-
HanLP
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification
Project mention: Hanlp - Natural language processing for the next decade | reddit.com/r/github_trends | 2022-05-28
Given your need, I think you'll be better off with libraries like spaCy, which do NLP (rather than just DNN inference). You'll get your app much faster this way.
-
NLP-progress
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
Project mention: [D] How difficult/easy is to learn NLP once you have experience in a CV? | reddit.com/r/MachineLearning | 2021-12-13
One thing is that NLP is a set of wildly different problems which share some aspects, but often use quite different techniques and assumptions about their datasets. So even if you have NLP experience, if you need to start on a substantially different NLP task, you can't just apply what you know and succeed; you have to review "how things are done" for that problem domain. For a quick overview, sites like https://nlpprogress.com/ can be helpful to see what methods are used and, perhaps even more importantly, how people are modeling the actual task.
-
applied-ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
The second repo I LOVE is Eugene Yan’s Applied ML repository. This is a brilliant idea to create and actually something I was planning on sort of casually doing in my non-existent free time… Anyhow, it is a curated list of technical posts from top engineering teams (Netflix, Amazon, Pinterest, LinkedIn, etc.) detailing how they built out different types of AI/ML systems (e.g. forecasting, recommenders, search and ranking, etc.). Ofc, it focuses on AI/ML, but something similar could be made for the traditional or BI-oriented analytics stack, as well as the streaming world, super high value for practitioners! Btw, one of my favorite things at BCG used to be looking at our IT architecture team’s reference architecture diagrams… the best way to understand technologies is to look at how a ton of stuff is architected… and it's fun!
-
rasa
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Check out Rasa
-
d2l-en
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 300 universities from 55 countries including Stanford, MIT, Harvard, and Cambridge.
Project mention: How to pre-train BERT on different objective tasks using HuggingFace | reddit.com/r/deeplearning | 2022-04-10
There may be a library on Hugging Face for pre-training BERT, but I suggest training a BERT model in native PyTorch to understand the details. Limu's course is recommended.
-
datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Project mention: FauxPilot – an open-source GitHub Copilot server | news.ycombinator.com | 2022-08-02
And then pass that my_code.json as the dataset name.
-
-
Project mention: sentence transformer vector dimensionality reduction to 1 | reddit.com/r/LanguageTechnology | 2022-08-01
-
Awesome-pytorch-list
A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
Project mention: Similar open source long library list to TF like Pytorch "ECOSYSTEM TOOLS" | reddit.com/r/tensorflow | 2021-11-19
I got the following as a recommendation from elsewhere: https://github.com/jtoy/awesome-tensorflow, and there is one for pt as well: https://github.com/bharathgs/Awesome-pytorch-list. Thx for the help :D
-
Project mention: Flair: A simple framework for state-of-the-art Natural Language Processing | news.ycombinator.com | 2022-04-11
-
-
-
Project mention: Best models for sentence similarity with good benefit-cost ratio? | reddit.com/r/MLQuestions | 2022-08-08
you could try Jina.ai's CLIP-as-a-Service: https://github.com/jina-ai/clip-as-service
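Whichever embedding model or service is used, sentence similarity is commonly scored as the cosine of the angle between two embedding vectors. The sketch below uses only the standard library; the function name and the example vectors are made up for illustration and are not outputs of clip-as-service or any other model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

if __name__ == "__main__":
    # Hypothetical 3-dimensional "embeddings"; real models emit hundreds of dimensions.
    emb_a = [0.2, 0.7, 0.1]
    emb_b = [0.25, 0.6, 0.05]
    print(cosine_similarity(emb_a, emb_b))
```

Scores near 1 indicate vectors pointing in nearly the same direction; scores near 0 indicate unrelated directions.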
-
Ciphey
⚡ Automatically decrypt encryptions without knowing the key or cipher, decode encodings, and crack hashes ⚡
I followed the steps here. I am running Python 3.10 (64). When I try to install Ciphey using the instructions, on my cmd prompt I get the following:
-
deep-learning-drizzle
Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!
Project mention: Consolidated Video lectures for Machine Learning(including DL, CV, NLP, etc) | reddit.com/r/developersIndia | 2022-01-22
Also this as well for whoever needs it
-
Project mention: Site that tracks player mentions in /r/FantasyPL (with sentiment) | reddit.com/r/FantasyPL | 2022-07-24
I used this one: https://www.npmjs.com/package/natural
-
Project mention: How to use CoreNLP with a large corpus(14.7 GB)? | reddit.com/r/LanguageTechnology | 2022-08-06
If you need further assistance, you will be better off making an issue on their github: https://github.com/stanfordnlp/CoreNLP
Index
What are some of the best open-source Natural Language Processing projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | transformers | 68,093 |
2 | funNLP | 42,262 |
3 | bert | 31,863 |
4 | Made-With-ML | 30,534 |
5 | Jieba | 29,046 |
6 | HanLP | 26,580 |
7 | spaCy | 23,929 |
8 | NLP-progress | 20,738 |
9 | applied-ml | 20,615 |
10 | rasa | 14,661 |
11 | d2l-en | 14,476 |
12 | datasets | 13,902 |
13 | awesome-nlp | 13,525 |
14 | gensim | 13,430 |
15 | Awesome-pytorch-list | 13,282 |
16 | flair | 11,922 |
17 | allennlp | 11,142 |
18 | NLTK | 10,955 |
19 | clip-as-service | 10,557 |
20 | Ciphey | 10,464 |
21 | deep-learning-drizzle | 10,279 |
22 | natural | 9,868 |
23 | CoreNLP | 8,588 |