Top 23 Python NLP Projects
-
Thanks! :) I'm pushing them into transformers, pytorch-gemma and collabing with the Gemma team to resolve all the issues :)
The RoPE fix should already be in transformers 4.38.2: https://github.com/huggingface/transformers/pull/29285
My main PR for transformers which fixes most of the issues (some still left): https://github.com/huggingface/transformers/pull/29402
-
Project mention: OpenAI – Application for US trademark "GPT" has failed | news.ycombinator.com | 2024-02-15
task-specific parameters, and is trained on the downstream tasks by simply fine-tuning all pre-trained parameters.
-
HanLP
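Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing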
-
spaCy: An open-source library providing tools for advanced NLP tasks like tokenization, entity recognition, and part-of-speech tagging.
-
datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Project mention: 🐍🐍 23 issues to grow yourself as an exceptional open-source Python expert 🧑‍💻 🥇 | dev.to | 2023-10-19
-
Project mention: The Era of 1-bit LLMs: ternary parameters for cost-effective computing | news.ycombinator.com | 2024-02-28
+1 On this, the real proof would have been testing both models side-by-side.
It seems that it may be published on GitHub [1] according to HuggingFace [2].
-
rasa
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Project mention: 🔥🚀 Top 10 Open-Source Must-Have Tools for Crafting Your Own Chatbot 🤖💬 | dev.to | 2023-11-06
Support Rasa on GitHub ⭐
-
I'd like to share with you today the Chinese-Alpaca-Plus-13B-GPTQ model, which is a 4-bit GPTQ-quantised version of Yiming Cui's Chinese-LLaMA-Alpaca 13B for GPU inference.
-
haystack
🔍 LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
4. Haystack by Deepset | Github | tutorial
-
alternatively, could we not simply split by common characters such as newlines and periods, to split it within sentences? it would be fragile with special handling required for numbers with decimal points and probably various other edge cases, though.
there are also Python libraries meant for natural language parsing[0] that could do that task for us. I even see examples on stack overflow[1] that simply split text into sentences.
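The naive splitting idea from the comment above can be sketched with the standard library alone. This is a minimal illustration, not a robust segmenter: the function name `split_sentences` is made up here, and the rule (split only when sentence punctuation is followed by whitespace) is one assumed way to keep decimal points intact.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter (illustrative only).

    Splits after '.', '!' or '?' when followed by whitespace, and on
    bare newlines. A decimal such as 3.14 is not split, because its
    period is followed by a digit rather than whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    # Drop empty fragments and trim stray whitespace.
    return [p.strip() for p in parts if p.strip()]

sample = "Pi is about 3.14. It is irrational.\nNext line"
sentences = split_sentences(sample)
```

Abbreviations such as "Dr. Smith" still break this rule, which is the fragility the comment warns about. Dedicated tokenizers, for example NLTK's `sent_tokenize`, exist to handle such cases.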
-
PaddleHub
Awesome pre-trained models toolkit based on PaddlePaddle. (400+ models including Image, Text, Audio, Video and Cross-Modal with Easy Inference & Serving)
-
PaddleNLP
👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
-
TextBlob
Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
TextBlob is a Python toolkit for text processing. It offers some common NLP functionalities such as part-of-speech tagging and noun phrase extraction. We’ll use TextBlob in our project to perform some quick sentiment analysis on tweets.
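To make "quick sentiment analysis" concrete, here is a toy lexicon-based polarity scorer in plain Python. It is a sketch of the general idea only: the word list and the `polarity` helper are hypothetical, and this is not TextBlob's implementation.

```python
import re

# Hypothetical mini-lexicon; real tools ship far larger, weighted word lists.
POLARITY = {
    "good": 0.7, "great": 0.8, "love": 0.5,
    "bad": -0.7, "awful": -1.0, "hate": -0.8,
}

def polarity(text: str) -> float:
    """Average the scores of known words; 0.0 means neutral or unknown."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [POLARITY[w] for w in words if w in POLARITY]
    return sum(scores) / len(scores) if scores else 0.0

tweet_score = polarity("I love this, it is great!")
```

In TextBlob itself, the comparable value is read from `TextBlob(text).sentiment.polarity`, which ranges from -1.0 to 1.0.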
-
petals
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
So how long until we can do an open source Mistral Large?
We could make a start on Petals or some other open source distributed training network cluster possibly?
-
attention-is-all-you-need-pytorch
A PyTorch implementation of the Transformer model in "Attention is All You Need".
Project mention: ElevenLabs Launches Voice Translation Tool to Break Down Language Barriers | news.ycombinator.com | 2023-10-10
The transformer model was invented to attend to context over the entire sequence length. Look at how the original authors used the Transformer for NMT in the original Vaswani et al publication. https://github.com/jadore801120/attention-is-all-you-need-py...
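What "attending over the entire sequence" means can be sketched as scaled dot-product attention, the core operation described in "Attention is All You Need". The version below is a minimal pure-Python illustration on plain lists, not code from the linked repository. It leaves out masking, multiple heads, and learned projections.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of float vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query with every key in the sequence.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        # Numerically stable softmax: weights are positive and sum to 1.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output is a weighted average of all value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# A zero query scores every key equally, so the output is the mean value.
out = attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[2.0], [4.0]])
```

Because every query is compared against every key, each output position can draw on any other position in the sequence, however far away.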
-
Stanza
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
-
txtai
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
-
It's indeed suspicious. You're sending your voice samples, your various services accounts, your location and more private data to some proprietary black box in some public cloud. Sorry, but this is a privacy nightmare. It should be open source and self-hosted like Mycroft (https://mycroft.ai) or Leon (https://getleon.ai) to be trustworthy.
-
Python NLP related posts
- Ship Faster by Organising Less
- Gemma doesn't suck anymore – 8 bug fixes
- Splade: Sparse Neural Search
- Excel Anonymizer-A Python script to anonymize data in Excel files
- The Era of 1-bit LLMs: ternary parameters for cost-effective computing
- Build knowledge graphs with LLM-driven entity extraction
- LlamaCloud and LlamaParse
-
A note from our sponsor - WorkOS
workos.com | 18 Mar 2024
Index
What are some of the best open-source NLP projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | transformers | 122,103 |
2 | bert | 36,638 |
3 | HanLP | 31,871 |
4 | spaCy | 28,455 |
5 | datasets | 18,183 |
6 | unilm | 17,772 |
7 | rasa | 17,732 |
8 | Chinese-LLaMA-Alpaca | 16,704 |
9 | best-of-ml-python | 15,178 |
10 | gensim | 15,074 |
11 | flair | 13,465 |
12 | haystack | 13,084 |
13 | NLTK | 12,907 |
14 | PaddleHub | 12,435 |
15 | PaddleNLP | 11,169 |
16 | TextBlob | 8,877 |
17 | petals | 8,493 |
18 | attention-is-all-you-need-pytorch | 8,328 |
19 | GPT2-Chinese | 7,321 |
20 | text-generation-inference | 7,240 |
21 | Stanza | 6,994 |
22 | txtai | 6,681 |
23 | mycroft-core | 6,429 |