Top 22 Python nlp-library Projects
-
While it's tough to say anything specific without knowing exactly how you trained the model, the prompt format of your training input, or how you are performing inference, one thing I found when I faced similar issues is that the model does not know when to stop. Part of the reason is that the fast Llama tokenizer does not add the end-of-sequence token when encoding your inputs. So you can either add that token explicitly to the input text of each sample or use the slow Llama tokenizer. Check the llama_recipes GitHub repo for the exact issue: https://github.com/huggingface/transformers/issues/22794. The other thing worth checking is whether the model.generate output also contains the exact input tokens. That is the expected behavior of some models (such as llama2 or mpt) when you use vanilla transformers for inference.
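The two checks in the comment above can be sketched on plain lists of token ids, without loading a model. This is only an illustration, not the commenter's code: it assumes the missing token is the end-of-sequence marker, and the helper names `append_eos` and `strip_prompt`, the id `EOS_ID = 2`, and the sample ids are all hypothetical.

```python
from typing import List


def append_eos(token_ids: List[int], eos_id: int) -> List[int]:
    """Return a copy of token_ids that ends with eos_id (added only if missing)."""
    if token_ids and token_ids[-1] == eos_id:
        return list(token_ids)
    return list(token_ids) + [eos_id]


def strip_prompt(output_ids: List[int], prompt_ids: List[int]) -> List[int]:
    """Drop the echoed prompt if the generated ids start with it."""
    n = len(prompt_ids)
    if list(output_ids[:n]) == list(prompt_ids):
        return list(output_ids[n:])
    return list(output_ids)


# Hypothetical values for illustration; real ids come from your tokenizer.
EOS_ID = 2
prompt = [1, 15043, 3186]

# Training side: make sure every sample ends with the stop token.
training_sample = append_eos(prompt, EOS_ID)

# Inference side: some models echo the prompt before the completion.
generated = prompt + [29991, 1781, EOS_ID]
completion = strip_prompt(generated, prompt)
```

With a real tokenizer, the same idea would apply to the ids it returns for each training sample and to the ids returned by generation; whether the prompt is echoed depends on the model and the inference stack, so it is worth checking rather than assuming.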
-
Project mention: A beginner’s guide to sentiment analysis using OceanBase and spaCy | dev.to | 2023-10-25
In this article, I'm going to walk through a sentiment analysis project from start to finish, using open-source Amazon product reviews. However, using the same approach, you can easily implement mass sentiment analysis on your own products. We'll explore an approach to sentiment analysis with one of the most popular Python NLP packages: spaCy.
-
FARM
:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
-
tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
-
contextualized-topic-models
A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021.
Project mention: [Project]Topic modelling of tweets from the same user | /r/MachineLearning | 2023-04-14
In our experiments, CTM works well with tweets: https://github.com/MilaNLProc/contextualized-topic-models (I'm one of the authors)
-
Thanks for pointing me in the right direction. Seems like there’s a few other approaches with weak supervision: https://github.com/NorskRegnesentral/skweak
-
OCTIS
OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
-
camel_tools
A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.
Otherwise it depends on your use case. There are NLP libraries like this one that can do the job.
-
Project mention: A transformer-based method for zero and few-shot biomedical NER | news.ycombinator.com | 2023-05-12
-
NLP-Guide
Natural Language Processing (NLP). Covering topics such as Tokenization, Part Of Speech tagging (POS), Machine translation, Named Entity Recognition (NER), Classification, and Sentiment analysis.
-
taxonomy4good
Taxonomy4Good: a sustainability lexicon that provides the freedom to create custom taxonomies in addition to listed ESG and Sustainability Standards taxonomies.
-
Semi-Automated-Youtube-Channel
Semi automated youtube channel that has a lot of cool features for someone to use in their content generating project
-
breame
Lightweight utility tools for the detection of multiple spellings, meanings, and language-specific terminology in British and American English
-
Project mention: [P] MultiEL: Multilingual Entity Linking model by BELA model | /r/MachineLearning | 2023-06-29
-
Project mention: Seeking your insights on "Loquax": A tool for phonological analysis | /r/latin | 2023-05-30
Lovely - thanks so much for the feedback u/christmas_fan1 - it means a lot. I've created an issue with it linking back to your original comment: https://github.com/mattlianje/loquax/issues/11
-
Python nlp-library related posts
- [P] MultiEL: Multilingual Entity Linking model by BELA model
- YouTube content creation assistant
- Seeking your insights on "Loquax": A tool for phonological analysis
- I used GPT-4 to create code that automates absolutely everything in creating YouTube Shorts, from voiceover to editing, even down to choosing the illustration images.
- [Arabic>latin transliteration] any apps for this?
- [P] Programmatic: Powerful Weak Labeling
- Show HN: Programmatic – a REPL for creating labeled data
Index
What are some of the best open-source nlp-library projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | transformers | 116,187 |
2 | spaCy | 27,703 |
3 | OpenPrompt | 3,931 |
4 | FARM | 1,699 |
5 | tika-python | 1,363 |
6 | contextualized-topic-models | 1,124 |
7 | skweak | 899 |
8 | pythainlp | 892 |
9 | janome | 801 |
10 | OCTIS | 627 |
11 | camel_tools | 355 |
12 | zshot | 284 |
13 | mutate | 146 |
14 | turkish-deasciifier | 138 |
15 | toiro | 110 |
16 | NLP-Guide | 60 |
17 | rakun2 | 56 |
18 | taxonomy4good | 20 |
19 | Semi-Automated-Youtube-Channel | 13 |
20 | breame | 9 |
21 | MultiEL | 7 |
22 | loquax | 2 |