Top 22 Python nlp-library Projects
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Project mention: Fine-Tuned Llama2 Inserting Unnecessary Delimiters | /r/LocalLLaMA | 2023-11-04
While it's tough to say something specific since we don't know exactly how you trained it, the prompt format of your training input, or how you are performing inference, one thing I found when I faced similar issues is that the model does not know when to stop. Part of the cause is that the fast Llama tokenizer does not add the end-of-sequence (EOS) token when encoding your inputs. So you can either add that token explicitly to the input text of each sample or use the slow Llama tokenizer. Check the llama_recipes GitHub repo; the exact issue is https://github.com/huggingface/transformers/issues/22794. The other likely thing to check is whether the model.generate output also contains the exact input tokens. That is the expected behavior of some models (like Llama 2 or MPT), for example when you use vanilla transformers for inference.
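The two checks in the reply above can be sketched in plain Python, without loading a model. This is only an illustration under stated assumptions: the helper names `ensure_eos` and `strip_prompt` are hypothetical, the `"</s>"` marker is assumed to be the end-of-sequence string for a Llama-style tokenizer, and the token IDs are made-up integers rather than real tokenizer output.

```python
from typing import List

# Assumption: "</s>" is the end-of-sequence marker used by the tokenizer.
EOS_TOKEN = "</s>"


def ensure_eos(text: str, eos_token: str = EOS_TOKEN) -> str:
    """Append the EOS marker to a training sample unless it is already there."""
    stripped = text.rstrip()
    if stripped.endswith(eos_token):
        return stripped
    return stripped + eos_token


def strip_prompt(output_ids: List[int], prompt_ids: List[int]) -> List[int]:
    """Drop the echoed prompt from a generate-style output, if present."""
    if output_ids[: len(prompt_ids)] == prompt_ids:
        return output_ids[len(prompt_ids):]
    # The output does not start with the prompt, so leave it untouched.
    return output_ids


if __name__ == "__main__":
    print(ensure_eos("Summarize this review."))
    print(strip_prompt([1, 2, 3, 4, 5], [1, 2, 3]))
```

With a real tokenizer, the same idea would apply to the text of each training sample before encoding and to the ID sequence returned by generation before decoding.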
💫 Industrial-strength Natural Language Processing (NLP) in Python
Project mention: A beginner’s guide to sentiment analysis using OceanBase and spaCy | dev.to | 2023-10-25
In this article, I'm going to walk through a sentiment analysis project from start to finish, using open-source Amazon product reviews. However, using the same approach, you can easily implement mass sentiment analysis on your own products. We'll explore an approach to sentiment analysis with one of the most popular Python NLP packages: spaCy.
An Open-Source Framework for Prompt-Learning.
:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021.
Project mention: [Project]Topic modelling of tweets from the same user | /r/MachineLearning | 2023-04-14
In our experiments, CTM works well with tweets: https://github.com/MilaNLProc/contextualized-topic-models (I'm one of the authors)
skweak: A software toolkit for weak supervision applied to NLP tasks
Project mention: Entity Extraction with Predefined List | /r/LanguageTechnology | 2023-01-07
Thanks for pointing me in the right direction. Seems like there’s a few other approaches with weak supervision: https://github.com/NorskRegnesentral/skweak
Thai Natural Language Processing in Python.
Japanese morphological analysis engine written in pure Python
Project mention: [discussion] Open AI api translations | /r/Re_Zero | 2023-04-19
OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.
Project mention: [Arabic>latin transliteration] any apps for this? | /r/translator | 2023-04-30
Otherwise it depends on your use case. There are NLP libraries like this one that can do the job.
Zero and Few shot named entity & relationships recognition
Project mention: A transformer-based method for zero and few-shot biomedical NER | news.ycombinator.com | 2023-05-12
A library to synthesize text datasets using Large Language Models (LLM)
Turkish deasciifier in Python based on Deniz Yüret's turkish-mode for Emacs
A comparison tool of Japanese tokenizers
Natural Language Processing (NLP). Covering topics such as Tokenization, Part Of Speech tagging (POS), Machine translation, Named Entity Recognition (NER), Classification, and Sentiment analysis.
RaKUn 2.0 - A fast keyword detection algorithm
Taxonomy4Good: a sustainability lexicon that provides the freedom to create custom taxonomies in addition to listed ESG and Sustainability Standards taxonomies.
Semi-automated YouTube channel with many features that others can reuse in their own content-generation projects
Project mention: YouTube content creation assistant | /r/Python | 2023-06-08
Lightweight utility tools for the detection of multiple spellings, meanings, and language-specific terminology in British and American English
Multilingual Entity Linking model by BELA model
Project mention: [P] MultiEL: Multilingual Entity Linking model by BELA model | /r/MachineLearning | 2023-06-29
NLP framework for phonology
Project mention: Seeking your insights on "Loquax": A tool for phonological analysis | /r/latin | 2023-05-30
Lovely - thanks so much for the feedback u/christmas_fan1 - it means a lot. I've created an issue with it linking back to your original comment: https://github.com/mattlianje/loquax/issues/11
Python nlp-library related posts
[P] MultiEL: Multilingual Entity Linking model by BELA model
1 project | /r/MachineLearning | 29 Jun 2023
YouTube content creation assistant
1 project | /r/Python | 8 Jun 2023
Seeking your insights on "Loquax": A tool for phonological analysis
3 projects | /r/latin | 30 May 2023
I used GPT-4 to create code that automates absolutely everything in creating YouTube Shorts, from voiceover to editing, even down to choosing the illustration images.
3 projects | /r/ChatGPT | 27 May 2023
[Arabic>latin transliteration] any apps for this?
1 project | /r/translator | 30 Apr 2023
[P] Programmatic: Powerful Weak Labeling
2 projects | /r/MachineLearning | 20 Apr 2022
Show HN: Programmatic – a REPL for creating labeled data
1 project | news.ycombinator.com | 8 Apr 2022
What are some of the best open-source nlp-library projects in Python? This list will help you: