Python nlp-library

Open-source Python projects categorized as nlp-library

Top 22 Python nlp-library Projects

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

    Project mention: Fine-Tuned Llama2 Inserting Unnecessary Delimiters | /r/LocalLLaMA | 2023-11-04

    While it's tough to say anything specific, since we don't know exactly how you trained it, the prompt format of your training inputs, or how you are performing inference, one thing I found when I faced similar issues is that the model does not know when to stop. Part of the reason is that the fast Llama tokenizer does not add the end-of-sequence token when encoding your inputs. So you can either add that token explicitly to the input text of each sample or use the slow Llama tokenizer. Check the llama_recipes GitHub repo for the exact issue. The other likely thing to check is whether the model.generate output also contains the exact input tokens. That is the expected behavior of some models (like Llama 2 or MPT), for example when you use vanilla transformers for inference.
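
    To make the two suggestions above concrete, here is a minimal pure-Python sketch that works on lists of token ids. The helper names `append_eos` and `strip_prompt` are hypothetical and the ids in the example are made up. With Hugging Face transformers, the real stop id would come from `tokenizer.eos_token_id`, and the echoed prompt is commonly dropped by slicing the `generate` output at `input_ids.shape[1]`. Treat it as an illustration under those assumptions, not as the fix from llama_recipes.

```python
from typing import List, Sequence


def append_eos(token_ids: Sequence[int], eos_id: int) -> List[int]:
    """Return a copy of one encoded training sample that ends with the EOS id.

    If the tokenizer already appended it, the sample is returned unchanged,
    so each training example carries exactly one trailing stop token.
    """
    ids = list(token_ids)
    if not ids or ids[-1] != eos_id:
        ids.append(eos_id)
    return ids


def strip_prompt(generated_ids: Sequence[int], prompt_ids: Sequence[int]) -> List[int]:
    """Drop the echoed prompt from a decoder-only model's generated ids.

    Decoder-only models return prompt + continuation from generate(), so only
    the tokens after the prompt are new text. If the output does not start
    with the prompt, it is returned unchanged.
    """
    n = len(prompt_ids)
    if list(generated_ids[:n]) == list(prompt_ids):
        return list(generated_ids[n:])
    return list(generated_ids)


# Hypothetical ids: 2 stands in for the EOS id of whatever tokenizer is in use.
sample = append_eos([1, 450, 4996], eos_id=2)
print(sample)  # [1, 450, 4996, 2]

continuation = strip_prompt([1, 450, 4996, 29871, 13, 2], prompt_ids=[1, 450, 4996])
print(continuation)  # [29871, 13, 2]
```

    Appending the id only when it is missing keeps the helper safe to run on samples produced by either tokenizer variant.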

  • spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: A beginner’s guide to sentiment analysis using OceanBase and spaCy | 2023-10-25

    In this article, I'm going to walk through a sentiment analysis project from start to finish, using open-source Amazon product reviews. However, using the same approach, you can easily implement mass sentiment analysis on your own products. We'll explore an approach to sentiment analysis with one of the most popular Python NLP packages: spaCy.

  • OpenPrompt

    An Open-Source Framework for Prompt-Learning.

  • FARM

    🏡 Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

  • tika-python

    Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

  • contextualized-topic-models

    A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021.

    Project mention: [Project]Topic modelling of tweets from the same user | /r/MachineLearning | 2023-04-14

    In our experiments, CTM works well with tweets: (I'm one of the authors)

  • skweak

    skweak: A software toolkit for weak supervision applied to NLP tasks

    Project mention: Entity Extraction with Predefined List | /r/LanguageTechnology | 2023-01-07

    Thanks for pointing me in the right direction. It seems there are a few other approaches with weak supervision:
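
    Weak supervision with a predefined list usually means writing several noisy labeling functions (a gazetteer lookup being the simplest) and then aggregating their votes. The sketch below illustrates that idea in plain Python, assuming single-token entities and simple majority voting. It is not skweak's API (skweak ships its own annotators and aggregation models), and the function names, entity lists, and labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

Label = Optional[str]  # None means the labeling function abstains


def gazetteer_lf(tokens: Sequence[str], gazetteer: Dict[str, str]) -> List[Label]:
    """Label tokens that appear in a predefined list (case-insensitive)."""
    return [gazetteer.get(tok.lower()) for tok in tokens]


def capitalized_lf(tokens: Sequence[str], label: str) -> List[Label]:
    """Noisy heuristic: capitalized tokens that are not sentence-initial get `label`."""
    return [label if i > 0 and tok[:1].isupper() else None for i, tok in enumerate(tokens)]


def majority_vote(votes: Sequence[Sequence[Label]]) -> List[Label]:
    """Combine per-token votes from several labeling functions, ignoring abstentions.

    Ties go to the label seen first (Counter.most_common keeps first-encountered
    order among equal counts), so list the most trusted function first.
    """
    combined: List[Label] = []
    for token_votes in zip(*votes):
        counts = Counter(v for v in token_votes if v is not None)
        combined.append(counts.most_common(1)[0][0] if counts else None)
    return combined


tokens = ["Apple", "opened", "a", "store", "in", "Oslo"]
votes = [
    gazetteer_lf(tokens, {"apple": "ORG", "oslo": "LOC"}),  # hypothetical entity list
    gazetteer_lf(tokens, {"oslo": "LOC"}),                   # hypothetical city list
    capitalized_lf(tokens, "ORG"),                           # disagrees on "Oslo"
]
print(majority_vote(votes))  # ['ORG', None, None, None, None, 'LOC']
```

    A real pipeline would also need to match multi-token entries such as "New York", and it typically estimates how reliable each labeling function is instead of counting every vote equally.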

  • pythainlp

    Thai Natural Language Processing in Python.

  • janome

    Japanese morphological analysis engine written in pure Python

    Project mention: [discussion] Open AI api translations | /r/Re_Zero | 2023-04-19

  • OCTIS

    OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track)

  • camel_tools

    A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.

    Project mention: [Arabic>latin transliteration] any apps for this? | /r/translator | 2023-04-30

    Otherwise it depends on your use case. There are NLP libraries like this one that can do the job.

  • zshot

    Zero- and few-shot named entity and relationship recognition

    Project mention: A transformer-based method for zero and few-shot biomedical NER | 2023-05-12

  • mutate

    A library to synthesize text datasets using Large Language Models (LLM)

  • turkish-deasciifier

    Turkish deasciifier in Python based on Deniz Yüret's turkish-mode for Emacs

  • toiro

    A comparison tool of Japanese tokenizers

  • NLP-Guide

    A guide to Natural Language Processing (NLP), covering topics such as Tokenization, Part Of Speech tagging (POS), Machine translation, Named Entity Recognition (NER), Classification, and Sentiment analysis.

  • rakun2

    RaKUn 2.0 - A fast keyword detection algorithm

  • taxonomy4good

    Taxonomy4Good: a sustainability lexicon that provides the freedom to create custom taxonomies in addition to listed ESG and Sustainability Standards taxonomies.

  • Semi-Automated-Youtube-Channel

    A semi-automated YouTube channel with many features for anyone building a content-generation project

    Project mention: YouTube content creation assistant | /r/Python | 2023-06-08

  • breame

    Lightweight utility tools for the detection of multiple spellings, meanings, and language-specific terminology in British and American English

  • MultiEL

    Multilingual Entity Linking model by BELA model

    Project mention: [P] MultiEL: Multilingual Entity Linking model by BELA model | /r/MachineLearning | 2023-06-29

  • loquax

    NLP framework for phonology

    Project mention: Seeking your insights on "Loquax": A tool for phonological analysis | /r/latin | 2023-05-30

    Lovely - thanks so much for the feedback u/christmas_fan1 - it means a lot. I've created an issue with it linking back to your original comment:

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-11-04.

What are some of the best open-source nlp-library projects in Python? This list will help you:

Rank  Project                           Stars
1     transformers                    116,187
2     spaCy                            27,703
3     OpenPrompt                        3,931
4     FARM                              1,699
5     tika-python                       1,363
6     contextualized-topic-models       1,124
7     skweak                              899
8     pythainlp                           892
9     janome                              801
10    OCTIS                               627
11    camel_tools                         355
12    zshot                               284
13    mutate                              146
14    turkish-deasciifier                 138
15    toiro                               110
16    NLP-Guide                            60
17    rakun2                               56
18    taxonomy4good                        20
19    Semi-Automated-Youtube-Channel       13
20    breame                                9
21    MultiEL                               7
22    loquax                                2