lemmatization

Top 9 lemmatization Open-Source Projects

  • trankit

    Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing

  • CogCompNLP

    CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.

  • lemmatization-lists

    Machine-readable lists of lemma-token pairs in 23 languages.

  • LemmInflect

    A python module for English lemmatization and inflection.

  • huspacy

    HuSpaCy: industrial-strength Hungarian natural language processing

  • simplemma

    Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

  • orange3-text

    🍊 📄 Text Mining add-on for Orange3

  • Project mention: Ask HN: What Underrated Open Source Project Deserves More Recognition? | news.ycombinator.com | 2024-03-07
  • syntaxdot

    Neural syntax annotator, supporting sequence labeling, lemmatization, and dependency parsing.

  • Project mention: Candle: Torch Replacement in Rust | news.ycombinator.com | 2023-08-08

    I am so happy about them releasing this. A few years ago I wrote a multi-task syntax annotator in Rust using Laurent Mazare's excellent tch-rs binding (it seems like he is also working on Candle):

    https://github.com/tensordot/syntaxdot

    However, the deployment story was always quite difficult. The PyTorch C++ API is not stable, so a particular version of tch-rs will only work with a particular PyTorch version. So, anyone wanting to use SyntaxDot always had to get exactly the right version of libtorch (and set some environment variables) to build the project.

    The idea of making an abstraction over Torch and Rust ndarray (similar to Burn) crossed my mind several times, but there is only so much that I could do as a solo developer. So Candle would be a godsend if I were still working on this project.

    Seeing Candle makes me want to port curated-transformers to Candle for fun:

    https://github.com/explosion/curated-transformers

  • collatinus

    Sources of Collatinus software - Latin lemmatizer, morphological analyzer and scansion

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
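
As an illustration of what the lemma-token pairs in a resource like lemmatization-lists make possible, here is a minimal sketch of a dictionary-based lemmatizer using only the Python standard library. It assumes each input line holds a lemma and an inflected form separated by a tab; that layout, the sample pairs, and the helper names `load_pairs` and `lemmatize` are assumptions made for this example, not something confirmed by the listing above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_pairs(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Build a token -> {lemmas} lookup from 'lemma<TAB>token' lines (assumed layout)."""
    lookup: Dict[str, Set[str]] = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line or "\t" not in line:
            continue  # skip blank or malformed lines
        lemma, token = line.split("\t", 1)
        lookup[token.lower()].add(lemma)
    return dict(lookup)


def lemmatize(token: str, lookup: Dict[str, Set[str]]) -> str:
    """Return a lemma for the token, or the token itself if it is unknown."""
    candidates = lookup.get(token.lower())
    if not candidates:
        return token
    # A token can map to several lemmas; pick one deterministically.
    return min(candidates)


# Hypothetical sample data standing in for a real lemma-token file.
SAMPLE = ["be\twas", "be\twere", "go\twent", "mouse\tmice"]
LOOKUP = load_pairs(SAMPLE)

print([lemmatize(t, LOOKUP) for t in ["Mice", "went", "home"]])
# → ['mouse', 'go', 'home']
```

A plain lookup like this ignores context, so ambiguous forms are resolved arbitrarily; the neural and rule-based projects in the list address that limitation in different ways.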

Index

What are some of the best open-source lemmatization projects? This list will help you:

#  Project              Stars
1  trankit              707
2  CogCompNLP           469
3  lemmatization-lists  303
4  LemmInflect          248
5  huspacy              148
6  simplemma            125
7  orange3-text         124
8  syntaxdot            67
9  collatinus           60
