Alternate approaches to TF-IDF?

This page summarizes the projects mentioned and recommended in the original post on /r/LanguageTechnology

  • faiss

    A library for efficient similarity search and clustering of dense vectors.

    FAISS works well: https://github.com/facebookresearch/faiss

  • KeyBERT

    Minimal keyword extraction with BERT

  • yake

    Single-document unsupervised keyword extraction

    Usage examples are at https://github.com/LIAAD/yake, and the reference section there lists publications with more detail on how it works. From what I remember, each keyphrase candidate is assigned an aggregated score based on several features: position in the text, casing, frequency, surrounding-text frequency...

  • scattertext

    Beautiful visualizations of how language differs among document types.

    Other suggestions: Take a look at Scattertext. Compare keywords to the problem of aspect extraction. I think an underutilized way to look at textual data when you have a single group of interest is the word-frequency-based odds ratio.

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • [P] Building model to extract keywords from legal documents

    5 projects | /r/MachineLearning | 24 Aug 2021
  • Rust Keyword Extraction: Creating the YAKE! algorithm from scratch

    2 projects | dev.to | 27 Apr 2024
  • I want to extract important keywords from large documents...

    1 project | /r/LangChain | 7 Dec 2023
  • Show HN: Whisper.cpp and YAKE to Analyse Voice Reflections [iOS]

    1 project | news.ycombinator.com | 20 Feb 2023
  • [P] what is the most efficient way to pattern matching word-to-word?

    2 projects | /r/MachineLearning | 1 Jun 2022