Alternate approaches to TF-IDF?

This page summarizes the projects mentioned and recommended in the original post on /r/LanguageTechnology.

  • faiss

    A library for efficient similarity search and clustering of dense vectors.

  • FAISS works well: https://github.com/facebookresearch/faiss

  • KeyBERT

    Minimal keyword extraction with BERT

  • yake

    Single-document unsupervised keyword extraction

  • You can find usage examples at https://github.com/LIAAD/yake, which also has a reference section listing publications that explain in more detail how it works. From what I remember, each keyphrase candidate is assigned an aggregated score based on several features: position in the text, casing, frequency, frequency of the surrounding text...

  • scattertext

    Beautiful visualizations of how language differs among document types.

  • Other suggestions: take a look at Scattertext, and compare keyword extraction to the problem of aspect extraction. I think an underutilized way to look at textual data when you have a single group of interest is the word-frequency-based odds ratio.
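
The word-frequency-based odds ratio suggested in the Scattertext comment can be sketched roughly as follows. This is a minimal illustration, not Scattertext's implementation: the function names (`tokenize`, `log_odds_ratios`), the idea of comparing the group of interest against a separate background corpus, and the additive smoothing constant are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def log_odds_ratios(target_docs, background_docs, smoothing=0.5):
    """Smoothed log odds ratio of each word: target group vs. background.

    Positive scores mark words that are over-represented in the target
    group; negative scores mark words more typical of the background.
    The smoothing constant (an assumption) avoids division by zero for
    words that occur in only one of the two corpora.
    """
    target = Counter(tok for doc in target_docs for tok in tokenize(doc))
    background = Counter(tok for doc in background_docs for tok in tokenize(doc))
    n_target = sum(target.values())
    n_background = sum(background.values())

    scores = {}
    for word in set(target) | set(background):
        # 2x2 contingency table: word vs. all other words, per corpus.
        a = target[word] + smoothing
        b = n_target - target[word] + smoothing
        c = background[word] + smoothing
        d = n_background - background[word] + smoothing
        scores[word] = math.log((a / b) / (c / d))
    return scores
```

Sorting the returned dictionary by score, highest first, gives a ranked list of terms characteristic of the group of interest. Unlike TF-IDF, this needs some reference text to compare against.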

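The feature-based scoring recalled in the yake comment (position, casing, frequency) can be illustrated with a toy scorer. This is not the YAKE formula and does not use the yake package: the stopword list, the way the three features are computed, and their simple sum are assumptions chosen for readability, and the surrounding-text feature is left out entirely. Here a higher score means a more likely keyword.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not YAKE's list).
STOPWORDS = {"the", "a", "an", "is", "are", "in", "of", "with", "each", "some", "and", "to"}


def toy_keyword_scores(text):
    """Score single-word candidates from position, casing, and frequency."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    freq = Counter()       # how often each candidate occurs
    cased = Counter()      # occurrences capitalised away from sentence start
    first_sentence = {}    # index of the sentence where it first appears

    for idx, sentence in enumerate(sentences):
        tokens = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        for pos, token in enumerate(tokens):
            word = token.lower()
            if word in STOPWORDS or len(word) < 3:
                continue
            freq[word] += 1
            first_sentence.setdefault(word, idx)
            if pos > 0 and token[0].isupper():
                cased[word] += 1

    if not freq:
        return {}

    max_freq = max(freq.values())
    scores = {}
    for word, count in freq.items():
        frequency = count / max_freq                   # more frequent is better
        position = 1.0 / (1 + first_sentence[word])    # earlier is better
        casing = cased[word] / count                   # share of capitalised uses
        scores[word] = frequency + position + casing
    return scores
```

For real use, the yake package linked above implements the published algorithm, including multi-word candidates and the features omitted here.
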


Related posts

  • [P] Building model to extract keywords from legal documents

    5 projects | /r/MachineLearning | 24 Aug 2021
  • Rust Keyword Extraction: Creating the YAKE! algorithm from scratch

    2 projects | dev.to | 27 Apr 2024
  • I want to extract important keywords from large documents...

    1 project | /r/LangChain | 7 Dec 2023
  • Show HN: Whisper.cpp and YAKE to Analyse Voice Reflections [iOS]

    1 project | news.ycombinator.com | 20 Feb 2023
  • [P] what is the most efficient way to pattern matching word-to-word?

    2 projects | /r/MachineLearning | 1 Jun 2022