Alternate approaches to TF-IDF?

faiss

Mentions: 71 | Stars: 28,308 | Activity: 9.4 | Language: C++

A library for efficient similarity search and clustering of dense vectors.

FAISS works well: https://github.com/facebookresearch/faiss

KeyBERT

Mentions: 5 | Stars: 3,229 | Activity: 6.1 | Language: Python

Minimal keyword extraction with BERT

yake

Mentions: 5 | Stars: 1,574 | Activity: 3.0 | Language: Python

Single-document unsupervised keyword extraction

Usage examples are in the repository: https://github.com/LIAAD/yake. Its reference section also lists publications that explain the method in more detail. From what I remember, each keyphrase candidate is assigned an aggregated score based on several features: position in the text, casing, frequency, frequency of the surrounding text, and so on.
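To make the idea of an aggregated, feature-based score concrete, here is a minimal sketch in plain Python. It is not the YAKE formula: the three features (position, casing, frequency), their weights, the tiny stopword list, and the function name `score_candidates` are all assumptions chosen for illustration, and the surrounding-text feature is left out. As in YAKE, a lower score is meant to indicate a more relevant term.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not YAKE's list).
STOPWORDS = {"the", "and", "for", "was", "are", "with", "this", "that"}

def score_candidates(text):
    """Toy single-document keyword scorer; lower score = more relevant.

    NOT the YAKE formula: features and weights are illustrative assumptions.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    first_pos = {}       # word -> index of the first sentence containing it
    counts = Counter()   # word -> total occurrences
    cased = Counter()    # word -> occurrences capitalised mid-sentence
    for i, sentence in enumerate(sentences):
        tokens = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        for j, token in enumerate(tokens):
            word = token.lower()
            counts[word] += 1
            first_pos.setdefault(word, i)
            # Sentence-initial capitals carry no signal, so skip j == 0.
            if j > 0 and token[0].isupper():
                cased[word] += 1
    scores = {}
    for word, tf in counts.items():
        if len(word) < 3 or word in STOPWORDS:
            continue
        position = 1.0 + first_pos[word]   # earlier first mention -> smaller
        casing = 1.0 + cased[word] / tf    # often capitalised -> larger
        frequency = 1.0 + tf               # more frequent -> larger
        scores[word] = position / (casing * frequency)
    return sorted(scores.items(), key=lambda kv: kv[1])

if __name__ == "__main__":
    sample = ("Faiss indexes dense vectors. The library was built for search. "
              "Faiss is fast.")
    for word, score in score_candidates(sample)[:3]:
        print(f"{word}\t{score:.3f}")
```

For real use, the library itself is the better choice; this sketch only shows how several weak signals can be folded into one ranking score.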

scattertext

Mentions: 3 | Stars: 2,203 | Activity: 4.7 | Language: Python

Beautiful visualizations of how language differs among document types.

Other suggestions: take a look at Scattertext, and compare keyword extraction to the problem of aspect extraction. I think an underutilized way to look at textual data, when you have a single group of interest, is the word-frequency-based odds ratio.
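A word-frequency odds ratio can be sketched in a few lines of plain Python. The version below compares a group of interest against a background corpus and uses additive smoothing; the function name `log_odds_ratio`, the smoothing constant `alpha`, and the toy token lists are assumptions made for this example and do not come from Scattertext or the original post.

```python
import math
from collections import Counter

def log_odds_ratio(group_tokens, background_tokens, alpha=0.5):
    """Smoothed log odds ratio per word: group of interest vs. background.

    alpha is an additive smoothing constant (an assumption of this sketch)
    that keeps the ratio finite for words missing from one side.
    Returns (word, score) pairs, most group-specific words first.
    """
    group = Counter(group_tokens)
    background = Counter(background_tokens)
    vocab = set(group) | set(background)
    if len(vocab) < 2:
        raise ValueError("need at least two distinct words to form odds")
    n_group = sum(group.values()) + alpha * len(vocab)
    n_background = sum(background.values()) + alpha * len(vocab)
    scores = {}
    for word in vocab:
        a = group[word] + alpha        # smoothed count in the group
        b = background[word] + alpha   # smoothed count in the background
        odds_group = a / (n_group - a)
        odds_background = b / (n_background - b)
        scores[word] = math.log(odds_group / odds_background)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    legal = "contract clause liability contract party".split()
    general = "party weather contract game game party".split()
    for word, score in log_odds_ratio(legal, general):
        print(f"{word}\t{score:+.2f}")
```

Positive scores mark words that are over-represented in the group of interest, negative scores mark words more typical of the background. Unlike TF-IDF, the comparison is between two corpora rather than between one document and the rest.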

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.


Related posts

[P] Building model to extract keywords from legal documents
5 projects | /r/MachineLearning | 24 Aug 2021

Rust Keyword Extraction: Creating the YAKE! algorithm from scratch
2 projects | dev.to | 27 Apr 2024

I want to extract important keywords from large documents...
1 project | /r/LangChain | 7 Dec 2023

Show HN: Whisper.cpp and YAKE to Analyse Voice Reflections [iOS]
1 project | news.ycombinator.com | 20 Feb 2023

[P] what is the most efficient way to pattern matching word-to-word?
2 projects | /r/MachineLearning | 1 Jun 2022

This page summarizes the projects mentioned and recommended in the original post on /r/LanguageTechnology.
Tags: keyword-extraction, NLP, keyphrase-extraction, unsupervised-approach, D3
Post date: 14 Mar 2021
