BERT-Based Clustering on a Corpus of Genre Samples Kinda Sucks. Why?

SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821

Base BERT sentence embeddings are just not good, for a couple of reasons, and there are research papers that show this. Try SimCSE, Google's USE (Universal Sentence Encoder), or SBERT, as mentioned previously, and you'll get better output. It's an inherent flaw of base BERT that it can't produce good sentence embeddings out of the box. Papers have shown you will probably get better scores from averaged GloVe embeddings than from base BERT.
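
For anyone who wants to try the swap, here is a rough sketch, not a definitive recipe. It assumes the `sentence-transformers` package and the `all-MiniLM-L6-v2` checkpoint (my choice of SBERT-style model, not something from the thread), and it clusters with a small hand-rolled cosine k-means written only for illustration. The helper names `embed_sentences`, `kmeans_cosine`, and `cluster_texts` are hypothetical.

```python
import numpy as np


def embed_sentences(texts, model_name="all-MiniLM-L6-v2"):
    """Encode texts with an SBERT-style model (model name is an assumption)."""
    # Imported lazily so the clustering helper below works without the package.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # normalize_embeddings=True returns unit-length vectors,
    # so a dot product equals cosine similarity.
    return model.encode(texts, normalize_embeddings=True)


def kmeans_cosine(embeddings, k, n_iter=50, seed=0):
    """Minimal k-means on cosine similarity (illustrative, not tuned)."""
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)

    # Start from k distinct rows picked at random.
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]

    labels = np.full(len(x), -1)
    for _ in range(n_iter):
        # Assign each point to the centroid with the highest cosine similarity.
        new_labels = np.argmax(x @ centroids.T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels

        # Move each centroid to the normalized mean of its members.
        for j in range(k):
            members = x[labels == j]
            if len(members) == 0:
                continue  # keep the old centroid for an empty cluster
            mean = members.mean(axis=0)
            norm = np.linalg.norm(mean)
            if norm > 0:
                centroids[j] = mean / norm
    return labels


def cluster_texts(texts, k):
    """Embed the texts, then cluster the embeddings."""
    return kmeans_cosine(embed_sentences(texts), k)
```

Calling `cluster_texts(samples, k=5)` on a list of genre samples would return one cluster label per sample. In practice you would probably reach for a library clustering implementation; the point here is only that the embedding model is the part to replace.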
