corex_topic
Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx
We ended up settling on the HuggingFace transformer + HDBSCAN pipeline from BERTopic. I like it because it is straightforward to tune and test, and documents are assigned to clusters probabilistically, so once inference is done you can do interesting aggregation and sampling, such as selecting text from each cluster. Other options include Top2Vec, which basically does the same thing but without some of the guiding tools available in BERTopic. Either is suitable for what you’re doing. Older techniques include Latent Dirichlet Allocation (LDA) and CorEx.
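To make the "aggregation and sampling after inference" point concrete, here is a minimal sketch in plain Python. It does not call BERTopic or HDBSCAN. The `docs` list, the `probs` matrix, and the helper names `expected_topic_sizes` and `top_documents` are all made up for illustration. The only assumption is that the clustering step has already given you one row of per-topic probabilities per document.

```python
# Hypothetical soft assignments: one row per document, one column per topic.
# In a real pipeline these would come from the clustering step; they are
# hard-coded here so the sketch runs without any dependencies.
docs = [
    "battery drains too fast",
    "screen cracked after a week",
    "battery life is excellent",
    "display is bright and sharp",
    "shipping took forever",
]
probs = [
    [0.90, 0.05],
    [0.10, 0.80],
    [0.70, 0.10],
    [0.05, 0.85],
    [0.20, 0.15],  # low everywhere: assumed to be an outlier-like document
]


def expected_topic_sizes(probs):
    """Aggregate: sum each column to get the expected document count per topic."""
    n_topics = len(probs[0])
    return [sum(row[t] for row in probs) for t in range(n_topics)]


def top_documents(docs, probs, topic, k=2, min_prob=0.5):
    """Sample: return up to k documents most confidently assigned to `topic`."""
    scored = [
        (row[topic], doc)
        for doc, row in zip(docs, probs)
        if row[topic] >= min_prob  # skip weakly assigned documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


sizes = expected_topic_sizes(probs)
representatives = {t: top_documents(docs, probs, t) for t in range(len(sizes))}
```

As far as I know, BERTopic can return a per-document probability matrix like this from `fit_transform` when the model is created with `calculate_probabilities=True`, but check the docs for the version you are using. The `min_prob` threshold is a judgment call: raise it if you only want very clean examples per cluster.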