Top 13 Python topic-modeling Projects

gensim

1 18 16,157 5.9 Python

Topic Modelling for Humans
BERTopic

2 22 6,996 7.9 Python

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
Top2Vec

3 13 3,076 6.8 Python

Top2Vec learns jointly embedded topic, document and word vectors.
scattertext

4 3 2,311 2.7 Python

Beautiful visualizations of how language differs among document types.
contextualized-topic-models

5 7 1,242 3.8 Python

A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
OCTIS

6 7 775 5.6 Python

OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track).
corex_topic

7 5 635 3.8 Python

Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx
GuidedLDA

8 1 510 0.0 Python

Semi-supervised guided topic model with custom GuidedLDA (by vi3k6i5)
embedded-topic-model

9 1 95 7.3 Python

A package to run embedded topic modelling with ETM. Adapted from the original at: https://github.com/adjidieng/ETM
GitModel

10 6 61 6.8 Python

Codebase topic modeling using GNNs (node aggregation and clustering)
Auto-Research

11 1 58 0.0 Python

Generates a custom, detailed survey paper with topic-clustered sections and proper citations from a single query in under 30 minutes.
cusim

12 1 45 0.0 Python

Superfast CUDA implementation of Word2Vec and Latent Dirichlet Allocation (LDA)
jouresearch-nlp

13 1 3 10.0 Python

A Python package for generating topics, named entities and a wordcloud visualization. It leverages the spaCy framework and sentence transformers.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
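
The BERTopic entry above mentions c-TF-IDF (class-based TF-IDF). As a rough illustration of the idea, and not BERTopic's actual implementation, the sketch below treats each topic as one long document and scores a word by its normalized frequency in the topic times log(1 + A / f), where A is the average number of words per topic and f is the word's frequency across all topics. The function names `c_tf_idf` and `top_words`, the whitespace tokenization, and the exact normalization are assumptions made for this example.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_topic):
    """Return {topic: {word: score}} using a class-based TF-IDF weighting."""
    # Treat every topic as one long document and count its words.
    # Tokenization by lowercase whitespace split is an assumption of this sketch.
    counts = {
        topic: Counter(word for doc in docs for word in doc.lower().split())
        for topic, docs in docs_by_topic.items()
    }
    # f: how often each word occurs across all topics.
    totals = Counter()
    for topic_counts in counts.values():
        totals.update(topic_counts)
    # A: average number of words per topic.
    avg_words = sum(totals.values()) / len(counts)

    scores = {}
    for topic, topic_counts in counts.items():
        n_words = sum(topic_counts.values())
        scores[topic] = {
            # Normalized term frequency times the class-based IDF term.
            word: (tf / n_words) * math.log(1 + avg_words / totals[word])
            for word, tf in topic_counts.items()
        }
    return scores


def top_words(scores, topic, n=3):
    """Highest-scoring words for one topic."""
    ranked = sorted(scores[topic].items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]
```

Calling `top_words(c_tf_idf(docs_by_topic), topic)` on a small dictionary of topic-to-documents lists gives the words that are frequent in one topic but comparatively rare overall, which is what makes the resulting topic labels easy to read. Very common words can still score well in small topics, so real pipelines usually remove stop words first.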

Python topic-modeling discussion

Python topic-modeling related posts

[D] Is it better to create a different set of Doc2Vec embeddings for each group in my dataset, rather than generating embeddings for the entire dataset?

1 project | /r/MachineLearning | 28 Oct 2023
Aggregating news from different sources

1 project | /r/learnprogramming | 8 Jul 2023
how can a top2vec output be improved

1 project | /r/learnmachinelearning | 4 Jul 2023
Tips for best Top2Vec (HDBSCAN) usage

1 project | /r/datascience | 8 Jun 2023
[Project]Topic modelling of tweets from the same user

2 projects | /r/MachineLearning | 14 Apr 2023
SBERT Embeddings from Conversations

2 projects | /r/LanguageTechnology | 3 Mar 2023
Sentence transformers (BERTopic) on a Macbook Air

1 project | /r/datascience | 13 Feb 2023

Index

What are some of the best open-source topic-modeling projects in Python? This list will help you:

#	Project	Stars
1	gensim	16,157
2	BERTopic	6,996
3	Top2Vec	3,076
4	scattertext	2,311
5	contextualized-topic-models	1,242
6	OCTIS	775
7	corex_topic	635
8	GuidedLDA	510
9	embedded-topic-model	95
10	GitModel	61
11	Auto-Research	58
12	cusim	45
13	jouresearch-nlp	3

Did you know that Python is the 2nd most popular programming language based on number of references?
