Python topic-modeling

Open-source Python projects categorized as topic-modeling

Top 13 Python topic-modeling Projects

  1. gensim

    Topic Modelling for Humans

  2. BERTopic

    Leveraging BERT and c-TF-IDF to create easily interpretable topics.

  3. Top2Vec

    Top2Vec learns jointly embedded topic, document and word vectors.

  4. scattertext

    Beautiful visualizations of how language differs among document types.

  5. contextualized-topic-models

    A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).

  6. OCTIS

    OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track).

  7. corex_topic

    Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx

  8. GuidedLDA

    Semi-supervised guided topic model with custom guidedLDA (by vi3k6i5)

  9. embedded-topic-model

    A package to run embedded topic modelling with ETM. Adapted from the original at: https://github.com/adjidieng/ETM

  10. GitModel

    Codebase topic modeling using GNNs (node aggregation and clustering)

  11. Auto-Research

    Generates a custom, detailed survey paper with topic-clustered sections and proper citations from a single query in under 30 minutes.

  12. cusim

    Superfast CUDA implementation of Word2Vec and Latent Dirichlet Allocation (LDA)

  13. jouresearch-nlp

    A Python package for generating topics, named entities and a word cloud visualization. It leverages the spaCy framework and sentence transformers.
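BERTopic's description mentions c-TF-IDF (class-based TF-IDF). As a rough illustration only, here is a minimal sketch, assuming a simplified form of the weighting described in the BERTopic documentation, W(t,c) = tf(t,c) * log(1 + A / f(t)), where tf(t,c) is the count of term t in all documents of topic c concatenated, f(t) is the count of t across all topics, and A is the average number of words per topic. The BERTopic documentation also normalizes term frequencies per topic; that step is omitted here. The helper names (c_tf_idf, top_terms) and the toy corpus are hypothetical, and this is not BERTopic's implementation.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_topic: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Score terms per topic with a simplified c-TF-IDF: tf(t,c) * log(1 + A / f(t))."""
    # Treat all documents of a topic as one "class document" and count its terms
    # (naive lowercase whitespace tokenization, no stop-word removal).
    class_counts = {
        topic: Counter(word for doc in docs for word in doc.lower().split())
        for topic, docs in docs_per_topic.items()
    }
    # f(t): frequency of each term summed over all topics.
    total_freq: Counter = Counter()
    for counts in class_counts.values():
        total_freq.update(counts)
    # A: average number of words per topic.
    avg_words = sum(total_freq.values()) / len(class_counts)
    return {
        topic: {
            term: tf * math.log(1 + avg_words / total_freq[term])
            for term, tf in counts.items()
        }
        for topic, counts in class_counts.items()
    }


def top_terms(scores: dict[str, dict[str, float]], topic: str, n: int = 3) -> list[str]:
    # Highest-scoring terms first; ties broken alphabetically for a stable result.
    ranked = sorted(scores[topic].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


# Toy corpus (made-up example data).
docs = {
    "sports": ["team won match", "match was close"],
    "finance": ["market fell", "stock market rallied"],
}
scores = c_tf_idf(docs)
print(top_terms(scores, "sports", 1))   # ['match']
print(top_terms(scores, "finance", 1))  # ['market']
```

In the toy corpus, "match" and "market" rank first because each repeats within a single topic; a word spread across every topic would have a larger f(t) and therefore a smaller log factor.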

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
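Top2Vec's description says it learns jointly embedded topic, document and word vectors. In the Top2Vec paper, a topic vector is the centroid of the document vectors in a dense cluster, and the word vectors closest to it serve as the topic's words. The sketch below assumes the document and word embeddings and the cluster labels (as HDBSCAN would produce them, with -1 for noise) have already been computed. The 2-D vectors are made-up toy values and the function names are hypothetical, not Top2Vec's API.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    # Component-wise mean of equally sized vectors.
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def topic_vectors(doc_vectors: list[list[float]], labels: list[int]) -> dict[int, list[float]]:
    # Group document vectors by cluster label; HDBSCAN marks noise with -1,
    # so those documents contribute to no topic.
    clusters: dict[int, list[list[float]]] = {}
    for vec, label in zip(doc_vectors, labels):
        if label == -1:
            continue
        clusters.setdefault(label, []).append(vec)
    # Each topic vector is the centroid of its cluster's document vectors.
    return {label: centroid(vs) for label, vs in clusters.items()}


def topic_words(topic_vec: list[float], word_vectors: dict[str, list[float]], n: int = 2) -> list[str]:
    # Words whose vectors are most cosine-similar to the topic vector.
    return sorted(word_vectors, key=lambda w: cosine(topic_vec, word_vectors[w]), reverse=True)[:n]


# Toy embeddings (made-up example data).
doc_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5]]
labels = [0, 0, 1, 1, -1]
word_vectors = {"goal": [1.0, 0.0], "team": [0.8, 0.2], "stock": [0.0, 1.0], "bond": [0.2, 0.8]}

topics = topic_vectors(doc_vectors, labels)
print(topic_words(topics[0], word_vectors))  # ['goal', 'team']
print(topic_words(topics[1], word_vectors))  # ['stock', 'bond']
```

Because topic, document and word vectors share one embedding space in this setup, finding topic words reduces to a nearest-neighbour search around each centroid.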


Python topic-modeling related posts

  • [D] Is it better to create a different set of Doc2Vec embeddings for each group in my dataset, rather than generating embeddings for the entire dataset?

    1 project | /r/MachineLearning | 28 Oct 2023
  • Aggregating news from different sources

    1 project | /r/learnprogramming | 8 Jul 2023
  • how can a top2vec output be improved

    1 project | /r/learnmachinelearning | 4 Jul 2023
  • Tips for best Top2Vec (HDBSCAN) usage

    1 project | /r/datascience | 8 Jun 2023
  • [Project]Topic modelling of tweets from the same user

    2 projects | /r/MachineLearning | 14 Apr 2023
  • SBERT Embeddings from Conversations

    2 projects | /r/LanguageTechnology | 3 Mar 2023
  • Sentence transformers (BERTopic) on a Macbook Air

    1 project | /r/datascience | 13 Feb 2023

Index

What are some of the best open-source topic-modeling projects in Python? The projects are ranked here by GitHub stars:

#   Project                       Stars
1   gensim                        16,157
2   BERTopic                      6,996
3   Top2Vec                       3,076
4   scattertext                   2,311
5   contextualized-topic-models   1,242
6   OCTIS                         775
7   corex_topic                   635
8   GuidedLDA                     510
9   embedded-topic-model          95
10  GitModel                      61
11  Auto-Research                 58
12  cusim                         45
13  jouresearch-nlp               3

