Python topic-modeling

Open-source Python projects categorized as topic-modeling

Top 9 Python topic-modeling Projects

  • GitHub repo gensim

    Topic Modelling for Humans

    Project mention: Topic modelling with Gensim and SpaCy on startup news | dev.to | 2022-01-17

    For the topic modelling itself, I am going to use Gensim library by Radim Rehurek, which is very developer friendly and easy to use.

  • GitHub repo BERTopic

    Leveraging BERT and c-TF-IDF to create easily interpretable topics.

    Project mention: Ultimate Guide To Text Similarity With Python | reddit.com/r/Python | 2022-01-12

    Alternatively, you could try a package like BERTopic for this particular use case.

  • GitHub repo scattertext

    Beautiful visualizations of how language differs among document types.

    Project mention: Clustering of text - Where to start? | reddit.com/r/LanguageTechnology | 2021-08-04

    If what you want is to determine how similar two categories are, or to learn something about the structure or words that compose those categories, you might consider word shift graphs or Scattertext.

  • GitHub repo Top2Vec

    Top2Vec learns jointly embedded topic, document and word vectors.

    Project mention: Extracting topics from 250k facebook posts | reddit.com/r/LanguageTechnology | 2021-05-26

    Since you already have the Facebook posts, you can use Top2Vec: https://github.com/ddangelov/Top2Vec

  • GitHub repo contextualized-topic-models

    A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021.

    Project mention: Catogorize the Data- Topic Modelling algorithm | reddit.com/r/LanguageTechnology | 2021-10-01

    a bit of shameless self-promotion, but we developed a topic model (https://github.com/MilaNLProc/contextualized-topic-models) that actually supports that use case!

  • GitHub repo corex_topic

    Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx

    Project mention: Are topic models reliable or useful? | news.ycombinator.com | 2021-09-27

    We started off by trying LDA and NMF, but the topics were too messy so we wound up switching to CorEx (https://github.com/gregversteeg/corex_topic), which is a semi-supervised algo that lets you "nudge" the model in the right direction using anchor terms. By the time our topics started looking coherent, it turned out that a regex with the anchor terms we'd picked outperformed the model itself. This case study was on a relatively small sample of relatively short documents (~4k survey open-ends) but for what it's worth, we also tried to use topic models to classify congressional Facebook posts (much larger corpus and longer documents) and the results were the same.

    Overfitting is certainly part of the problem - in one of my earlier posts I talk about "conceptually spurious words," which are essentially the product of overfitting - but the more difficult problem is polysemy. I'm sure there are ways to mitigate that - expanding the feature space with POS tagging, etc. - but ultimately I think the solution is to simply avoid using a dimensionality reduction method for text classification. Supervised models are clearly the way to go - even if those "models" are just keyword dictionaries curated based on domain knowledge.

  • GitHub repo GuidedLDA

    Semi-supervised guided topic model with custom GuidedLDA (by vi3k6i5)

    Project mention: SOTA for Topic Modeling | reddit.com/r/LanguageTechnology | 2021-03-25

  • GitHub repo OCTIS

    OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track)

    Project mention: (NLP) Best practices for topic modeling and generating interesting topics? | reddit.com/r/MLQuestions | 2021-05-31

    My team and I recently released a Python library called OCTIS (https://github.com/mind-Lab/octis) that automatically optimizes the hyperparameters of a topic model against a chosen evaluation metric (not log-likelihood). In your case, you might be interested in topic coherence, so you can get good-quality topics with little effort spent on choosing hyperparameters. We also included some state-of-the-art topic models, e.g. contextualized topic models (https://github.com/MilaNLProc/contextualized-topic-models).

  • GitHub repo cusim

    Superfast CUDA implementation of Word2Vec and Latent Dirichlet Allocation (LDA)

    Project mention: [P] CUSIM - Superfast CUDA implementation of Word2Vec and Latent Dirichlet Allocation (LDA) | reddit.com/r/MachineLearning | 2021-02-20
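
The corex_topic mention above reports that a regex built from the chosen anchor terms outperformed the topic model itself, and suggests keyword dictionaries curated from domain knowledge as a reasonable baseline. A minimal sketch of such a baseline in plain Python follows. The topic names, anchor terms, and the `classify` helper are hypothetical and chosen only for illustration; they are not taken from the original post or from any of the listed libraries.

```python
import re

# Hypothetical anchor terms per topic (illustrative only, not from the post).
ANCHORS = {
    "healthcare": ["health", "insurance", "medicare"],
    "economy": ["jobs", "taxes", "wages"],
}

# Build one case-insensitive, whole-word pattern per topic.
PATTERNS = {
    topic: re.compile(
        r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b",
        re.IGNORECASE,
    )
    for topic, terms in ANCHORS.items()
}


def classify(text):
    """Return the sorted list of topics whose anchor terms appear in text."""
    return sorted(
        topic for topic, pattern in PATTERNS.items() if pattern.search(text)
    )


print(classify("Medicare costs and taxes keep rising"))
# prints ['economy', 'healthcare']
```

Because the patterns match whole words only, a term such as "health" does not fire on "healthcare"; whether that is desirable depends on the corpus, which is the kind of domain judgment the post argues for.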

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-01-17.

Index

What are some of the best open-source topic-modeling projects in Python? This list will help you:

#  Project                       Stars
1  gensim                        12,834
2  BERTopic                       1,817
3  scattertext                    1,740
4  Top2Vec                        1,503
5  contextualized-topic-models      752
6  corex_topic                      517
7  GuidedLDA                        445
8  OCTIS                            281
9  cusim                             22