Top 9 Python topic-modeling Projects
Gensim: Topic Modelling for Humans
Project mention: Topic modelling with Gensim and SpaCy on startup news | dev.to | 2022-01-17
For the topic modelling itself, I am going to use the Gensim library by Radim Rehurek, which is very developer-friendly and easy to use.
BERTopic: Leveraging BERT and c-TF-IDF to create easily interpretable topics.
Project mention: Ultimate Guide To Text Similarity With Python | reddit.com/r/Python | 2022-01-12
Alternatively, you could try a package like BERTopic for this particular use case.
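To make the c-TF-IDF idea mentioned above more concrete, here is a minimal standard-library sketch. It assumes the class-based weighting tf(t, c) * log(1 + A / f(t)), where A is the average number of words per class and f(t) is the frequency of term t across all classes, as the BERTopic documentation describes it. The function names and the toy clusters are hypothetical, and this is not BERTopic's own code, which also handles embedding and clustering.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_class):
    """Score terms per class; docs_per_class maps a label to tokenized documents."""
    # Treat all documents of one class as a single bag of words.
    class_counts = {
        label: Counter(token for doc in docs for token in doc)
        for label, docs in docs_per_class.items()
    }
    # f(t): frequency of each term across all classes.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)
    # A: average number of words per class.
    avg_words = sum(total_counts.values()) / len(class_counts)

    scores = {}
    for label, counts in class_counts.items():
        n_words = sum(counts.values())
        scores[label] = {
            term: (freq / n_words) * math.log(1 + avg_words / total_counts[term])
            for term, freq in counts.items()
        }
    return scores


def top_terms(scores, label, k=3):
    """Return the k highest-scoring terms of a class (ties broken alphabetically)."""
    ranked = sorted(scores[label].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy clusters standing in for the output of a clustering step.
clusters = {
    "funding": [["startup", "raises", "seed", "round"],
                ["investors", "back", "startup", "round"]],
    "sports": [["team", "wins", "final", "match"],
               ["coach", "praises", "team", "match"]],
}
scores = c_tf_idf(clusters)
print(top_terms(scores, "funding", 2))
print(top_terms(scores, "sports", 2))
```

The highest-scoring terms per cluster are what would serve as the readable topic description in this kind of approach.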
Scattertext: Beautiful visualizations of how language differs among document types.
Project mention: Clustering of text - Where to start? | reddit.com/r/LanguageTechnology | 2021-08-04
If what you want is to determine how similar two categories are, or to learn something about the structure or words that compose those categories, you might consider word shift graphs or Scattertext.
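As a rough illustration of learning which words distinguish two categories, the sketch below computes a smoothed log-odds difference per word. This is a swapped-in, simplified measure chosen for brevity; it is not the scoring Scattertext or word shift graphs actually use, and the function name and sample texts are hypothetical.

```python
import math
from collections import Counter


def log_odds(tokens_a, tokens_b, smoothing=1.0):
    """Positive scores lean toward category A, negative toward category B."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vocab = set(counts_a) | set(counts_b)
    # Additive smoothing keeps words unseen in one category from producing log(0).
    total_a = sum(counts_a.values()) + smoothing * len(vocab)
    total_b = sum(counts_b.values()) + smoothing * len(vocab)
    scores = {}
    for word in vocab:
        p_a = (counts_a[word] + smoothing) / total_a
        p_b = (counts_b[word] + smoothing) / total_b
        scores[word] = math.log(p_a / (1 - p_a)) - math.log(p_b / (1 - p_b))
    return scores


# Hypothetical sample texts for two categories.
positive = "great product great price fast delivery".split()
negative = "late delivery broken product bad support".split()
word_scores = log_odds(positive, negative)
for word, score in sorted(word_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:10s} {score:+.2f}")
```

Words shared equally by both categories score near zero, so the extremes of the ranking show what sets each category apart.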
Top2Vec learns jointly embedded topic, document and word vectors.
Project mention: Extracting topics from 250k facebook posts | reddit.com/r/LanguageTechnology | 2021-05-26
Since you already have the Facebook posts, you can use Top2Vec: https://github.com/ddangelov/Top2Vec
contextualized-topic-models: A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021.
Project mention: Catogorize the Data- Topic Modelling algorithm | reddit.com/r/LanguageTechnology | 2021-10-01
A bit of shameless self-promotion, but we developed a topic model (https://github.com/MilaNLProc/contextualized-topic-models) that actually supports that use case!
corex_topic: Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx
Project mention: Are topic models reliable or useful? | news.ycombinator.com | 2021-09-27
We started off by trying LDA and NMF, but the topics were too messy so we wound up switching to CorEx (https://github.com/gregversteeg/corex_topic), which is a semi-supervised algo that lets you "nudge" the model in the right direction using anchor terms. By the time our topics started looking coherent, it turned out that a regex with the anchor terms we'd picked outperformed the model itself. This case study was on a relatively small sample of relatively short documents (~4k survey open-ends) but for what it's worth, we also tried to use topic models to classify congressional Facebook posts (much larger corpus and longer documents) and the results were the same.
Overfitting is certainly part of the problem - in one of my earlier posts I talk about "conceptually spurious words," which are essentially the product of overfitting - but the more difficult problem is polysemy. I'm sure there are ways to mitigate that - expanding the feature space with POS tagging, etc. - but ultimately I think the solution is to simply avoid using a dimensionality reduction method for text classification. Supervised models are clearly the way to go - even if those "models" are just keyword dictionaries curated based on domain knowledge.
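The "regex with the anchor terms" and "keyword dictionaries" described above can be sketched in a few lines. The category labels and anchor terms below are hypothetical placeholders, not the ones used in the case study, and a real dictionary would be curated from domain knowledge.

```python
import re

# Hypothetical anchor terms per category; in practice these come from domain knowledge.
ANCHORS = {
    "healthcare": ["doctor", "hospital", "insurance"],
    "economy": ["jobs", "wages", "inflation"],
}


def compile_patterns(anchors):
    """Build one case-insensitive, whole-word regex per category."""
    return {
        label: re.compile(
            r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
            re.IGNORECASE,
        )
        for label, terms in anchors.items()
    }


def classify(text, patterns):
    """Return every category whose pattern matches the text (multi-label)."""
    return sorted(label for label, pattern in patterns.items() if pattern.search(text))


patterns = compile_patterns(ANCHORS)
print(classify("My insurance would not cover the hospital bill", patterns))
print(classify("Wages are flat and inflation is up, and the doctor is expensive", patterns))
```

A dictionary like this is still exposed to the polysemy problem raised above (a term such as "jobs" can also be a surname), which is why the terms need manual review.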
Semi-supervised guided topic model with custom guidedLDA (by vi3k6i5)
Project mention: SOTA for Topic Modeling | reddit.com/r/LanguageTechnology | 2021-03-25
OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
Project mention: (NLP) Best practices for topic modeling and generating interesting topics? | reddit.com/r/MLQuestions | 2021-05-31
My team and I have recently released a Python library called OCTIS (https://github.com/mind-Lab/octis) that allows you to automatically optimize the hyperparameters of a topic model according to a given evaluation metric (not log-likelihood). I guess, in your case, you might be interested in topic coherence. That way you get good-quality topics with little effort spent choosing the hyperparameters. We also included some state-of-the-art topic models, e.g. contextualized topic models (https://github.com/MilaNLProc/contextualized-topic-models).
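For readers unfamiliar with topic coherence as an evaluation metric, here is a minimal sketch of one common variant, UMass coherence, computed from document co-occurrence counts. It is a standard-library illustration only and does not use or reproduce OCTIS's API; the function name and toy corpus are hypothetical, and every topic word is assumed to occur in at least one document.

```python
import math
from itertools import combinations


def umass_coherence(topic_words, documents):
    """Sum of log((D(w_later, w_earlier) + 1) / D(w_earlier)) over ordered word pairs.

    topic_words should be ordered from most to least probable in the topic.
    Raises ZeroDivisionError if a topic word never occurs in the documents.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(all(word in doc for word in words) for doc in doc_sets)

    score = 0.0
    for earlier, later in combinations(topic_words, 2):
        score += math.log((doc_freq(earlier, later) + 1) / doc_freq(earlier))
    return score


# Hypothetical toy corpus of tokenized documents.
corpus = [
    ["cat", "dog", "pet"],
    ["cat", "dog"],
    ["stock", "market", "trade"],
    ["stock", "market"],
]
print(umass_coherence(["cat", "dog"], corpus))
print(umass_coherence(["cat", "stock"], corpus))
```

Higher values indicate that a topic's top words tend to appear in the same documents, which is the kind of signal a hyperparameter search can optimize instead of log-likelihood.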
CUSIM: Superfast CUDA implementation of Word2Vec and Latent Dirichlet Allocation (LDA)
Project mention: [P] CUSIM - Superfast CUDA implementation of Word2Vec and Latent Dirichlet Allocation (LDA) | reddit.com/r/MachineLearning | 2021-02-20
Python topic-modeling related posts
Ultimate Guide To Text Similarity With Python
1 project | reddit.com/r/Python | 12 Jan 2022
Text Classification using Unsupervised Learning.
1 project | reddit.com/r/LanguageTechnology | 6 Jan 2022
Unsupervised Learning for String Matching in Python - can I have advice on how to go about this?
2 projects | reddit.com/r/learnmachinelearning | 16 Dec 2021
Gensim: Topic Modelling for Humans
1 project | news.ycombinator.com | 7 Dec 2021
Gensim – a Python library for topic modelling, document indexing
1 project | news.ycombinator.com | 25 Nov 2021
How to build a search engine with word embeddings
2 projects | dev.to | 22 Nov 2021
Which method/model to opt for while identifying semantic similarity?
1 project | reddit.com/r/LanguageTechnology | 20 Nov 2021
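What are some of the best open-source topic-modeling projects in Python? This list will help you: Gensim, BERTopic, Scattertext, Top2Vec, contextualized-topic-models, corex_topic, guidedLDA, OCTIS, and CUSIM.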