Top 23 Python Clustering Projects
-
Project mention: Why don't more people use Altair for python Visualizations instead of Plotly? | /r/datascience | 2023-05-23
You should also check out Orange Data Mining. It lets you create many kinds of charts, filter data from one chart into another, build ML models, make predictions and a lot more. And you can do it with zero code.
-
dedupe
A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
I think this problem is known as data deduplication, in particular, entity deduplication. I googled a bit and it seems approaches vary from manual deduplication to some sort of active learning (if I am not mistaken). I am also curious whether pre-trained transformer-based cross-encoders can provide any good results (they are trained on sentences I think, but may be worth a try). Another problem here is how to measure progress (compare different approaches).
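To make the deduplication idea and the "how to measure progress" question concrete, here is a minimal sketch. It does not use dedupe's API: it assumes plain string records, swaps in the standard library's `difflib.SequenceMatcher` for similarity, and uses a hypothetical threshold of 0.85. Approaches are compared with pairwise precision and recall against a hand-labelled ground truth.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two lightly normalized strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def cluster_duplicates(records, threshold=0.85):
    """Group records whose similarity meets the threshold (assumed value).

    Pairs above the threshold are merged with a small union-find, so
    matches are transitive: if A~B and B~C, all three share a cluster.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for idx, record in enumerate(records):
        groups.setdefault(find(idx), []).append(record)
    return list(groups.values())


def duplicate_pairs(clusters):
    """Expand clusters into the set of record pairs they declare duplicates."""
    pairs = set()
    for cluster in clusters:
        pairs.update(combinations(sorted(cluster), 2))
    return pairs


def pairwise_precision_recall(predicted, truth):
    """Compare predicted clusters to ground-truth clusters at the pair level."""
    p, t = duplicate_pairs(predicted), duplicate_pairs(truth)
    hits = len(p & t)
    precision = hits / len(p) if p else 1.0
    recall = hits / len(t) if t else 1.0
    return precision, recall


if __name__ == "__main__":
    records = ["Acme Corporation", "ACME Corporation ", "Acme Corp.",
               "Globex Inc", "Globex Inc."]
    truth = [records[:3], records[3:]]  # hand-labelled for illustration
    predicted = cluster_duplicates(records)
    print(predicted)
    print(pairwise_precision_recall(predicted, truth))
```

Comparing every pair is quadratic in the number of records, so this only suits small datasets. Larger ones usually need a blocking step that restricts comparisons to plausible candidates first.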
-
awesome-community-detection
A curated list of community detection research papers with implementations.
-
uis-rnn
This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
Project mention: [D] Is there a way to distinguish different human voices from 1 audio file ? | /r/MachineLearning | 2022-10-03
Looks like you can get one out of the box here: https://github.com/google/uis-rnn
-
Unsupervised-Classification
SCAN: Learning to Classify Images without Labels, incl. SimCLR. [ECCV 2020]
Project mention: Middle ground dataset between CIFAR and ImageNet [D] | /r/MachineLearning | 2022-09-17
The subsets we used are from here: https://github.com/wvangansbeke/Unsupervised-Classification/tree/master/data/imagenet_subsets
-
similarity
TensorFlow Similarity is a python package focused on making similarity learning quick and easy.
Project mention: New free tool that uses fine-tuned BERT model to surface answers from research papers | /r/LanguageTechnology | 2022-10-28
TensorFlow Ranking and TensorFlow Similarity (maybe relevant/irrelevant contrastive learning?) look like they could be useful.
-
Unsupervised-Semantic-Segmentation
Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals. [ICCV 2021]
-
matrixprofile
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
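For readers unfamiliar with the term, a matrix profile stores, for every length-`m` window of a time series, the z-normalized Euclidean distance to its nearest non-overlapping neighbour. Low values mark repeated patterns (motifs) and high values mark anomalies (discords). The sketch below is a naive quadratic illustration in pure Python, not the matrixprofile library's API or its optimized algorithms. The function names and the exclusion zone of `m // 2` are assumptions made for this example.

```python
import math


def znorm(window):
    """Z-normalize a window; constant windows map to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def naive_matrix_profile(series, m):
    """Return (profile, index) for window length m using brute force.

    profile[i] is the distance from window i to its nearest neighbour,
    index[i] is where that neighbour starts. Windows closer than the
    exclusion zone are skipped to avoid trivial self-matches.
    """
    n = len(series) - m + 1
    if n < 2:
        raise ValueError("series is too short for this window length")
    windows = [znorm(series[i:i + m]) for i in range(n)]
    exclusion = max(1, m // 2)  # assumed exclusion zone
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) < exclusion:
                continue
            d = math.dist(windows[i], windows[j])
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index


if __name__ == "__main__":
    series = [0, 1, 0, -1] * 6  # a clean repeating pattern
    series[13] = 5              # inject one anomalous point
    profile, index = naive_matrix_profile(series, m=4)
    discord = max(range(len(profile)), key=profile.__getitem__)
    print("most anomalous window starts at", discord)
```

The brute-force double loop is only practical for short series. It is meant to show what the profile contains, not how a production library computes it.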
-
stringlifier
Stringlifier is an open-source ML library for detecting random strings in raw text. It can be used for sanitising logs, detecting accidentally exposed credentials, and as a pre-processing step in unsupervised ML-based analysis of application text data.
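As a rough illustration of the random-string detection task, the sketch below flags long tokens with high per-character Shannon entropy. This is a simple stand-in heuristic, not Stringlifier's model or API (the library is described as ML-based). The function names and the `min_length` and `min_entropy` thresholds are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(token):
    """Return the Shannon entropy of a token in bits per character."""
    counts = Counter(token)
    total = len(token)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def find_random_looking_tokens(text, min_length=12, min_entropy=3.5):
    """Return whitespace-separated tokens that are long and high-entropy.

    Both thresholds are assumed values; real logs would need tuning,
    and a heuristic like this will miss short secrets.
    """
    return [
        token
        for token in text.split()
        if len(token) >= min_length and shannon_entropy(token) >= min_entropy
    ]


if __name__ == "__main__":
    line = "user login failed token=9fK2xQ7bLm4ZpT1vR8 for administrator"
    print(find_random_looking_tokens(line))
```

Ordinary long words such as "administrator" repeat letters and stay under the threshold, while key-like tokens with mostly distinct characters exceed it.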
- Intent extraction models such as https://github.com/thuiar/textoir. My problem with this approach is that they are multi-label classifiers and usually focus on single-sentence classification: "Can you get me a table?" would be assigned to the "Reservation" label. I feel that I would lose information such as "Meet at 10PM in this address."
-
n2d
A deep clustering algorithm. Code to reproduce results for the paper N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding.
-
Revisiting-Contrastive-SSL
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]
-
AnnA_Anki_neuronal_Appendix
Uses machine learning on your Anki collection to enhance scheduling via semantic clustering and semantic similarity.
Index
What are some of the best open-source Clustering projects in Python? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | orange | 4,125 |
2 | dedupe | 3,737 |
3 | awesome-community-detection | 2,143 |
4 | uis-rnn | 1,462 |
5 | minisom | 1,246 |
6 | Unsupervised-Classification | 1,194 |
7 | similarity | 948 |
8 | Unsupervised-Semantic-Segmentation | 357 |
9 | mteb | 330 |
10 | slot-attention | 313 |
11 | matrixprofile | 312 |
12 | stringlifier | 145 |
13 | fuzzy-c-means | 143 |
14 | TEXTOIR | 132 |
15 | n2d | 116 |
16 | text-summarizer | 112 |
17 | hazelcast-python-client | 106 |
18 | DBCV | 100 |
19 | Revisiting-Contrastive-SSL | 82 |
20 | impfuzzy | 79 |
21 | relevanceai | 58 |
22 | chinese-whispers-python | 54 |
23 | AnnA_Anki_neuronal_Appendix | 48 |