Top 23 Python Clustering Projects
-
I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.
Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.
https://orangedatamining.com/
https://orange3.readthedocs.io/projects/orange-visual-progra...
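If a corpus does contain stray NUL characters, one workaround is to strip them before loading the file into Orange. This is a minimal sketch under stated assumptions: `strip_nul_bytes` is a hypothetical helper (it is not part of Orange), the input is assumed to be a UTF-8 or ASCII text file, and whether removing NULs alone avoids the failure described above is also an assumption.

```python
from pathlib import Path


def strip_nul_bytes(src: str, dst: str) -> int:
    """Copy src to dst without NUL bytes and return how many were removed.

    Assumes a UTF-8 (or ASCII) corpus: in UTF-8 a 0x00 byte only ever encodes
    U+0000, so dropping it cannot corrupt other characters. Do NOT use this on
    UTF-16/UTF-32 files, where 0x00 bytes are part of ordinary characters.
    """
    data = Path(src).read_bytes()
    Path(dst).write_bytes(data.replace(b"\x00", b""))
    return data.count(b"\x00")
```

Working on raw bytes rather than decoded text avoids decode errors on slightly messy files, which is also why the UTF-8 assumption matters.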
-
dedupe
🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
For sure. The problem was writing an np.unique[1] that could handle large datasets. Specifically, the solution involved chunking the dataset, mapping np.unique across the chunks, and then combining the per-chunk results. Merging count results is an associative operation, so the merges can be arranged in a tree-like computational graph that is only O(log n) levels deep (n - 1 pairwise merges in total for n chunks).
Specifically, this is work related to implementing large-dataset support for the dedupe library[2]. It's valuable to be able to deduplicate messy datasets effectively. That's about as much as I can share.
1. https://numpy.org/doc/stable/reference/generated/numpy.uniqu...
2. https://github.com/dedupeio/dedupe/blob/main/dedupe/clusteri...
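The chunk-and-merge approach described in the comment can be sketched with NumPy. This is a minimal illustration under our own assumptions, not the code from the dedupe repository: `chunked_unique_counts` and `merge_counts` are hypothetical names, and the sketch keeps every partial result in memory, whereas a real large-dataset version would stream chunks from disk. Each chunk is reduced with `np.unique(..., return_counts=True)`, and neighbouring partial results are merged pairwise, level by level, which is valid because summing counts per key is associative.

```python
import numpy as np


def merge_counts(a, b):
    """Merge two (uniques, counts) pairs into one, summing counts of shared keys."""
    keys = np.concatenate([a[0], b[0]])
    counts = np.concatenate([a[1], b[1]])
    uniques, inverse = np.unique(keys, return_inverse=True)
    summed = np.zeros(len(uniques), dtype=counts.dtype)
    np.add.at(summed, inverse.ravel(), counts)  # accumulate counts per unique key
    return uniques, summed


def chunked_unique_counts(values, chunk_size):
    """Equivalent of np.unique(values, return_counts=True), computed per chunk.

    chunk_size must be a positive integer.
    """
    values = np.asarray(values)
    # Map step: run np.unique independently on each chunk.
    partials = [
        np.unique(values[i:i + chunk_size], return_counts=True)
        for i in range(0, len(values), chunk_size)
    ]
    if not partials:
        return np.array([], dtype=values.dtype), np.array([], dtype=np.intp)
    # Reduce step: merge neighbours pairwise, one tree level per pass.
    while len(partials) > 1:
        next_level = []
        for j in range(0, len(partials), 2):
            if j + 1 < len(partials):
                next_level.append(merge_counts(partials[j], partials[j + 1]))
            else:
                next_level.append(partials[j])  # odd one out moves up unchanged
        partials = next_level
    return partials[0]
```

For `k` chunks the loop performs `k - 1` merges arranged in `ceil(log2(k))` levels, and the result should match a single `np.unique(values, return_counts=True)` call. For example, `chunked_unique_counts(np.array([3, 1, 2, 3, 3, 1, 5]), chunk_size=2)` gives uniques `[1, 2, 3, 5]` with counts `[2, 1, 3, 1]`.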
-
awesome-community-detection
A curated list of community detection research papers with implementations.
-
uis-rnn
This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
-
Unsupervised-Classification
SCAN: Learning to Classify Images without Labels, incl. SimCLR. [ECCV 2020]
-
PyPOTS
A Python toolkit/library for reality-centric machine/deep learning and data mining on partially observed time series, including state-of-the-art (SOTA) neural network models for scientific analysis tasks (imputation, classification, clustering, forecasting, anomaly detection and cleaning) on incomplete, irregularly sampled industrial multivariate time series with NaN missing values.
-
uform
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Project mention: Recapping the AI, Machine Learning and Data Science Meetup - May 30, 2024 | dev.to | 2024-06-04
UForm: Pocket-Sized Multimodal AI for Content Understanding and Generation
-
similarity
TensorFlow Similarity is a Python package focused on making similarity learning quick and easy.
-
Unsupervised-Semantic-Segmentation
Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals. [ICCV 2021]
-
matrixprofile
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
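For readers new to the idea, a matrix profile records, for every length-`m` subsequence of a time series, the z-normalised Euclidean distance to its nearest match outside a small exclusion zone around itself (the zone rules out trivial self-matches). The brute-force sketch below only illustrates that definition; it is not the matrixprofile library's API, it runs in O(n^2 m) time where real libraries use much faster algorithms, and the exclusion-zone half-width of `ceil(m / 4)` and the handling of constant windows are our assumptions. `naive_matrix_profile` is a hypothetical name.

```python
import numpy as np


def naive_matrix_profile(ts, m):
    """Brute-force matrix profile: for each length-m window, the z-normalised
    Euclidean distance to its nearest non-trivial match, plus that match's index."""
    ts = np.asarray(ts, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(ts, m)  # shape (n, m)
    n = len(windows)
    mu = windows.mean(axis=1, keepdims=True)
    sigma = windows.std(axis=1, keepdims=True)
    sigma[sigma == 0] = 1.0  # simplification: constant windows become all-zero vectors
    z = (windows - mu) / sigma
    excl = int(np.ceil(m / 4))  # assumed exclusion-zone half-width
    profile = np.full(n, np.inf)
    index = np.full(n, -1)
    for i in range(n):
        # Distance from window i to every window, then mask the trivial matches.
        d = np.sqrt(((z - z[i]) ** 2).sum(axis=1))
        d[max(0, i - excl):min(n, i + excl + 1)] = np.inf
        j = int(np.argmin(d))
        if np.isfinite(d[j]):
            profile[i], index[i] = d[j], j
    return profile, index
```

Low profile values mark repeated patterns (motifs) and high values mark discords, which is why the profile is useful for tasks such as motif discovery and anomaly detection.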
-
stringlifier
Stringlifier is an open-source ML library for detecting random strings in raw text. It can be used for sanitising logs, for detecting accidentally exposed credentials, and as a pre-processing step in unsupervised ML-based analysis of application text data.
-
n2d
A deep clustering algorithm. Code to reproduce results for our paper N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding.
-
apple-ocr
Easy-to-Use Apple Vision wrapper for text extraction, scalar representation and clustering using K-means.
Project mention: Easy-to-Use Apple Vision wrapper for text extraction and clustering | news.ycombinator.com | 2024-01-28
The most interesting piece of code here, IMO, is the recognize() method, which demonstrates how to call Vision.VNImageRequestHandler and Vision.VNRecognizeTextRequest from Python code using pyobjc.
https://github.com/louisbrulenaudet/apple-ocr/blob/70c25b24b...
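The project description mentions clustering with K-means. As a rough illustration of that step only, here is Lloyd's algorithm in NumPy operating on feature vectors that are assumed to have been computed already (for example, from extracted text). This is not apple-ocr's own code: `kmeans` and its parameters are hypothetical, and a production pipeline would more likely call an existing implementation such as scikit-learn's `KMeans`.

```python
import numpy as np


def _assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    return _assign(X, centroids), centroids
```

Labels are recomputed after the loop so they always correspond to the returned centroids, even when `n_iter` runs out before convergence.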
-
Revisiting-Contrastive-SSL
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]
Python Clustering related posts
-
Open source laser microphone picks up laptop keystrokes
-
Llama V2 is free to try on the AI Horde
-
How to run pygmalion: useful links
-
Crooks’ Mistaken Bet on Encrypted Phones
-
ELI5 hardware keylogging
-
KoboldAI Lite now has Stable Horde integration for automatic inline image generation in stories.
-
Middle ground dataset between CIFAR and ImageNet [D]
-
A note from our sponsor - SaaSHub
www.saashub.com | 7 Dec 2024
Index
What are some of the best open-source clustering projects in Python? This list will help you:
Rank | Project | Stars
---|---|---
1 | orange | 4,892 |
2 | dedupe | 4,160 |
3 | awesome-community-detection | 2,342 |
4 | uis-rnn | 1,563 |
5 | minisom | 1,462 |
6 | Unsupervised-Classification | 1,366 |
7 | PyPOTS | 1,135 |
8 | uform | 1,059 |
9 | similarity | 1,013 |
10 | slot-attention | 397 |
11 | Unsupervised-Semantic-Segmentation | 397 |
12 | matrixprofile | 362 |
13 | TEXTOIR | 202 |
14 | fuzzy-c-means | 176 |
15 | relevanceai | 166 |
16 | stringlifier | 164 |
17 | DBCV | 156 |
18 | n2d | 127 |
19 | text-summarizer | 113 |
20 | hazelcast-python-client | 111 |
21 | apple-ocr | 89 |
22 | impfuzzy | 87 |
23 | Revisiting-Contrastive-SSL | 86 |