Top 23 Python Clustering Projects

orange

27 4,619 9.6 Python

🍊 :bar_chart: :bulb: Orange: Interactive data analysis

Project mention: Hierarchical Clustering | news.ycombinator.com | 2024-04-20

I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.
Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.
https://orangedatamining.com/
https://orange3.readthedocs.io/projects/orange-visual-progra...
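
One way to act on the warning above is to strip null characters from the text before loading it. The snippet below is a minimal sketch in plain Python, assuming the character in question is the NUL byte (`\x00`) and that the corpus is available as a list of strings. The helper names `strip_null_chars` and `clean_corpus` are hypothetical and are not part of Orange.

```python
def strip_null_chars(text: str) -> str:
    """Return the text with every NUL character (\\x00) removed."""
    return text.replace("\x00", "")


def clean_corpus(documents):
    """Return a cleaned copy of a list of document strings (hypothetical helper)."""
    return [strip_null_chars(doc) for doc in documents]


# Made-up sample documents, one of which contains a NUL character.
docs = ["hello\x00 world", "clean text"]
cleaned = clean_corpus(docs)
```

The cleaned strings can then be written back to disk and imported as usual. Whether this is enough to avoid the crash described in the comment has not been verified here.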

dedupe

9 3,984 7.1 Python

:id: A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

Project mention: Using deep learning for Fuzzy Matching | /r/datascience | 2023-07-06

awesome-community-detection

1 2,270 2.8 Python

A curated list of community detection research papers with implementations.
uis-rnn

3 1,533 3.5 Python

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
mteb

2 1,421 9.8 Python

MTEB: Massive Text Embedding Benchmark

Project mention: AI for AWS Documentation | news.ycombinator.com | 2023-07-06

RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:
- Chunking can interfere with context boundaries
- Content vectors can differ vastly from question vectors; to bridge this you have to use hypothetical embeddings (generate artificial questions for each chunk and store them)
- Instead of saving just one embedding per text chunk, you should store several (text chunk, hypothetical question embeddings, metadata)
- RAG will fail miserably with requests like "summarize the whole document"
- To my knowledge, OpenAI embeddings don't perform well; use an embedding that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb
[1] https://github.com/underlines/awesome-marketing-datascience/...
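
The idea of storing several vectors per chunk (the chunk text, artificial questions, and metadata) can be sketched as follows. This is an illustration under stated assumptions, not the commenter's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, `ChunkRecord` and `search` are made-up names, and the sample chunks and questions are invented.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ChunkRecord:
    chunk_id: str
    text: str
    questions: list  # artificial questions this chunk is expected to answer
    metadata: dict = field(default_factory=dict)

    def vectors(self):
        # One vector for the chunk text plus one per hypothetical question.
        return [toy_embed(self.text)] + [toy_embed(q) for q in self.questions]


def search(records, query, top_k=1):
    """Score each record by its best-matching vector and return the top_k."""
    q = toy_embed(query)
    scored = [(max(cosine(q, v) for v in r.vectors()), r.chunk_id) for r in records]
    scored.sort(reverse=True)
    return scored[:top_k]


# Invented sample data for illustration only.
records = [
    ChunkRecord(
        "export-job",
        "The export job writes one file per region.",
        ["How many files does the export job write?"],
        {"source": "hypothetical-doc"},
    ),
    ChunkRecord(
        "retry-policy",
        "Failed uploads are retried with a fixed delay.",
        ["What happens when an upload fails?"],
    ),
]
best = search(records, "how many files does the export job write")
```

Here the query matches the stored question almost word for word, so the first record ranks highest even though the chunk text itself is phrased differently. With a real embedding model the same structure applies, only `toy_embed` would change.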

minisom

3 1,388 8.4 Python

:red_circle: MiniSom is a minimalistic implementation of the Self Organizing Maps
Unsupervised-Classification

2 1,309 1.4 Python

SCAN: Learning to Classify Images without Labels, incl. SimCLR. [ECCV 2020]
similarity

7 997 6.5 Python

TensorFlow Similarity is a python package focused on making similarity learning quick and easy.
PyPOTS

50 683 9.6 Python

A Python toolbox/library for reality-centric machine/deep learning and data mining on partially-observed time series with PyTorch, including SOTA neural network models for science analysis tasks of imputation, classification, clustering, forecasting & anomaly detection on incomplete (irregularly-sampled) multivariate TS with NaN missing values

Project mention: Missing values in time series collected from the real world are common to see and very pesky. A new state-of-the-art and fast neural network called SAITS is proposed to impute missing data in partially-observed multivariate time series. The code is open source on GitHub. | /r/datascience | 2023-06-28

Oh, wow, thanks for sharing it here! PyPOTS still has a long way to go, and I'm making it better. If you have any suggestions for PyPOTS, please let me know. Your feedback is always welcome and means a lot to the community of PyPOTS! If you like PyPOTS, please star 🌟 PyPOTS repo on GitHub and share it with people you know who may need it to help others notice this helpful work. Thank you very much!

Unsupervised-Semantic-Segmentation

1 386 1.8 Python

Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals. [ICCV 2021]
matrixprofile

7 356 0.0 Python

A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
slot-attention

1 350 10.0 Python

Implementation of Slot Attention from GoogleAI
TEXTOIR

1 180 7.7 Python

TEXTOIR is the first open-source toolkit for text open intent recognition. (ACL 2021)
fuzzy-c-means

1 162 5.7 Python

A simple python implementation of Fuzzy C-means algorithm.
stringlifier

1 157 0.0 Python

Stringlifier is an open-source ML library for detecting random strings in raw text. It can be used for sanitising logs, detecting accidentally exposed credentials, and as a pre-processing step in unsupervised ML-based analysis of application text data.
DBCV

1 140 0.0 Python

Python implementation of Density-Based Clustering Validation
n2d

1 122 0.0 Python

A deep clustering algorithm. Code to reproduce results for our paper N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding.
text-summarizer

1 113 0.0 Python

Python Framework for Extractive Text Summarization
hazelcast-python-client

4 112 6.7 Python

Hazelcast Python Client
relevanceai

1 101 6.4 Python

Home of the AI workforce - Multi-agent system, AI agents & tools
Revisiting-Contrastive-SSL

1 86 0.0 Python

Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]
impfuzzy

1 82 0.0 Python

Fuzzy Hash calculated from import API of PE files
hironex

2 70 0.0 Python

HiRoNEx: Historical road network extractor: A python tool for automatic, fully unsupervised extraction of historical road networks from historical maps.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Clustering related posts

Llama V2 is free to try on the AI Horde

2 projects | news.ycombinator.com | 18 Jul 2023
How to run pygmalion: usefull links

3 projects | /r/PygmalionAI | 20 May 2023
Crooks’ Mistaken Bet on Encrypted Phones

1 project | news.ycombinator.com | 17 Apr 2023
ELI5 hardware keylogging

1 project | /r/explainlikeimfive | 11 Apr 2023
KoboldAI Lite now has Stable Horde integration for automatic inline image generation in stories.

1 project | /r/KoboldAI | 24 Jan 2023
Middle ground dataset between CIFAR and ImageNet [D]

1 project | /r/MachineLearning | 17 Sep 2022
KoboldAI 1.17 - The Great Migration

5 projects | /r/KoboldAI | 6 Feb 2022

Index

What are some of the best open-source Clustering projects in Python? This list will help you:

#	Project	Stars
1	orange	4,619
2	dedupe	3,984
3	awesome-community-detection	2,270
4	uis-rnn	1,533
5	mteb	1,421
6	minisom	1,388
7	Unsupervised-Classification	1,309
8	similarity	997
9	PyPOTS	683
10	Unsupervised-Semantic-Segmentation	386
11	matrixprofile	356
12	slot-attention	350
13	TEXTOIR	180
14	fuzzy-c-means	162
15	stringlifier	157
16	DBCV	140
17	n2d	122
18	text-summarizer	113
19	hazelcast-python-client	112
20	relevanceai	101
21	Revisiting-Contrastive-SSL	86
22	impfuzzy	82
23	hironex	70
