Python Clustering

Open-source Python projects categorized as Clustering

Top 23 Python Clustering Projects

  • orange

    🍊 📊 💡 Orange: Interactive data analysis

    Project mention: Why don't more people use Altair for python Visualizations instead of Plotly? | /r/datascience | 2023-05-23

    You should also check out Orange Data Mining. It lets you create a lot of charts, filter data from one chart to another, build ML models, make predictions, and a lot more. And you can do it with zero code.

  • dedupe

    🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.

    Project mention: String distance based network for fuzzy matching? | /r/datascience | 2023-05-28

    I think this problem is known as data deduplication, in particular, entity deduplication. I googled a bit and it seems approaches vary from manual deduplication to some sort of active learning (if I am not mistaken). I am also curious whether pre-trained transformer-based cross-encoders can provide good results (they are trained on sentences, I think, but may be worth a try). Another problem here is how to measure progress, that is, how to compare different approaches.

  • awesome-community-detection

    A curated list of community detection research papers with implementations.

  • uis-rnn

    This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.

    Project mention: [D] Is there a way to distinguish different human voices from 1 audio file ? | /r/MachineLearning | 2022-10-03

    Looks like you can get one out of the box here:

  • minisom

    🔴 MiniSom is a minimalistic implementation of Self-Organizing Maps

  • Unsupervised-Classification

    SCAN: Learning to Classify Images without Labels, incl. SimCLR. [ECCV 2020]

    Project mention: Middle ground dataset between CIFAR and ImageNet [D] | /r/MachineLearning | 2022-09-17

    The subsets we used are from here:

  • similarity

    TensorFlow Similarity is a Python package focused on making similarity learning quick and easy.

    Project mention: New free tool that uses fine-tuned BERT model to surface answers from research papers | /r/LanguageTechnology | 2022-10-28

    TensorFlow Ranking and TensorFlow Similarity (maybe relevant/irrelevant contrastive learning?) look like they could be useful.

  • Unsupervised-Semantic-Segmentation

    Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals. [ICCV 2021]

  • mteb

    MTEB: Massive Text Embedding Benchmark

    Project mention: Text Embedding Benchmark (MTEB) Leaderboard | 2023-02-20

  • slot-attention

    Implementation of Slot Attention from GoogleAI

    Project mention: Object-Centric Learning with Slot Attention | 2023-03-27

  • matrixprofile

    A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.

  • stringlifier

    Stringlifier is an open-source ML library for detecting random strings in raw text. It can be used for sanitising logs, detecting accidentally exposed credentials, and as a pre-processing step in unsupervised ML-based analysis of application text data.

  • fuzzy-c-means

    A simple Python implementation of the Fuzzy C-means algorithm.


  • TEXTOIR

    TEXTOIR is a flexible toolkit for open intent detection and discovery. (ACL 2021)

    Project mention: [D] Extracting next action from conversation | /r/MachineLearning | 2022-06-08

    - Intent extraction models such as My problem with this approach is that they are multi-label classifiers and usually focused on single-sentence classification "Can you get me a table?" would be assigned to the "Reservation" label. I feel that I would lose information such as "Meet at 10PM in this address."

  • n2d

    A deep clustering algorithm. Code to reproduce results for the paper N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding.

  • text-summarizer

    Python Framework for Extractive Text Summarization

  • hazelcast-python-client

    Hazelcast Python Client

  • DBCV

    Python implementation of Density-Based Clustering Validation

  • Revisiting-Contrastive-SSL

    Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]

  • impfuzzy

    Fuzzy Hash calculated from import API of PE files

  • relevanceai

    Build and deploy AI chains & agents

  • chinese-whispers-python

    An implementation of Chinese Whispers in Python.

  • AnnA_Anki_neuronal_Appendix

    Using machine learning on your anki collection to enhance the scheduling via semantic clustering and semantic similarity

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-05-28.


What are some of the best open-source Clustering projects in Python? This list will help you:

 #  Project                             Stars
 1  orange                              4,125
 2  dedupe                              3,737
 3  awesome-community-detection         2,143
 4  uis-rnn                             1,462
 5  minisom                             1,246
 6  Unsupervised-Classification         1,194
 7  similarity                            948
 8  Unsupervised-Semantic-Segmentation    357
 9  mteb                                  330
10  slot-attention                        313
11  matrixprofile                         312
12  stringlifier                          145
13  fuzzy-c-means                         143
14  TEXTOIR                               132
15  n2d                                   116
16  text-summarizer                       112
17  hazelcast-python-client               106
18  DBCV                                  100
19  Revisiting-Contrastive-SSL             82
20  impfuzzy                               79
21  relevanceai                            58
22  chinese-whispers-python                54
23  AnnA_Anki_neuronal_Appendix            48