Python Clustering

Open-source Python projects categorized as Clustering

Top 23 Python Clustering Projects

  • orange

    🍊 📊 💡 Orange: Interactive data analysis

    Project mention: Hierarchical Clustering | news.ycombinator.com | 2024-04-20

    I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.

    Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.

    https://orangedatamining.com/

    https://orange3.readthedocs.io/projects/orange-visual-progra...

  • dedupe

    🆔 A Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution.

    Project mention: 'Groups' Underpin Modern Math | news.ycombinator.com | 2024-09-07

    For sure. The problem was writing an np.unique[1] that could handle large datasets. Specifically, the solution involved chunking the dataset, mapping np.unique across the chunks, and then combining the per-chunk results. Merging the counts is an associative operation, so the merges can be arranged in a tree-like computational graph that is only O(log n) levels deep.

    Specifically, this work relates to implementing large-dataset support for the dedupe library[2]. It's valuable to be able to effectively de-duplicate messy datasets. That's about as much as I can share.

    1. https://numpy.org/doc/stable/reference/generated/numpy.uniqu...

    2. https://github.com/dedupeio/dedupe/blob/main/dedupe/clusteri...

  • awesome-community-detection

    A curated list of community detection research papers with implementations.

  • uis-rnn

    This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.

  • minisom

    🔴 MiniSom is a minimalistic implementation of Self-Organizing Maps.

  • Unsupervised-Classification

    SCAN: Learning to Classify Images without Labels, incl. SimCLR. [ECCV 2020]

  • PyPOTS

    A Python toolkit/library for reality-centric machine/deep learning and data mining on partially-observed time series, including SOTA neural network models for scientific analysis tasks (imputation, classification, clustering, forecasting, anomaly detection, and cleaning) on incomplete industrial (irregularly sampled) multivariate time series with NaN missing values.

  • uform

    Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

    Project mention: Recapping the AI, Machine Learning and Data Science Meetup - May 30, 2024 | dev.to | 2024-06-04

    UForm: Pocket-Sized Multimodal AI for Content Understanding and Generation

  • similarity

    TensorFlow Similarity is a Python package focused on making similarity learning quick and easy.

  • slot-attention

    Implementation of Slot Attention from GoogleAI

  • Unsupervised-Semantic-Segmentation

    Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals. [ICCV 2021]

  • matrixprofile

    A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.

  • TEXTOIR

    TEXTOIR is the first open-source toolkit for text open intent recognition (ACL 2021).

  • fuzzy-c-means

    A simple Python implementation of the Fuzzy C-means algorithm.

  • relevanceai

    Home of the AI workforce - Multi-agent system, AI agents & tools

  • stringlifier

    Stringlifier is an open-source ML library for detecting random strings in raw text. It can be used to sanitize logs, to detect accidentally exposed credentials, and as a pre-processing step in unsupervised ML-based analysis of application text data.

  • DBCV

    Python implementation of Density-Based Clustering Validation

  • n2d

    A deep clustering algorithm. Code to reproduce results for our paper N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding.

  • text-summarizer

    Python Framework for Extractive Text Summarization

  • hazelcast-python-client

    Hazelcast Python Client

  • apple-ocr

    Easy-to-Use Apple Vision wrapper for text extraction, scalar representation and clustering using K-means.

    Project mention: Easy-to-Use Apple Vision wrapper for text extraction and clustering | news.ycombinator.com | 2024-01-28

    The most interesting piece of code here IMO is the recognize() method, which demonstrates how to call Vision.VNImageRequestHandler and Vision.VNRecognizeTextRequest from Python code using pyobjc.

    https://github.com/louisbrulenaudet/apple-ocr/blob/70c25b24b...

  • impfuzzy

    Fuzzy hash calculated from the import API of PE files.

  • Revisiting-Contrastive-SSL

    Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
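The commenter under the orange entry warns that a null character (`\x00`) in a text corpus can crash Orange. A minimal sketch of a cleanup step to run before loading the corpus, assuming the documents are already Python strings; `strip_null_chars` and `clean_corpus` are hypothetical helpers, not part of Orange.

```python
def strip_null_chars(text: str) -> str:
    """Drop every NUL character (U+0000) from one document."""
    return text.replace("\x00", "")


def clean_corpus(docs: list[str]) -> list[str]:
    """Hypothetical pre-processing step: sanitize each document before handing the corpus to Orange."""
    return [strip_null_chars(doc) for doc in docs]


print(clean_corpus(["hello\x00world", "no nulls here"]))  # ['helloworld', 'no nulls here']
```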
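The dedupe comment describes an np.unique that scales by chunking: run np.unique on each chunk, then merge the (values, counts) results, which is associative and can therefore be done pairwise in a tree. A minimal NumPy sketch of that idea, assuming 1-D input; `tree_unique`, `merge_counts`, and `chunk_size` are hypothetical names, not dedupe's actual code. With n chunks the pairwise reduction runs in about log2(n) rounds, although it still performs n - 1 merges in total.

```python
import numpy as np


def merge_counts(a, b):
    """Merge two (values, counts) pairs. Associative, so any grouping of merges works."""
    values = np.concatenate([a[0], b[0]])
    counts = np.concatenate([a[1], b[1]])
    uniq, inverse = np.unique(values, return_inverse=True)
    merged = np.zeros(len(uniq), dtype=counts.dtype)
    np.add.at(merged, inverse, counts)  # sum the counts of equal values
    return uniq, merged


def tree_unique(data, chunk_size):
    """Same result as np.unique(data, return_counts=True), computed chunk by chunk."""
    data = np.asarray(data)
    if data.size == 0:
        return np.unique(data, return_counts=True)
    # Map step: unique values and counts for each chunk independently.
    parts = [np.unique(data[i:i + chunk_size], return_counts=True)
             for i in range(0, len(data), chunk_size)]
    # Reduce step: merge neighbors pairwise; each round halves the number of parts.
    while len(parts) > 1:
        merged = [merge_counts(parts[i], parts[i + 1])
                  for i in range(0, len(parts) - 1, 2)]
        if len(parts) % 2:
            merged.append(parts[-1])  # odd part out waits for the next round
        parts = merged
    return parts[0]


values, counts = tree_unique([3, 1, 2, 3, 3, 1, 5, 2, 2, 2], chunk_size=3)
print(values, counts)  # [1 2 3 5] [2 4 3 1]
```

In a real large-dataset setting each chunk would be read from disk or processed by a worker; the in-memory list comprehension here only stands in for that map step.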
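MiniSom wraps the Self-Organizing Map training loop in a small class. To show what that loop does, here is a from-scratch, stdlib-only sketch (not MiniSom's API): each grid node holds a weight vector; every step picks a random sample, finds its best-matching unit (BMU), and pulls the BMU and its grid neighbors toward the sample using a Gaussian neighborhood. The linear decay schedule and the names `train_som` and `best_matching_unit` are assumptions for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position whose weight vector is closest (Euclidean) to sample x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows, cols, iterations=2000, lr0=0.5, sigma0=None, seed=0):
    rng = random.Random(seed)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2
    # One weight vector per grid node, initialized from randomly chosen samples.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    for t in range(iterations):
        decay = 1 - t / iterations          # assumed linear decay from 1 toward 0
        lr = lr0 * decay
        sigma = max(sigma0 * decay, 0.01)   # keep the neighborhood radius positive
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        for node, w in weights.items():
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighborhood on the grid
            for k in range(len(w)):
                w[k] += lr * h * (x[k] - w[k])
    return weights


data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (9.8, 10.0), (10.0, 9.7), (9.9, 10.2)]
som = train_som(data, rows=2, cols=2)
print(best_matching_unit(som, (0.1, 0.1)), best_matching_unit(som, (10.0, 10.0)))
```

The shrinking neighborhood is the key design choice: early on, updates spread across the grid so neighboring nodes stay similar; later, only the BMU moves, so nodes specialize on separate regions of the data.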
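The matrixprofile entry is built around the matrix profile: for every length-m window of a time series, the z-normalized Euclidean distance to its nearest non-trivial match elsewhere in the same series. A brute-force O(n²·m) stdlib sketch of that definition, not the library's API or its fast algorithms; the exclusion zone of half the window length is an assumed (common but not universal) choice, and `naive_matrix_profile` is a hypothetical name.

```python
import math


def znorm(seq):
    """Z-normalize a window; a constant window becomes all zeros."""
    mu = sum(seq) / len(seq)
    sd = math.sqrt(sum((v - mu) ** 2 for v in seq) / len(seq))
    if sd == 0:
        return [0.0] * len(seq)
    return [(v - mu) / sd for v in seq]


def naive_matrix_profile(ts, m):
    """Return (profile, index): nearest-neighbor distance and its position for each window."""
    n = len(ts) - m + 1
    windows = [znorm(ts[i:i + m]) for i in range(n)]
    excl = max(1, math.ceil(m / 2))  # skip overlapping "trivial" matches around i
    profile, index = [], []
    for i in range(n):
        best, best_j = math.inf, -1
        for j in range(n):
            if abs(i - j) < excl:
                continue
            d = math.dist(windows[i], windows[j])
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index


series = [0, 0, 1, 5, 1, 0, 0, 0, 0, 1, 5, 1, 0, 0]
profile, index = naive_matrix_profile(series, m=3)
print(index[2], profile[2])  # 9 0.0
```

Low profile values mark motifs (the repeated spike at positions 2 and 9 here); high values mark discords, which is why the matrix profile is useful for both clustering-style and anomaly-detection tasks.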
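The fuzzy-c-means entry implements Fuzzy C-means, which gives every point a degree of membership in each cluster instead of a single hard label. A compact stdlib sketch of the standard alternating updates, assuming fuzzifier m = 2 and a fixed iteration count rather than a convergence test: centers are membership-weighted means (weights u[j][i] ** m), and memberships are u[j][i] = 1 / sum over k of (d[j][i] / d[j][k]) ** (2 / (m - 1)), where d[j][i] is the distance from point j to center i. This is not the fuzzy-c-means package's API; `fuzzy_c_means` is a hypothetical name.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iterations=100, seed=0):
    """Return (centers, memberships); memberships[j][i] is point j's degree in cluster i."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, normalized so each point's row sums to 1.
    u = []
    for _ in points:
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    exponent = 2 / (m - 1)
    for _ in range(iterations):
        # Center update: mean of all points weighted by membership ** m.
        centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(len(points))]
            w_sum = sum(w)
            centers.append([sum(wj * p[k] for wj, p in zip(w, points)) / w_sum
                            for k in range(dim)])
        # Membership update from each point's distances to every center.
        for j, p in enumerate(points):
            d = [math.dist(p, ctr) for ctr in centers]
            if 0.0 in d:  # point coincides with a center: full membership there
                hit = d.index(0.0)
                u[j] = [1.0 if i == hit else 0.0 for i in range(c)]
            else:
                u[j] = [1.0 / sum((d[i] / d[k]) ** exponent for k in range(c))
                        for i in range(c)]
    return centers, u


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fuzzy_c_means(pts, c=2)
labels = [max(range(2), key=lambda i: row[i]) for row in u]
print(labels)
```

Taking the argmax of each membership row, as in the usage lines, recovers hard labels when needed; which cluster gets index 0 depends on the random initialization.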


What are some of the best open-source Clustering projects in Python? This list will help you:

Rank  Project                             Stars
   1  orange                              4,892
   2  dedupe                              4,160
   3  awesome-community-detection         2,342
   4  uis-rnn                             1,563
   5  minisom                             1,462
   6  Unsupervised-Classification         1,366
   7  PyPOTS                              1,135
   8  uform                               1,059
   9  similarity                          1,013
  10  slot-attention                        397
  11  Unsupervised-Semantic-Segmentation    397
  12  matrixprofile                         362
  13  TEXTOIR                               202
  14  fuzzy-c-means                         176
  15  relevanceai                           166
  16  stringlifier                          164
  17  DBCV                                  156
  18  n2d                                   127
  19  text-summarizer                       113
  20  hazelcast-python-client               111
  21  apple-ocr                              89
  22  impfuzzy                               87
  23  Revisiting-Contrastive-SSL             86
