Python Clustering

Open-source Python projects categorized as Clustering

Top 23 Python Clustering Projects

  • orange

    🍊 Orange: Interactive data analysis

  • Project mention: Hierarchical Clustering | news.ycombinator.com | 2024-04-20

    I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.

    Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.

    https://orangedatamining.com/

    https://orange3.readthedocs.io/projects/orange-visual-progra...
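The comment above does not say how to work around the null-character problem. One possible approach, shown here as a minimal sketch, is to strip `\x00` from every document before loading the corpus. The helper names `strip_nul` and `clean_corpus` are hypothetical and are not part of Orange.

```python
def strip_nul(text: str) -> str:
    # Remove every NUL character from the string; all other characters are kept.
    return text.replace("\x00", "")


def clean_corpus(documents: list[str]) -> list[str]:
    # Apply the NUL-stripping step to each document in the corpus.
    return [strip_nul(doc) for doc in documents]


# Hypothetical sample corpus with embedded NUL characters.
corpus = ["first document", "second\x00 document", "\x00"]
cleaned = clean_corpus(corpus)
```

Whether an emptied document should then be dropped or kept is a separate decision that depends on the downstream tool.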

  • dedupe

    A Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution.

  • Project mention: Using deep learning for Fuzzy Matching | /r/datascience | 2023-07-06
  • awesome-community-detection

    A curated list of community detection research papers with implementations.

  • uis-rnn

    This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.

  • mteb

    MTEB: Massive Text Embedding Benchmark

  • Project mention: AI for AWS Documentation | news.ycombinator.com | 2023-07-06

    RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:

    - Chunking can interfere with context boundaries

    - Content vectors can differ vastly from question vectors; for this you have to use hypothetical embeddings (they generate artificial questions and store them)

    - Instead of saving just one embedding per text chunk you should store several (text chunk, hypothetical embedding questions, metadata)

    - RAG will fail miserably with requests like "summarize the whole document"

    - To my knowledge, OpenAI embeddings don't perform well; use an embedding that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb

    [1] https://github.com/underlines/awesome-marketing-datascience/...
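As a rough illustration of two points from the comment above (chunk boundaries, and storing more than one item per chunk), here is a minimal sketch in plain Python. The `ChunkRecord` class, the `chunk_by_paragraph` function, and the `max_chars` parameter are all hypothetical names chosen for this example. No embedding model is called, and generating the hypothetical questions is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkRecord:
    # One stored unit per chunk: the text, questions the chunk could answer
    # (to be generated elsewhere), and free-form metadata.
    text: str
    hypothetical_questions: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)

    def texts_to_embed(self) -> list[str]:
        # Everything that would get its own embedding: the chunk plus each question.
        return [self.text, *self.hypothetical_questions]


def chunk_by_paragraph(document: str, max_chars: int = 500) -> list[str]:
    # Greedily pack whole paragraphs into chunks so that boundaries fall
    # between paragraphs. A single paragraph longer than max_chars is kept
    # whole rather than split.
    chunks: list[str] = []
    current = ""
    for paragraph in document.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


document = "Intro paragraph.\n\nDetails paragraph.\n\nSummary paragraph."
records = [
    ChunkRecord(text=chunk, metadata={"source": "example.txt"})
    for chunk in chunk_by_paragraph(document, max_chars=40)
]
```

Each record would then contribute several vectors to the index (one per entry in `texts_to_embed()`), all pointing back to the same chunk.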

  • minisom

    MiniSom is a minimalistic implementation of Self-Organizing Maps

  • Unsupervised-Classification

    SCAN: Learning to Classify Images without Labels, incl. SimCLR. [ECCV 2020]

  • similarity

    TensorFlow Similarity is a python package focused on making similarity learning quick and easy.

  • PyPOTS

    A Python toolbox/library for reality-centric machine/deep learning and data mining on partially-observed time series with PyTorch, including SOTA neural network models for science analysis tasks of imputation, classification, clustering, forecasting & anomaly detection on incomplete (irregularly-sampled) multivariate TS with NaN missing values

  • Project mention: Missing values in time series collected from the real world are common to see and very pesky. A new state-of-the-art and fast neural network called SAITS is proposed to impute missing data in partially-observed multivariate time series. The code is open source on GitHub. | /r/datascience | 2023-06-28

    Oh, wow, thanks for sharing it here! PyPOTS still has a long way to go, and I'm making it better. If you have any suggestions for PyPOTS, please let me know. Your feedback is always welcome and means a lot to the community of PyPOTS! If you like PyPOTS, please star 🌟 PyPOTS repo on GitHub and share it with people you know who may need it to help others notice this helpful work. Thank you very much!

  • Unsupervised-Semantic-Segmentation

    Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals. [ICCV 2021]

  • matrixprofile

    A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.

  • slot-attention

    Implementation of Slot Attention from GoogleAI

  • TEXTOIR

    TEXTOIR is the first open-source toolkit for text open intent recognition. (ACL 2021)

  • fuzzy-c-means

    A simple Python implementation of the Fuzzy C-means algorithm.

  • stringlifier

    Stringlifier is an open-source ML library for detecting random strings in raw text. It can be used for sanitising logs, detecting accidentally exposed credentials, and as a pre-processing step in unsupervised ML-based analysis of application text data.

  • DBCV

    Python implementation of Density-Based Clustering Validation

  • n2d

    A deep clustering algorithm. Code to reproduce results for our paper N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding.

  • text-summarizer

    Python Framework for Extractive Text Summarization

  • hazelcast-python-client

    Hazelcast Python Client

  • relevanceai

    Home of the AI workforce - Multi-agent system, AI agents & tools

  • Revisiting-Contrastive-SSL

    Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]

  • impfuzzy

    Fuzzy Hash calculated from import API of PE files

  • hironex

    HiRoNEx: Historical road network extractor: A python tool for automatic, fully unsupervised extraction of historical road networks from historical maps.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Clustering projects in Python? This list will help you:

Rank Project Stars
1 orange 4,619
2 dedupe 3,984
3 awesome-community-detection 2,270
4 uis-rnn 1,533
5 mteb 1,421
6 minisom 1,388
7 Unsupervised-Classification 1,309
8 similarity 997
9 PyPOTS 683
10 Unsupervised-Semantic-Segmentation 386
11 matrixprofile 356
12 slot-attention 350
13 TEXTOIR 180
14 fuzzy-c-means 162
15 stringlifier 157
16 DBCV 140
17 n2d 122
18 text-summarizer 113
19 hazelcast-python-client 112
20 relevanceai 101
21 Revisiting-Contrastive-SSL 86
22 impfuzzy 82
23 hironex 70
