Python Clustering

Open-source Python projects categorized as Clustering

Top 18 Python Clustering Projects

  • GitHub repo dedupe

    A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.

    Project mention: [OC] Media bias? US Sunday news shows book Republicans more than Democrats: Three of the five top Sunday news shows, altogether watched by almost 8 million people weekly, featured Republican partisans more often than Democrats in episodes aired this year through Oct. 31. | reddit.com/r/dataisbeautiful | 2021-11-04

    Tools used: Python to scrape guest lists, dedupeio to better identify guests, Google Sheets to store and analyze the data, and Datawrapper to make the charts.

  • GitHub repo orange

    🍊 Orange: Interactive data analysis

    Project mention: ETL Library for Python | reddit.com/r/Python | 2021-09-27

    "On the simpler side". Do you mean with a graphical interface? Then, orange would be a nice solution. https://orangedatamining.com/

  • GitHub repo uis-rnn

    This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.

    Project mention: Putting my degree to use. (Exclude Specials and Guests) | reddit.com/r/TrashTaste | 2021-06-05

    Discussion:

    - When I started this, I thought I would use something like the VoxSort Diarization and it would be easy. But these apps are terrible, especially at telling Joey apart from Garnt. Connor has a distinct voice, so it was recognizable but still bad. But I didn't think Joey's and Garnt's voices were so similar.
    - Tested the thing and its accuracy is almost 99%.
    - You can still improve this by cutting the episode into smaller chunks, but 1 second is the limit for my computer; any smaller than that and I will run out of RAM. I can work to get around this but hey, I'm lazy.
    - The library to implement this yourself is from Google.

  • GitHub repo minisom

    MiniSom is a minimalistic implementation of the Self Organizing Maps

    Project mention: [P][D] Self Organizing Maps | reddit.com/r/MachineLearning | 2021-07-15

  • GitHub repo Unsupervised-Classification

    SCAN: Learning to Classify Images without Labels, incl. SimCLR. [ECCV 2020]

    Project mention: Any reference or idea about how to train unsupervised CNN model ? | reddit.com/r/deeplearning | 2021-04-13

  • GitHub repo Unsupervised-Semantic-Segmentation

    Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals. [ICCV 2021]

    Project mention: Unsupervised semantic segmentation | reddit.com/r/MLQuestions | 2021-09-09

    Check out these unsupervised masks created in exactly this way in this paper. They are nearly perfect.

  • GitHub repo stringlifier

    Stringlifier is an open-source ML library for detecting random strings in raw text. It can be used for sanitising logs, detecting accidentally exposed credentials, and as a pre-processing step in unsupervised ML-based analysis of application text data.

    Project mention: Stringlifier: ML Library for detecting random strings in raw text | news.ycombinator.com | 2021-07-23

  • GitHub repo n2d

    A deep clustering algorithm. Code to reproduce results for the paper N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding.

    Project mention: Time-Series image clustering. Advice needed! | reddit.com/r/gis | 2021-09-12

    So far I have found some approaches that look promising, for example n2d or k-means with DTW distance, and there are some more (e.g. T-DPSOM), but I want to start from these.

  • GitHub repo hazelcast-python-client

    Hazelcast Python Client

    Project mention: Contribution to Hazelcast | reddit.com/r/Python | 2021-07-05

    More code samples here: https://github.com/hazelcast/hazelcast-python-client/tree/master/examples

  • GitHub repo Revisiting-Contrastive-SSL

    Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]

    Project mention: [R] Contrastive Visual Representation Learning Is More Robust Than You Might Think (Paper + Analysis) | reddit.com/r/MachineLearning | 2021-06-17

  • GitHub repo acoustic-keylogger

    Pipeline of a keylogging attack using just an audio signal and unsupervised learning.

  • GitHub repo impfuzzy

    Fuzzy Hash calculated from import API of PE files

    Project mention: Where do you get old versions of Visual C++? | reddit.com/r/learnpython | 2021-04-25

    I want to use this plugin so I try to install its required module, which fails (whether using 'pip' or 'setup.py') for the same reason as distorm3. The installation of ssdeep as required by pyimpfuzzy also fails, but for a different reason that I haven't started looking into yet.

  • GitHub repo chinese-whispers-python

    An implementation of Chinese Whispers in Python.

    Project mention: Clustering Algorithms with Python | news.ycombinator.com | 2021-08-29

    As this tutorial introduces the spectral clustering method that uses a similarity matrix between objects, I believe graph clustering methods are also worth mentioning. When I was investigating this topic, I released an implementation of a very efficient randomized clustering algorithm for graphs called Chinese Whispers: https://github.com/nlpub/chinese-whispers-python. Since it does not use matrices internally, it allows handling very large NetworkX graphs.

  • GitHub repo AnnA_Anki_neuronal_Appendix

    Using machine learning on an Anki collection for enhanced scheduling, semantic plotting, semantic clustering, and searching by semantic similarity.

    Project mention: Revolution? Using machine learning to handle backlog, introducing AnnA: Anki neural network appendix | reddit.com/r/Anki | 2021-09-24

    ping /u/AnKingMed /u/Glutanimate Here's the link: https://github.com/thiswillbeyourgithub/AnnA_Anki_neuronal_Appendix

  • GitHub repo CAC

    A Clustering Based Classification Algorithm

    Project mention: Show HN: Don't Just Divide; Polarize and Conquer | news.ycombinator.com | 2021-06-23

  • GitHub repo simple_keyword_clusterer

    A simple machine learning package to cluster keywords in higher-level groups.

    Project mention: I published my first open source project: the Simple Keyword Clusterer. Python package to cluster keywords in higher-level groups | reddit.com/r/programming | 2021-08-31

  • GitHub repo pyDenStream

    Implementation of the DenStream algorithm in Python.

    Project mention: [P] Implementation of DenStream | reddit.com/r/MachineLearning | 2021-03-28

    The implementation can be found here: https://github.com/MrParosk/pyDenStream

  • GitHub repo woodKubernetes

    LXD wood cluster

    Project mention: GitHub - Ne00n/woodKubernetes: LXD wood cluster | reddit.com/r/LXD | 2021-08-06

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-11-04.
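
The chinese-whispers-python mention in the list above describes a randomized graph clustering algorithm that works on a graph's edges rather than on a similarity matrix. A minimal pure-Python sketch of that label-propagation idea might look like the following. It is not the chinese-whispers-python API: the function name `chinese_whispers_sketch`, the adjacency-dict input format, and the default parameters are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def chinese_whispers_sketch(adjacency, iterations=20, seed=0):
    """Cluster an undirected weighted graph given as {node: {neighbor: weight}}.

    Hypothetical helper for illustration only; not the chinese-whispers-python API.
    Returns a dict mapping each node to its cluster label.
    """
    rng = random.Random(seed)
    # Every node starts in its own cluster, labelled by the node itself.
    labels = {node: node for node in adjacency}
    nodes = list(adjacency)

    for _ in range(iterations):
        rng.shuffle(nodes)  # visit nodes in a fresh random order on each pass
        for node in nodes:
            # Sum the edge weights per label among this node's neighbors.
            scores = defaultdict(float)
            for neighbor, weight in adjacency[node].items():
                scores[labels[neighbor]] += weight
            if not scores:
                continue  # an isolated node keeps its own label
            best = max(scores.values())
            # Break ties at random among the strongest labels.
            candidates = [label for label, score in scores.items() if score == best]
            labels[node] = rng.choice(candidates)
    return labels


# Assumed toy input: two triangles with no edge between them.
graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0},
    "x": {"y": 1.0, "z": 1.0},
    "y": {"x": 1.0, "z": 1.0},
    "z": {"x": 1.0, "y": 1.0},
}
labels = chinese_whispers_sketch(graph)
```

Because labels only travel along edges, a label can never cross from one triangle to the other. Within each triangle the nodes typically settle on a shared label after a few passes, although the random tie-breaking means a fixed number of passes does not strictly guarantee it.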

Index

What are some of the best open-source Clustering projects in Python? This list will help you:

 #  Project                            Stars
 1  dedupe                             3,250
 2  orange                             3,211
 3  uis-rnn                            1,387
 4  minisom                              990
 5  Unsupervised-Classification          896
 6  Unsupervised-Semantic-Segmentation   231
 7  stringlifier                         132
 8  n2d                                  101
 9  hazelcast-python-client               97
10  Revisiting-Contrastive-SSL            67
11  acoustic-keylogger                    67
12  impfuzzy                              65
13  chinese-whispers-python               51
14  AnnA_Anki_neuronal_Appendix           31
15  CAC                                   26
16  simple_keyword_clusterer               5
17  pyDenStream                            4
18  woodKubernetes                         1