Top 23 Python Clustering Projects
🍊 📊 💡 Orange: Interactive data analysis
Project mention: Why don't more people use Altair for python Visualizations instead of Plotly? | /r/datascience | 2023-05-23
You should also check out Orange Data Mining. It lets you create a lot of charts, filter data from one chart to another, build ML models, make predictions, and a lot more. And you can do it with zero code.
🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
Project mention: String distance based network for fuzzy matching? | /r/datascience | 2023-05-28
I think this problem is known as data deduplication, in particular, entity deduplication. I googled a bit and it seems approaches vary from manual deduplication to some sort of active learning (if I am not mistaken). I am also curious whether pre-trained transformer-based cross-encoders can provide any good results (they are trained on sentences, I think, but may be worth a try). Another problem here is how to measure progress (that is, how to compare different approaches).
A curated list of community detection research papers with implementations.
This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
Project mention: [D] Is there a way to distinguish different human voices from 1 audio file ? | /r/MachineLearning | 2022-10-03
Looks like you can get one out of the box here: https://github.com/google/uis-rnn
🔴 MiniSom is a minimalistic implementation of Self-Organizing Maps
SCAN: Learning to Classify Images without Labels, incl. SimCLR. [ECCV 2020]
Project mention: Middle ground dataset between CIFAR and ImageNet [D] | /r/MachineLearning | 2022-09-17
The subsets we used are from here: https://github.com/wvangansbeke/Unsupervised-Classification/tree/master/data/imagenet_subsets
TensorFlow Similarity is a Python package focused on making similarity learning quick and easy.
Project mention: New free tool that uses fine-tuned BERT model to surface answers from research papers | /r/LanguageTechnology | 2022-10-28
Tensorflow Ranking and Tensorflow similarity (maybe relevant/irrelevant contrastive learning?) look like they could be useful.
Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals. [ICCV 2021]
MTEB: Massive Text Embedding Benchmark
Project mention: Text Embedding Benchmark (MTEB) Leaderboard | news.ycombinator.com | 2023-02-20
Implementation of Slot Attention from GoogleAI
Project mention: Object-Centric Learning with Slot Attention | news.ycombinator.com | 2023-03-27
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
Stringlifier is an open-source ML library for detecting random strings in raw text. It can be used for sanitising logs, detecting accidentally exposed credentials, and as a pre-processing step in unsupervised ML-based analysis of application text data.
A simple python implementation of Fuzzy C-means algorithm.
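For readers unfamiliar with the algorithm this entry refers to, here is a minimal sketch of Fuzzy C-means in plain Python. It is not the API of the listed project; the function name `fuzzy_c_means` and its parameters (`m`, `n_iter`, `tol`, `seed`) are assumptions made for this illustration. Each point receives a membership degree in every cluster rather than a single hard label.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Illustrative Fuzzy C-means; returns (centers, memberships).

    `m` is the fuzzifier (> 1); all names here are hypothetical.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Start from random memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(dim)]
            )

        # Memberships depend on relative distances to all centers.
        new_u = []
        for p in points:
            dists = [
                max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, 1e-12)
                for c in centers
            ]
            new_u.append([
                1.0 / sum((dists[j] / dists[k]) ** (2.0 / (m - 1.0))
                          for k in range(n_clusters))
                for j in range(n_clusters)
            ])

        # Stop once memberships barely change between iterations.
        shift = max(abs(a - b)
                    for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break

    return centers, u
```

Calling `fuzzy_c_means(points, 2)` on a list of coordinate tuples returns the cluster centers and one membership row per point; taking the index of the largest value in a row gives a hard assignment if one is needed.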
TEXTOIR is a flexible toolkit for open intent detection and discovery. (ACL 2021)
Project mention: [D] Extracting next action from conversation | /r/MachineLearning | 2022-06-08
- Intent extraction models such as https://github.com/thuiar/textoir. My problem with this approach is that they are multi-label classifiers and usually focused on single-sentence classification: "Can you get me a table?" would be assigned to the "Reservation" label. I feel that I would lose information such as "Meet at 10PM at this address."
A deep clustering algorithm. Code to reproduce results for the paper N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding.
Python Framework for Extractive Text Summarization
Hazelcast Python Client
Python implementation of Density-Based Clustering Validation
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]
Fuzzy Hash calculated from import API of PE files
Build and deploy AI chains & agents
An implementation of Chinese Whispers in Python.
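As a rough illustration of the algorithm named in this entry, the sketch below implements a basic Chinese Whispers label propagation over a weighted, undirected edge list. It does not reflect the listed project's API; the function name `chinese_whispers`, the `(node, node, weight)` edge format, the tie-breaking rule, and the early stop when no label changes are all assumptions of this example.

```python
import random
from collections import defaultdict


def chinese_whispers(edges, n_iter=20, seed=0):
    """Illustrative Chinese Whispers; returns {node: cluster_label}.

    `edges` is an iterable of (node_a, node_b, weight) tuples.
    """
    rng = random.Random(seed)

    # Build an undirected weighted adjacency map.
    neighbours = defaultdict(dict)
    for a, b, w in edges:
        neighbours[a][b] = w
        neighbours[b][a] = w

    # Every node starts in its own cluster.
    labels = {node: node for node in neighbours}
    nodes = list(neighbours)

    for _ in range(n_iter):
        rng.shuffle(nodes)  # visit nodes in random order each pass
        changed = False
        for node in nodes:
            # Sum edge weights per label among this node's neighbours.
            scores = defaultdict(float)
            for nb, w in neighbours[node].items():
                scores[labels[nb]] += w
            best = max(scores.values())
            candidates = [lab for lab, s in scores.items() if s == best]
            # Assumption: keep the current label on ties so runs settle.
            if labels[node] in candidates:
                continue
            labels[node] = rng.choice(candidates)
            changed = True
        if not changed:
            break

    return labels
```

The number of clusters is not fixed in advance; it emerges from the graph structure, which is why the approach suits tasks where the cluster count is unknown.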
Using machine learning on your Anki collection to enhance scheduling via semantic clustering and semantic similarity
Python Clustering related posts
How to run pygmalion: usefull links
3 projects | /r/PygmalionAI | 20 May 2023
Crooks’ Mistaken Bet on Encrypted Phones
1 project | news.ycombinator.com | 17 Apr 2023
ELI5 hardware keylogging
1 project | /r/explainlikeimfive | 11 Apr 2023
KoboldAI Lite now has Stable Horde integration for automatic inline image generation in stories.
1 project | /r/KoboldAI | 24 Jan 2023
Middle ground dataset between CIFAR and ImageNet [D]
1 project | /r/MachineLearning | 17 Sep 2022
KoboldAI 1.17 - The Great Migration
5 projects | /r/KoboldAI | 6 Feb 2022
Clustering and visualizing the MAL recommendation graph
3 projects | /r/AnimeResearch | 5 Jan 2022
What are some of the best open-source Clustering projects in Python? This list will help you: