Top 23 Python Clustering Projects
-
dedupe
A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
-
awesome-community-detection
A curated list of community detection research papers with implementations.
-
uis-rnn
This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
-
Unsupervised-Classification
SCAN: Learning to Classify Images without Labels, incl. SimCLR. [ECCV 2020]
-
similarity
TensorFlow Similarity is a python package focused on making similarity learning quick and easy.
-
PyPOTS
A Python toolbox/library for reality-centric machine/deep learning and data mining on partially-observed time series with PyTorch, including SOTA neural network models for science analysis tasks of imputation, classification, clustering, forecasting & anomaly detection on incomplete (irregularly-sampled) multivariate TS with NaN missing values
-
Unsupervised-Semantic-Segmentation
Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals. [ICCV 2021]
-
matrixprofile
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
-
stringlifier
Stringlifier is an open-source ML library for detecting random strings in raw text. It can be used for sanitising logs, detecting accidentally exposed credentials, and as a pre-processing step in unsupervised ML-based analysis of application text data.
-
n2d
A deep clustering algorithm. Code to reproduce results for our paper N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding.
-
Revisiting-Contrastive-SSL
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]
-
hironex
HiRoNEx: Historical road network extractor: A python tool for automatic, fully unsupervised extraction of historical road networks from historical maps.
I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.
Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.
https://orangedatamining.com/
https://orange3.readthedocs.io/projects/orange-visual-progra...
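One way to guard against the null-character problem mentioned above is to strip NUL characters from each document before loading the corpus. This is a minimal sketch in plain Python; the function name `strip_null_chars` and the sample documents are hypothetical, and it does not use any Orange3 API.

```python
def strip_null_chars(text: str) -> str:
    """Return the text with all NUL characters removed."""
    # "\x00" is the NUL character that reportedly breaks corpus loading.
    return text.replace("\x00", "")


# Hypothetical corpus: the second document contains an embedded NUL.
docs = ["clean text", "bad\x00text"]
cleaned = [strip_null_chars(d) for d in docs]
```

The cleaned strings can then be written back to disk or passed on to whatever loader you use.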
RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:
- Chunking can interfere with context boundaries
- Content vectors can differ vastly from question vectors; for this you have to use hypothetical embeddings (generate artificial questions for each chunk and store them)
- Instead of saving just one embedding per text chunk, you should store several (text chunk, hypothetical question embeddings, metadata)
- RAG will fail miserably with requests like "summarize the whole document"
- To my knowledge, OpenAI embeddings don't perform well; use an embedding that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb
[1] https://github.com/underlines/awesome-marketing-datascience/...
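The idea of storing several vectors per chunk (the chunk itself plus hypothetical questions, alongside metadata) could be sketched roughly as follows. Everything here is an assumption for illustration: `toy_embed` is a bag-of-words stand-in rather than a real embedding model, the hypothetical questions are hand-written instead of generated, and `ChunkRecord` and `retrieve` are made-up names, not part of any library.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector (NOT a real model)."""
    return Counter(text.lower().replace("?", "").replace(".", "").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ChunkRecord:
    text: str
    questions: list[str]  # hypothetical questions (hand-written here)
    metadata: dict = field(default_factory=dict)

    def vectors(self) -> list[Counter]:
        # One vector for the chunk text plus one per hypothetical question.
        return [toy_embed(self.text)] + [toy_embed(q) for q in self.questions]


def retrieve(query: str, records: list[ChunkRecord]) -> ChunkRecord:
    """Return the chunk whose best-matching vector is closest to the query."""
    q = toy_embed(query)
    return max(records, key=lambda r: max(cosine(q, v) for v in r.vectors()))


# Illustrative records only; texts and questions are invented for the example.
records = [
    ChunkRecord(
        text="SAITS imputes missing values in multivariate time series.",
        questions=["How can I fill gaps in sensor data?"],
        metadata={"source": "doc-a"},
    ),
    ChunkRecord(
        text="dedupe performs fuzzy matching and record deduplication.",
        questions=["How do I merge duplicate customer rows?"],
        metadata={"source": "doc-b"},
    ),
]

best = retrieve("how do I fill gaps in my data?", records)
print(best.metadata["source"])
```

In this toy setup the query shares almost no words with the first chunk's text but overlaps heavily with its hypothetical question, which is the mismatch between content vectors and question vectors that the comment describes. With a real embedding model you would swap out `toy_embed` and keep the one-chunk-to-many-vectors layout.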
Project mention: Missing values in time series collected from the real world are common to see and very pesky. A new state-of-the-art and fast neural network called SAITS is proposed to impute missing data in partially-observed multivariate time series. The code is open source on GitHub. | /r/datascience | 2023-06-28
Oh, wow, thanks for sharing it here! PyPOTS still has a long way to go, and I'm making it better. If you have any suggestions for PyPOTS, please let me know. Your feedback is always welcome and means a lot to the community of PyPOTS! If you like PyPOTS, please star 🌟 PyPOTS repo on GitHub and share it with people you know who may need it to help others notice this helpful work. Thank you very much!
Python Clustering related posts
-
Llama V2 is free to try on the AI Horde
-
How to run pygmalion: usefull links
-
Crooks’ Mistaken Bet on Encrypted Phones
-
ELI5 hardware keylogging
-
KoboldAI Lite now has Stable Horde integration for automatic inline image generation in stories.
-
Middle ground dataset between CIFAR and ImageNet [D]
-
KoboldAI 1.17 - The Great Migration
Index
What are some of the best open-source Clustering projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | orange | 4,619 |
2 | dedupe | 3,984 |
3 | awesome-community-detection | 2,270 |
4 | uis-rnn | 1,533 |
5 | mteb | 1,421 |
6 | minisom | 1,388 |
7 | Unsupervised-Classification | 1,309 |
8 | similarity | 997 |
9 | PyPOTS | 683 |
10 | Unsupervised-Semantic-Segmentation | 386 |
11 | matrixprofile | 356 |
12 | slot-attention | 350 |
13 | TEXTOIR | 180 |
14 | fuzzy-c-means | 162 |
15 | stringlifier | 157 |
16 | DBCV | 140 |
17 | n2d | 122 |
18 | text-summarizer | 113 |
19 | hazelcast-python-client | 112 |
20 | relevanceai | 101 |
21 | Revisiting-Contrastive-SSL | 86 |
22 | impfuzzy | 82 |
23 | hironex | 70 |