Clustering

Top 23 Clustering Open-Source Projects

  • pycaret

    An open-source, low-code machine learning library in Python

  • Smile

    Statistical Machine Intelligence & Learning Engine

  • Project mention: The Current State of Clojure's Machine Learning Ecosystem | news.ycombinator.com | 2024-04-07

    > I don't think it's right to recommend that new users move away from the package because of licensing issues

    I was going to chime in to agree but then I saw how this was done - a completely innocuous looking commit:

    https://github.com/haifengl/smile/commit/6f22097b233a3436519...

    And literally no mention in the release notes:

    https://github.com/haifengl/smile/releases/tag/v3.0.0

    I think if you are going to change license especially in a way that makes it less permissive you need to be super open and clear about both the fact you are doing it and your reasons for that. This is done so silently as to look like it is intentionally trying to mislead and trick people.

    So maybe I wouldn't say to move away because of the specific license, but it's legitimate to avoid something when it's so clearly driven by a single entity and that entity acts in a way that isn't trustworthy.

  • protoactor-go

    Proto Actor - Ultra fast distributed actors for Go, C# and Java/Kotlin

  • Project mention: Is there a programming language that will blow my mind? | /r/ProgrammingLanguages | 2023-06-01

    https://github.com/asynkron/protoactor-go is a great lib that implements an Erlang/Akka-like Actor Model in Go.

  • orange

    🍊 📊 💡 Orange: Interactive data analysis

  • Project mention: Hierarchical Clustering | news.ycombinator.com | 2024-04-20

    I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.

    Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.

    https://orangedatamining.com/

    https://orange3.readthedocs.io/projects/orange-visual-progra...

  • dedupe

    🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.

  • Project mention: Using deep learning for Fuzzy Matching | /r/datascience | 2023-07-06
  • Leaflet.markercluster

    Marker Clustering plugin for Leaflet

  • Project mention: 🌲Svelte + 🍃Leaflet + 📍 Clusters | dev.to | 2023-09-24

    You can retrieve this plugin from its official repo on GitHub: Leaflet.markercluster.

  • machine-learning-articles

    🧠💬 Articles I wrote about machine learning, archived from MachineCurve.com.

  • awesome-single-cell

    Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.

  • hdbscan

    A high performance implementation of HDBSCAN clustering.

  • awesome-community-detection

    A curated list of community detection research papers with implementations.

  • supercluster

    A very fast geospatial point clustering library for browsers and Node.

  • RubixML

    A high-level machine learning and deep learning library for the PHP language.

  • Project mention: Machine learning and deep learning library for the PHP language | news.ycombinator.com | 2023-11-04
  • libcluster

    Automatic cluster formation/healing for Elixir applications

  • Project mention: Elixir clustering using Postgres | dev.to | 2024-01-25

    libcluster is the go-to package for connecting multiple BEAM instances and setting up healing strategies. libcluster provides out-of-the-box strategies and it allows users to define their own strategies by implementing a simple behavior that defines cluster formation and healing according to the supporting service you want to use.

  • bottleneck

    Job scheduler and rate limiter, supports Clustering

  • Project mention: How can i improve my web scraper to be less abusive to the website. | /r/node | 2023-06-30
  • MLJ.jl

    A Julia machine learning framework

  • protoactor-dotnet

    Proto Actor - Ultra fast distributed actors for Go, C# and Java/Kotlin

  • usearch

    Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

  • Project mention: USearch SQLite Extensions for Vector and Text Search | news.ycombinator.com | 2024-02-22
  • MooseFS

    MooseFS - Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)

  • Project mention: Google Cloud Storage FUSE | news.ycombinator.com | 2023-05-02
  • uis-rnn

    This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.

  • minisom

    🔴 MiniSom is a minimalistic implementation of the Self Organizing Maps

  • mteb

    MTEB: Massive Text Embedding Benchmark

  • Project mention: AI for AWS Documentation | news.ycombinator.com | 2023-07-06

    RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:

    - Chunking can interfere with context boundaries

    - Content vectors can differ vastly from question vectors; for this you have to use hypothetical embeddings (they generate artificial questions and store them)

    - Instead of saving just one embedding per text chunk you should store several (text chunk, hypothetical embedding questions, metadata)

    - RAG will fail miserably with requests like "summarize the whole document"

    - To my knowledge, OpenAI embeddings aren't performing well; use an embedding that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb

    [1] https://github.com/underlines/awesome-marketing-datascience/...
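
The advice in the comment above about indexing more than one vector per chunk can be sketched roughly as follows. This is only an illustration under stated assumptions: `ChunkRecord`, `embed`, `cosine`, `search`, and the sample documents are all hypothetical names and data, and the bag-of-words `embed` function is a toy stand-in for a real embedding model such as one chosen from the MTEB leaderboard.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class ChunkRecord:
    """One text chunk plus everything indexed for it (hypothetical layout)."""

    text: str
    questions: list[str]  # artificial questions this chunk should answer
    metadata: dict = field(default_factory=dict)

    def vectors(self) -> list[Counter]:
        # One vector for the chunk itself and one per hypothetical question.
        return [embed(self.text)] + [embed(q) for q in self.questions]


def search(query: str, records: list[ChunkRecord]) -> ChunkRecord:
    """Return the record whose best-matching vector is closest to the query."""
    q = embed(query)
    return max(records, key=lambda r: max(cosine(q, v) for v in r.vectors()))


# Made-up sample data, for illustration only.
records = [
    ChunkRecord(
        text="Buckets are private by default; grant access through IAM policies.",
        questions=["How do I make my storage readable by another account?"],
        metadata={"source": "storage-guide", "section": 3},
    ),
    ChunkRecord(
        text="Instances are billed per second after the first minute.",
        questions=["How much does it cost to run a virtual machine?"],
        metadata={"source": "compute-guide", "section": 7},
    ),
]

best = search("how much will a virtual machine cost me", records)
print(best.metadata["source"])  # prints "compute-guide"
```

In this toy data the query shares no words with the billing chunk itself, so the match comes entirely from the stored hypothetical question. That is the gap between content vectors and question vectors the comment describes.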

  • Unsupervised-Classification

    SCAN: Learning to Classify Images without Labels, incl. SimCLR. [ECCV 2020]

  • Cluster

    Easy Map Annotation Clustering πŸ“

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Clustering related posts

  • Elixir clustering using Postgres

    4 projects | dev.to | 25 Jan 2024
  • NATS by Example - Examples of how to use NATS and JetStream in various languages

    1 project | /r/NATS_io | 2 Nov 2023
  • Phoenix 1.7 for Elixir: Edit a Form in a Modal

    1 project | news.ycombinator.com | 12 Sep 2023
  • Elixir for Ruby developers: the three most important differences

    5 projects | news.ycombinator.com | 23 Jul 2023
  • Llama V2 is free to try on the AI Horde

    2 projects | news.ycombinator.com | 18 Jul 2023
  • Recommendations to get me back into anime?

    1 project | /r/anime | 4 Jul 2023
  • Want something better than k-means? Try BanditPAM (github.com/motiwari)

    1 project | /r/linux | 27 Jun 2023

Index

What are some of the best open-source Clustering projects? This list will help you:

#   Project                       Stars
1   pycaret                       8,406
2   Smile                         5,924
3   protoactor-go                 4,877
4   orange                        4,611
5   dedupe                        3,979
6   Leaflet.markercluster         3,854
7   machine-learning-articles     3,093
8   awesome-single-cell           2,907
9   hdbscan                       2,672
10  awesome-community-detection   2,266
11  supercluster                  2,018
12  RubixML                       1,975
13  libcluster                    1,886
14  bottleneck                    1,739
15  MLJ.jl                        1,722
16  protoactor-dotnet             1,661
17  usearch                       1,647
18  MooseFS                       1,584
19  uis-rnn                       1,530
20  minisom                       1,387
21  mteb                          1,395
22  Unsupervised-Classification   1,306
23  Cluster                       1,262
