Clustering

Top 23 Clustering Open-Source Projects

  • pycaret

    An open-source, low-code machine learning library in Python

  • Smile

    Statistical Machine Intelligence & Learning Engine

  • Project mention: The Current State of Clojure's Machine Learning Ecosystem | news.ycombinator.com | 2024-04-07

    > I don't think it's right to recommend that new users move away from the package because of licensing issues

    I was going to chime in to agree but then I saw how this was done - a completely innocuous looking commit:

    https://github.com/haifengl/smile/commit/6f22097b233a3436519...

    And literally no mention in the release notes:

    https://github.com/haifengl/smile/releases/tag/v3.0.0

    I think if you are going to change license especially in a way that makes it less permissive you need to be super open and clear about both the fact you are doing it and your reasons for that. This is done so silently as to look like it is intentionally trying to mislead and trick people.

    So maybe I wouldn't say to move away because of the specific license, but it's legitimate to avoid something when it's so clearly driven by a single entity and that entity acts in a way that isn't trustworthy.

  • postgresml

    The GPU-powered AI application database. Get your app to market faster using the simplicity of SQL and the latest NLP, ML + LLM models.

  • Project mention: PostgresML | /r/programming | 2023-08-30
  • protoactor-go

    Proto Actor - Ultra fast distributed actors for Go, C# and Java/Kotlin

  • Project mention: Is there a programming language that will blow my mind? | /r/ProgrammingLanguages | 2023-06-01

    https://github.com/asynkron/protoactor-go is a great lib that implements an Erlang/Akka-like Actor Model in Go.

  • orange

    🍊 📊 💡 Orange: Interactive data analysis

  • Project mention: Hierarchical Clustering | news.ycombinator.com | 2024-04-20

    I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.

    Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.

    https://orangedatamining.com/

    https://orange3.readthedocs.io/projects/orange-visual-progra...
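
A minimal sketch of one way to guard against the null-character problem described above: strip NUL characters from each document before handing the corpus to any tool. The function name `strip_null_chars` and the sample corpus are hypothetical and are not part of Orange's API.

```python
def strip_null_chars(documents):
    """Return copies of the documents with NUL characters removed."""
    # "\x00" is the escape for the NUL character (code point 0).
    return [doc.replace("\x00", "") for doc in documents]


# Hypothetical corpus: the second and third entries contain NUL characters.
corpus = ["clean text", "bad\x00text", "\x00"]
cleaned = strip_null_chars(corpus)
```

Whether an emptied document should then be dropped or kept depends on the downstream tool, so this sketch leaves it in place.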

  • dedupe

    🆔 A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

  • Project mention: Using deep learning for Fuzzy Matching | /r/datascience | 2023-07-06
  • Leaflet.markercluster

    Marker Clustering plugin for Leaflet

  • Project mention: 🌲Svelte + 🍃Leaflet + 📍 Clusters | dev.to | 2023-09-24

    You can retrieve this plugin from its official repo on GitHub: Leaflet.markercluster.

  • machine-learning-articles

    🧠💬 Articles I wrote about machine learning, archived from MachineCurve.com.

  • awesome-single-cell

    Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.

  • hdbscan

    A high performance implementation of HDBSCAN clustering.

  • awesome-community-detection

    A curated list of community detection research papers with implementations.

  • supercluster

    A very fast geospatial point clustering library for browsers and Node.

  • RubixML

    A high-level machine learning and deep learning library for the PHP language.

  • Project mention: KNN with PHP ML & Rubix ML | dev.to | 2024-05-06

    This post is for anybody who has tried to migrate to Rubix ML from PHP ML and more specifically anybody who is experimenting with K-Nearest Neighbors.

  • libcluster

    Automatic cluster formation/healing for Elixir applications

  • Project mention: Elixir clustering using Postgres | dev.to | 2024-01-25

    libcluster is the go-to package for connecting multiple BEAM instances and setting up healing strategies. libcluster provides out-of-the-box strategies and it allows users to define their own strategies by implementing a simple behavior that defines cluster formation and healing according to the supporting service you want to use.

  • bottleneck

    Job scheduler and rate limiter, supports Clustering

  • Project mention: How can i improve my web scraper to be less abusive to the website. | /r/node | 2023-06-30
  • MLJ.jl

    A Julia machine learning framework

  • protoactor-dotnet

    Proto Actor - Ultra fast distributed actors for Go, C# and Java/Kotlin

  • usearch

    Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

  • Project mention: I'm writing a new vector search SQLite Extension | news.ycombinator.com | 2024-05-02

    Might have a look at this library:

    https://github.com/unum-cloud/usearch

    It does HNSW and there is a SQLite related project, though not quite the same thing.

  • MooseFS

    MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)

  • Project mention: Ask HN: What distributed file system would you use in 2024? | news.ycombinator.com | 2024-05-10
  • uis-rnn

    This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.

  • mteb

    MTEB: Massive Text Embedding Benchmark

  • Project mention: AI for AWS Documentation | news.ycombinator.com | 2023-07-06

    RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:

    - Chunking can interfere with context boundaries

    - Content vectors can differ vastly from question vectors; for this you have to use hypothetical embeddings (generate artificial questions and store them)

    - Instead of saving just one embedding per text chunk, you should store several (text chunk, hypothetical embedding questions, metadata)

    - RAG will fail miserably on requests like "summarize the whole document"

    - To my knowledge, OpenAI embeddings don't perform well; use an embedding that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb

    1 https://github.com/underlines/awesome-marketing-datascience/...
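
The idea in the comment above, storing several vectors per chunk (the chunk text plus hypothetical questions, with metadata alongside), can be sketched roughly as follows. This is an illustration under stated assumptions, not the commenter's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, the questions are hand-written rather than generated, and `ChunkRecord` and `search` are names invented for this sketch.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ChunkRecord:
    text: str                                      # the chunk itself
    questions: list                                # hypothetical questions the chunk answers
    metadata: dict = field(default_factory=dict)   # e.g. source file

    def vectors(self):
        # One vector for the chunk text plus one per hypothetical question.
        return [toy_embed(self.text)] + [toy_embed(q) for q in self.questions]


def search(records, query, k=1):
    """Rank records by the best similarity among all of their stored vectors."""
    query_vec = toy_embed(query)
    scored = [(max(cosine(query_vec, v) for v in r.vectors()), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:k]]


# Hypothetical sample data.
records = [
    ChunkRecord("Buckets store objects and are created in a region.",
                ["how do i create a bucket?"], {"source": "storage.md"}),
    ChunkRecord("Functions run code in response to events.",
                ["how do i deploy a function?"], {"source": "compute.md"}),
]
best = search(records, "how do i create a bucket?")[0]
```

Scoring a record by its best-matching vector lets a user question match a stored question even when it shares few words with the chunk text, which is the gap the comment describes.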

  • minisom

    🔴 MiniSom is a minimalistic implementation of the Self Organizing Maps

  • Unsupervised-Classification

    SCAN: Learning to Classify Images without Labels, incl. SimCLR. [ECCV 2020]

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Clustering related posts

  • Elixir clustering using Postgres

    4 projects | dev.to | 25 Jan 2024
  • NATS by Example - Examples of how to use NATS and JetStream in various languages

    1 project | /r/NATS_io | 2 Nov 2023
  • Phoenix 1.7 for Elixir: Edit a Form in a Modal

    1 project | news.ycombinator.com | 12 Sep 2023
  • Elixir for Ruby developers: the three most important differences

    5 projects | news.ycombinator.com | 23 Jul 2023
  • Llama V2 is free to try on the AI Horde

    2 projects | news.ycombinator.com | 18 Jul 2023
  • Recommendations to get me back into anime?

    1 project | /r/anime | 4 Jul 2023
  • Want something better than k-means? Try BanditPAM (github.com/motiwari)

    1 project | /r/linux | 27 Jun 2023

Index

What are some of the best open-source Clustering projects? This list will help you:

# Project Stars
1 pycaret 8,450
2 Smile 5,930
3 postgresml 5,468
4 protoactor-go 4,883
5 orange 4,619
6 dedupe 3,984
7 Leaflet.markercluster 3,864
8 machine-learning-articles 3,108
9 awesome-single-cell 2,927
10 hdbscan 2,685
11 awesome-community-detection 2,270
12 supercluster 2,022
13 RubixML 1,977
14 libcluster 1,886
15 bottleneck 1,744
16 MLJ.jl 1,725
17 protoactor-dotnet 1,668
18 usearch 1,691
19 MooseFS 1,590
20 uis-rnn 1,533
21 mteb 1,421
22 minisom 1,388
23 Unsupervised-Classification 1,309
