Top 23 Clustering Open-Source Projects
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
dedupe
A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
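To illustrate what fuzzy record matching means, here is a standard-library sketch that groups near-duplicate strings by `difflib.SequenceMatcher` similarity. It does not use dedupe's API. The normalization step, the 0.85 threshold, and the greedy grouping against each cluster's first member are assumptions chosen for illustration, and the pairwise comparison is quadratic, so it only suits small inputs.

```python
from difflib import SequenceMatcher


def normalize(record: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't count."""
    return " ".join(record.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two normalized records."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def deduplicate(records: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Greedily put each record into the first cluster whose first member is similar enough.

    Assumed, illustrative strategy: O(n^2) comparisons, no blocking, no learned model.
    """
    clusters: list[list[str]] = []
    for record in records:
        for cluster in clusters:
            if similarity(record, cluster[0]) >= threshold:
                cluster.append(record)
                break
        else:
            # No existing cluster matched: start a new one.
            clusters.append([record])
    return clusters
```

For example, "Acme Corp, 12 Main St" and "ACME Corp., 12 Main St." land in one cluster (ratio of about 0.95 after normalization), while "Globex Inc, 99 Oak Ave" gets a cluster of its own.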
-
machine-learning-articles
Articles I wrote about machine learning, archived from MachineCurve.com.
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
-
awesome-single-cell
Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.
-
awesome-community-detection
A curated list of community detection research papers with implementations.
-
usearch
Fast Open-Source Search & Clustering engine for Vectors & Strings in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram
-
MooseFS
MooseFS: Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
-
uis-rnn
This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
-
Unsupervised-Classification
SCAN: Learning to Classify Images without Labels, incl. SimCLR. [ECCV 2020]
Project mention: The Current State of Clojure's Machine Learning Ecosystem | news.ycombinator.com | 2024-04-07
> I don't think it's right to recommend that new users move away from the package because of licensing issues
I was going to chime in to agree, but then I saw how this was done - a completely innocuous-looking commit:
https://github.com/haifengl/smile/commit/6f22097b233a3436519...
And literally no mention in the release notes:
https://github.com/haifengl/smile/releases/tag/v3.0.0
I think if you are going to change the license, especially in a way that makes it less permissive, you need to be super open and clear about both the fact you are doing it and your reasons for that. This is done so silently as to look like it is intentionally trying to mislead and trick people.
So maybe I wouldn't say to move away because of the specific license, but it's legitimate to avoid something when it's so clearly driven by a single entity and that entity acts in a way that isn't trustworthy.
Project mention: Is there a programming language that will blow my mind? | /r/ProgrammingLanguages | 2023-06-01
https://github.com/asynkron/protoactor-go is a great lib that implements an Erlang/Akka-like actor model in Go.
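protoactor-go is written in Go and has its own API; the sketch below does not use it. It is a minimal Python illustration of the actor-model idea the comment refers to: an actor owns private state and a mailbox, and a single thread processes messages one at a time, so callers interact with it only by sending messages. The class and message names (`CounterActor`, `"increment"`, `"get"`, `"stop"`) are hypothetical.

```python
import queue
import threading
from typing import Any


class CounterActor:
    """A minimal actor: private state, a mailbox, and one thread handling messages in order."""

    def __init__(self) -> None:
        self._count = 0  # state touched only by the actor's own thread
        self._mailbox: "queue.Queue[tuple[str, Any]]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message: str, payload: Any = None) -> None:
        """Fire-and-forget send; callers never read or write the state directly."""
        self._mailbox.put((message, payload))

    def _run(self) -> None:
        # The mailbox is FIFO, so messages are handled in the order they arrived.
        while True:
            message, payload = self._mailbox.get()
            if message == "increment":
                self._count += payload
            elif message == "get":
                # The payload is a reply queue: the answer also travels as a message.
                payload.put(self._count)
            elif message == "stop":
                break

    def ask_count(self) -> int:
        """Request/response built on top of asynchronous messages."""
        reply: "queue.Queue[int]" = queue.Queue()
        self.tell("get", reply)
        return reply.get(timeout=5)

    def stop(self) -> None:
        self.tell("stop")
        self._thread.join(timeout=5)
```

Because only the actor's own thread mutates `_count`, many threads can call `tell` concurrently without an explicit lock; the thread-safe queue serializes their messages.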
I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.
Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.
https://orangedatamining.com/
https://orange3.readthedocs.io/projects/orange-visual-progra...
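The comment above does not say how to deal with the null characters; one simple option, assumed here, is to strip them from every document before the corpus is loaded into Orange. This is plain Python, not Orange's API, and the helper names are hypothetical.

```python
from collections.abc import Iterable


def strip_null_chars(text: str) -> str:
    """Remove every NUL character, which some text-processing tools choke on."""
    return text.replace("\x00", "")


def clean_documents(documents: Iterable[str]) -> list[str]:
    """Strip NULs from each document and drop any document left empty or whitespace-only."""
    cleaned = (strip_null_chars(doc) for doc in documents)
    return [doc for doc in cleaned if doc.strip()]
```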
You can retrieve this plugin from its official repo on GitHub: Leaflet.markercluster.
Project mention: Machine learning and deep learning library for the PHP language | news.ycombinator.com | 2023-11-04
libcluster is the go-to package for connecting multiple BEAM instances and setting up healing strategies. It provides out-of-the-box strategies and lets users define their own by implementing a simple behavior that specifies how the cluster forms and heals with whichever supporting service you want to use.
Project mention: How can i improve my web scraper to be less abusive to the website. | /r/node | 2023-06-30
Project mention: USearch SQLite Extensions for Vector and Text Search | news.ycombinator.com | 2024-02-22
RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:
- Chunking can interfere with context boundaries
- Content vectors can differ vastly from question vectors; for this you have to use hypothetical embeddings (they generate artificial questions and store them)
- Instead of saving just one embedding per text chunk, you should store several (text chunk, hypothetical embedding questions, metadata)
- RAG will miserably fail with requests like "summarize the whole document"
- To my knowledge, OpenAI embeddings aren't performing well; use an embedding that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb
[1] https://github.com/underlines/awesome-marketing-datascience/...
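The chunking and "store several things per chunk" points above can be made concrete with a small sketch. It assumes word-based chunking with overlapping windows (so a passage shorter than the overlap that is cut at one boundary still appears whole in the neighboring chunk) and a hypothetical `ChunkRecord` that keeps the chunk text, its hypothetical questions, and metadata together. Embedding-model calls and question generation are deliberately left out; in practice the questions would come from an LLM, and every text returned by `texts_to_embed` would get its own vector pointing back to the record.

```python
from dataclasses import dataclass, field


def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


@dataclass
class ChunkRecord:
    """Everything stored for one chunk: its text, extra texts to embed, and metadata."""
    text: str
    hypothetical_questions: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)

    def texts_to_embed(self) -> list[str]:
        """One embedding per entry; each resulting vector would point back to this record."""
        return [self.text, *self.hypothetical_questions]


def build_records(doc_id: str, text: str, size: int = 200, overlap: int = 50) -> list[ChunkRecord]:
    """Wrap each chunk with its source document and position; questions are filled in later."""
    return [
        ChunkRecord(text=chunk, metadata={"doc_id": doc_id, "chunk": str(i)})
        for i, chunk in enumerate(chunk_words(text, size, overlap))
    ]
```

Overlap only softens the boundary problem, and none of this helps with whole-document requests like "summarize the whole document", which the comment lists as a separate failure mode.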
Clustering related posts
-
Elixir clustering using Postgres
-
NATS by Example - Examples of how to use NATS and JetStream in various languages
-
Phoenix 1.7 for Elixir: Edit a Form in a Modal
-
Elixir for Ruby developers: the three most important differences
-
Llama V2 is free to try on the AI Horde
-
Recommendations to get me back into anime?
-
Want something better than k-means? Try BanditPAM (github.com/motiwari)
-
A note from our sponsor - InfluxDB
www.influxdata.com | 1 May 2024
Index
What are some of the best open-source Clustering projects? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | pycaret | 8,406 |
2 | Smile | 5,924 |
3 | protoactor-go | 4,877 |
4 | orange | 4,611 |
5 | dedupe | 3,979 |
6 | Leaflet.markercluster | 3,854 |
7 | machine-learning-articles | 3,093 |
8 | awesome-single-cell | 2,907 |
9 | hdbscan | 2,672 |
10 | awesome-community-detection | 2,266 |
11 | supercluster | 2,018 |
12 | RubixML | 1,975 |
13 | libcluster | 1,886 |
14 | bottleneck | 1,739 |
15 | MLJ.jl | 1,722 |
16 | protoactor-dotnet | 1,661 |
17 | usearch | 1,647 |
18 | MooseFS | 1,584 |
19 | uis-rnn | 1,530 |
20 | minisom | 1,387 |
21 | mteb | 1,395 |
22 | Unsupervised-Classification | 1,306 |
23 | Cluster | 1,262 |