generalized-kmeans-clustering
LearningSparkV2
generalized-kmeans-clustering | LearningSparkV2 | |
---|---|---|
1 | 1 | |
297 | 1,095 | Stars |
- | 3.3% | Growth |
8.2 | 0.0 | Activity |
4 months ago | over 1 year ago | Last commit |
HTML | Scala | Language |
Apache License 2.0 | Apache License 2.0 | License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
generalized-kmeans-clustering
LearningSparkV2
-
datadelivery: Providing public datasets to explore in AWS
Learning Spark
What are some alternatives?
hammock-public - Visualize text embeddings
incubator-gluten - Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
coreset - Implementation of lightweight coresets for data summarization
kyuubi - Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Spark-The-Definitive-Guide - Spark: The Definitive Guide's Code Repository
delta-sharing - An open protocol for secure data sharing
datadelivery - A Terraform module that provides an efficient way to activate pieces and services in an AWS account in order to enable users to explore preselected public datasets.
s3-sqs-connector - A library for reading data from Amazon S3 with optimised listing using Amazon SQS using Spark SQL Streaming (or Structured Streaming).
Apache-Hive-Essentials-Second-Edition - Apache Hive Essentials, Second Edition published by Packt
delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Read the Docs - The source code that powers readthedocs.org