Top 23 apache-spark Open-Source Projects
-
spark-operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
-
LearningSparkV2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
-
sparkMeasure
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
-
delight
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
-
scalable-data-science
Scalable Data Science: course sets in big data using Apache Spark on Databricks, and their mathematical, statistical, and computational foundations using SageMath.
Project mention: Observations on MLOps–A Fragmented Mosaic of Mismatched Expectations | dev.to | 2024-04-26
How can this be? The current state of practice in AI/ML work requires adaptivity, which is uncommon in classical computational fields. There are myriad tools that capture the work across the many instances of the AI/ML lifecycle. The idea that any one tool could sufficiently capture the dynamic work is unrealistic. Take, for example, an experiment tracking tool like W&B or MLFlow; some form of experiment tracking is necessary in typical model training lifecycles. Such a tool requires some notion of a dataset. However, a tool focusing on experiment tracking is orthogonal to the needs of analyzing model performance at the data sample level, which is critical to understanding the failure modes of models. The way one does this depends on the type of data and the AI/ML task at hand. In other words, MLOps is inherently an intricate mosaic, as the capabilities and best practices of AI/ML work evolve.
    # Download the LakeFS binary
    wget https://github.com/treeverse/lakeFS/releases/latest/download/lakefs
    # Make the binary executable
    chmod +x lakefs
    # Initialize LakeFS with S3 as the storage backend
    ./lakefs init --backend s3 --s3-gateway-endpoint --s3-region --s3-force-path-style --s3-access-key --s3-secret-key
Project mention: Dependency issue with Pyspark running on Kubernetes using spark-on-k8s-operator | /r/codehunter | 2023-05-31
I have spent days now trying to figure out a dependency issue I'm experiencing with (Py)Spark running on Kubernetes. I'm using the spark-on-k8s-operator and Spark's Google Cloud connector.
Project mention: Show HN: DataFlint, performance monitoring for Apache Spark | news.ycombinator.com | 2023-12-28
apache-spark related posts
-
Observations on MLOps–A Fragmented Mosaic of Mismatched Expectations
-
Explain me how websites like Dall-E, chatgpt, thispersondoesntexit process the user data so quickly
-
[D] What licensed software do you use for machine learning experimentation tracking?
-
Dependency issue with Pyspark running on Kubernetes using spark-on-k8s-operator
-
[Q] Is there a tool to keep track of my ML experiments?
-
Experience setting up Spark and Hudi on Kubernetes
-
Remote file access vulnerability in `mlflow server` and `mlflow ui` CLIs
Index
What are some of the best open-source apache-spark projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | MLflow | 17,335 |
2 | SynapseML | 4,970 |
3 | lakeFS | 4,081 |
4 | Spark Notebook | 3,147 |
5 | spark-operator | 2,613 |
6 | docker-spark | 2,011 |
7 | spark | 1,999 |
8 | feathr | 1,931 |
9 | awesome-spark | 1,617 |
10 | LearningSparkV2 | 1,095 |
11 | Mobius: C# API for Spark | 937 |
12 | sparkMeasure | 642 |
13 | flintrock | 630 |
14 | quinn | 580 |
15 | awesome-kafka | 565 |
16 | sparkle | 444 |
17 | PySpark-Boilerplate | 391 |
18 | sparktorch | 335 |
19 | delight | 332 |
20 | cuelake | 284 |
21 | scalable-data-science | 165 |
22 | spark | 126 |
23 | dataproc-templates | 111 |