sparkMeasure vs delight
Metric | sparkMeasure | delight
---|---|---
Mentions | 1 | 2
Stars | 642 | 332
Growth | - | 0.0%
Activity | 7.5 | 1.2
Latest commit | 11 days ago | about 1 year ago
Language | Scala | Scala
License | Apache License 2.0 | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
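To make the Growth and Activity definitions concrete, here is a minimal Scala sketch built on our own assumptions. The page does not publish its formulas, so the exponential decay weighting, the 30-day half-life, and the names `monthOverMonthGrowth`, `activityScore` and `activityRating` are hypothetical illustrations, not the site's actual method.

```scala
object ActivitySketch {
  // Month-over-month growth in stars, as a percentage.
  // ASSUMPTION: growth = (stars gained this month / stars at start of month) * 100.
  def monthOverMonthGrowth(starsLastMonth: Int, starsNow: Int): Option[Double] =
    if (starsLastMonth <= 0) None // no baseline, so growth is undefined
    else Some((starsNow - starsLastMonth) * 100.0 / starsLastMonth)

  // HYPOTHETICAL recency weighting: a commit made `ageDays` ago contributes
  // 0.5^(ageDays / halfLifeDays), so recent commits weigh more than old ones.
  def activityScore(commitAgesDays: Seq[Double], halfLifeDays: Double = 30.0): Double =
    commitAgesDays.map(age => math.pow(0.5, age / halfLifeDays)).sum

  // Relative rating on a 0-10 scale: 10 times the fraction of tracked projects
  // whose raw score is strictly lower. A rating of 9.0 therefore means the
  // project outscores 90% of tracked projects, i.e. it is in the top 10%.
  def activityRating(score: Double, allScores: Seq[Double]): Double =
    if (allScores.isEmpty) 0.0
    else 10.0 * allScores.count(_ < score).toDouble / allScores.size
}
```

The half-life is the only tuning knob here: a shorter one makes the score react faster to a burst of recent commits, a longer one rewards sustained history.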
sparkMeasure
-
Spark Write Metrics
As an alternative to the other proposed solutions, you could try leveraging the Spark metrics system to extract this information from accumulators. The metrics include, among others, the total records and bytes written at each stage. Take a look at sparkMeasure, as well as an implementation example if you need to roll your own.
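If you do roll your own, a minimal sketch might look like the listener below. It assumes Spark is on the classpath and uses Spark's public `SparkListener` API, where each `SparkListenerTaskEnd` event carries the task's `TaskMetrics` (built from Spark's internal accumulators). The class names `WriteMetricsListener` and `WriteTotals` are hypothetical, and this is not how sparkMeasure itself is implemented.

```scala
import scala.collection.mutable

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Records and bytes written, summed over all finished tasks of one stage.
final case class WriteTotals(records: Long, bytes: Long)

// Hypothetical listener that aggregates output (write) metrics per stage.
class WriteMetricsListener extends SparkListener {
  private val totals = mutable.HashMap.empty[Int, WriteTotals]

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    // taskMetrics may be null (for example for some failed tasks)
    if (metrics != null) {
      val out = metrics.outputMetrics
      add(taskEnd.stageId, out.recordsWritten, out.bytesWritten)
    }
  }

  // Accumulates one task's numbers into its stage total; public so the
  // aggregation can be exercised without a running cluster.
  def add(stageId: Int, records: Long, bytes: Long): Unit = synchronized {
    val prev = totals.getOrElse(stageId, WriteTotals(0L, 0L))
    totals(stageId) = WriteTotals(prev.records + records, prev.bytes + bytes)
  }

  // Immutable copy of the per-stage totals collected so far.
  def snapshot: Map[Int, WriteTotals] = synchronized(totals.toMap)
}
```

Register it with `spark.sparkContext.addSparkListener(listener)` before running the write, then read `listener.snapshot`. Listener events are delivered asynchronously, so totals may lag slightly behind job completion, and whether a given sink reports these output metrics depends on the data source.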
delight
-
The New & Improved Spark UI & Spark History Server is now Generally Available
We encourage you to try it out! Sign up, follow the installation instructions on our GitHub page, and let us know your feedback by email (by replying to the welcome email) or through the live chat window in the product.
-
Public Release of Delight - A Spark UI complement with CPU & Memory metrics that will Delight you! Works for free on top of ANY Spark platform. Install our open-source agent and try it out!
Follow our Github https://github.com/datamechanics/delight or my LinkedIn https://www.linkedin.com/in/jystephan/ for updates.
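As a rough illustration of where CPU and memory numbers like these can come from, and explicitly not Delight's actual implementation, a Spark listener can read per-task CPU time and peak execution memory from `TaskMetrics`. This sketch assumes Spark is on the classpath; `ResourceListener` and `StageResources` are hypothetical names.

```scala
import scala.collection.mutable

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Per-stage summary: total executor CPU time in nanoseconds, and the largest
// peak execution memory (Spark-managed memory for shuffles, joins and
// aggregations, not total JVM heap) reported by any single task, in bytes.
final case class StageResources(cpuTimeNanos: Long, maxPeakExecMemoryBytes: Long)

// Hypothetical listener collecting CPU and memory metrics per stage.
class ResourceListener extends SparkListener {
  private val byStage = mutable.HashMap.empty[Int, StageResources]

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    // taskMetrics may be null, so skip tasks that did not report metrics
    if (m != null) record(taskEnd.stageId, m.executorCpuTime, m.peakExecutionMemory)
  }

  // Sums CPU time and keeps the maximum per-task peak memory for the stage.
  def record(stageId: Int, cpuNanos: Long, peakMemBytes: Long): Unit = synchronized {
    val prev = byStage.getOrElse(stageId, StageResources(0L, 0L))
    byStage(stageId) = StageResources(
      prev.cpuTimeNanos + cpuNanos,
      math.max(prev.maxPeakExecMemoryBytes, peakMemBytes))
  }

  // Immutable copy of the per-stage summaries collected so far.
  def snapshot: Map[Int, StageResources] = synchronized(byStage.toMap)
}
```

Keeping the maximum rather than the sum for memory is a deliberate choice: per-task peaks do not add up meaningfully across tasks that run at different times.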
What are some alternatives?
dblink - Distributed Bayesian Entity Resolution in Apache Spark
Spark Utils - Basic framework utilities to quickly start writing production ready Apache Spark applications
Apache Spark - A unified analytics engine for large-scale data processing
Spark Tools - Executable Apache Spark Tools: Format Converter & SQL Processor
spark-operator - Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
SynapseML - Simple and Distributed Machine Learning
rfc8312bis - Revision of RFC8312 "CUBIC for Fast Long-Distance Networks"
mmlspark - Simple and Distributed Machine Learning [Moved to: https://github.com/microsoft/SynapseML]
netapp-dataops-toolkit - The NetApp DataOps Toolkit is a Python library that makes it simple for developers, data scientists, DevOps engineers, and data engineers to perform various data management tasks, such as near-instantaneously provisioning, cloning, or snapshotting a data volume or JupyterLab workspace.