|30 days ago||1 day ago|
|Apache License 2.0||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
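As a rough illustration of how such a score could be computed, here is a minimal sketch. The exact formula is not given here, so everything in it is an assumption: the function names `recency_weighted_activity` and `relative_score`, the exponential decay, the 30-day half-life, and the mapping of a percentile rank onto a 0-10 scale are all hypothetical choices that merely reproduce the two stated properties (recent commits weigh more, and 9.0 corresponds to the top 10%).

```python
from datetime import date


def recency_weighted_activity(commit_dates, today, half_life_days=30.0):
    """Sum commit weights; a commit's weight halves every `half_life_days` (assumed decay)."""
    total = 0.0
    for commit_date in commit_dates:
        age_days = (today - commit_date).days
        total += 0.5 ** (age_days / half_life_days)
    return total


def relative_score(raw, all_raw):
    """Share of tracked projects with a strictly lower raw value, scaled to 0-10 (assumed mapping)."""
    below = sum(1 for other in all_raw if other < raw)
    return 10.0 * below / len(all_raw)
```

Under these assumptions, a project whose raw activity exceeds that of 90% of the tracked projects gets a relative score of 9.0, which matches the reading of the example above.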
How to Run Spark SQL on Encrypted Data
dev.to | 2021-08-10
Introducing Opaque SQL, an open-source platform for securely running Spark SQL queries on encrypted data. Built by top systems and security researchers at UC Berkeley, the platform uses hardware enclaves to securely execute queries on private data in an untrusted environment.
Announcing MC²: Securely perform analytics and machine learning on confidential data
dev.to | 2021-06-17
The MC2 Compute Services: MC2 offers several compute services, including Spark SQL, distributed XGBoost, and secure aggregation for federated learning. All are intended to run in a primarily untrusted environment, such as a cluster of machines hosted on a public cloud, that supports trusted execution environments (hardware enclaves). Data is encrypted in transit using a client key and only ever decrypted inside hardware enclaves, providing the previously mentioned security guarantees for data-in-use. For all compute services, MC2 leverages the Open Enclave SDK, a project intended to provide a consistent API across different enclave architectures.
reddit.com/r/apachespark | 2021-03-12
What are some alternatives?
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache Spark - Apache Spark - A unified analytics engine for large-scale data processing
cuelake - Use SQL to build ELT pipelines on a data lakehouse.
mc2 - A Platform for Secure Analytics and Machine Learning