spark-snowflake
dask-gateway
 | spark-snowflake | dask-gateway
---|---|---
Mentions | 1 | 4
Stars | 196 | 127
Growth | -0.5% | 0.8%
Activity | 5.6 | 8.4
Latest commit | 2 months ago | 6 days ago
Language | Scala | Python
License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
spark-snowflake
- Why Databricks Is Winning
Snowflake and Databricks are different, sometimes complementary technologies. You can store data in Snowflake and query it with Databricks, for example: https://github.com/snowflakedb/spark-snowflake
Snowflake predicate pushdown filtering seems quite promising: https://www.snowflake.com/blog/snowflake-spark-part-2-pushin...
Think both these companies can win.
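As a rough illustration of the comment above, here is a minimal PySpark sketch of reading a Snowflake table through the spark-snowflake connector and filtering it. The `sf*` option names and the `net.snowflake.spark.snowflake` source name come from the connector; the table name `ORDERS`, the column `ORDER_DATE`, and the helper function names are hypothetical, and whether a given filter is actually pushed down depends on the connector version and configuration.

```python
# Name of the Spark data source provided by the spark-snowflake connector.
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"


def snowflake_options(account_url, user, password, database, schema, warehouse):
    """Build the option dict the connector expects (all values are placeholders)."""
    return {
        "sfURL": account_url,
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }


def read_recent_orders(spark, options):
    """Read a hypothetical ORDERS table and filter it.

    With pushdown enabled, the connector may translate the filter into the SQL
    it sends to Snowflake instead of filtering inside Spark.
    """
    df = (
        spark.read.format(SNOWFLAKE_SOURCE_NAME)
        .options(**options)
        .option("dbtable", "ORDERS")  # hypothetical table name
        .load()
    )
    return df.filter(df["ORDER_DATE"] >= "2023-01-01")  # hypothetical column
```

`read_recent_orders` expects an existing `SparkSession` with the connector and the Snowflake JDBC driver on its classpath; calling `.explain()` on the result is one way to check what was pushed down.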
dask-gateway
- How to change the API version from v1alpha to v1 prior to upgrading the kubernetes cluster?
- How can we change the API versions of kubernetes objects in GKE prior to cluster upgrade?
Those two resource types are using the traefik.containo.us/v1alpha1 API version, which itself is defined at https://github.com/dask/dask-gateway/blob/main/resources/helm/dask-gateway/crds/traefik.yaml, and doesn't use the deprecated CRD API.
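To make the distinction in that answer concrete, here is a small sketch that counts the top-level `apiVersion` values in a manifest, so a custom resource's version (such as `traefik.containo.us/v1alpha1`) can be told apart from the version of the CRD definition itself (the deprecated one being `apiextensions.k8s.io/v1beta1`). The sample manifest and the function name are hypothetical and are not copied from the dask-gateway chart; the regex only looks at unindented `apiVersion:` lines and is not a full YAML parser.

```python
import re
from collections import Counter

# Matches unindented "apiVersion: <value>" lines in a YAML manifest.
API_VERSION_RE = re.compile(r"^apiVersion:\s*(\S+)\s*$", re.MULTILINE)


def count_api_versions(manifest_text):
    """Count how often each top-level apiVersion appears in a multi-document manifest."""
    return Counter(API_VERSION_RE.findall(manifest_text))


# Hypothetical manifest: a CRD definition plus one custom resource that uses it.
SAMPLE = """\
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: ingressroutes.traefik.containo.us
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: example
"""

if __name__ == "__main__":
    for version, count in count_api_versions(SAMPLE).items():
        print(version, count)
```

In this sample the `v1alpha1` in the resource's own API group is unrelated to the deprecated `apiextensions.k8s.io/v1beta1` CRD API, which does not appear at all.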
- Why Databricks Is Winning
I’ve had a lot of success with Dask lately. It’s comparable to Spark in some ways [0]. Being written in Python and built on top of pandas/NumPy, it allows much more flexibility. It also has great tools built on top of Kubernetes, making deployment quick and easy [1].
[0] https://docs.dask.org/en/latest/spark.html
[1] https://github.com/dask/dask-gateway
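For readers who have not used it, a minimal sketch of what connecting to dask-gateway from Python can look like is shown below. It assumes a gateway server is already deployed and reachable; the function name, the address argument, and the worker count are placeholders for illustration, and authentication setup is left out.

```python
def start_gateway_client(address, n_workers=2):
    """Connect to a running dask-gateway server and return (cluster, client)."""
    # Imported lazily so this module loads even where dask-gateway is not installed.
    from dask_gateway import Gateway

    gateway = Gateway(address)       # address of the gateway server (placeholder)
    cluster = gateway.new_cluster()  # ask the gateway to launch a new cluster
    cluster.scale(n_workers)         # request a fixed number of workers
    client = cluster.get_client()    # dask.distributed client bound to the cluster
    return cluster, client
```

Once a client exists, ordinary Dask collections (for example `dask.dataframe`) run on the gateway-managed cluster; calling `cluster.shutdown()` releases it when the work is done.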
What are some alternatives?
databricks-nutter-repos-demo - Demo of using the Nutter for testing of Databricks notebooks in the CI/CD pipeline
flintrock - A command-line tool for launching Apache Spark clusters.
kube-no-trouble - Easily check your clusters for use of deprecated APIs
Apache Spark - A unified analytics engine for large-scale data processing
transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
chispa - PySpark test helper methods with beautiful error messages