databricks-nutter-repos-demo vs dask-gateway
| | databricks-nutter-repos-demo | dask-gateway |
|---|---|---|
| Mentions | 2 | 4 |
| Stars | 144 | 127 |
| Growth | - | 0.8% |
| Activity | 4.3 | 8.4 |
| Latest commit | 3 months ago | 7 days ago |
| Language | Python | Python |
| License | MIT License | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
databricks-nutter-repos-demo
- Ask HN: Tips for software engineering sanity with Databricks notebooks?
You can use Databricks Repos (https://docs.databricks.com/repos/index.html), specifically the Files in Repos functionality (https://docs.databricks.com/repos/work-with-notebooks-other-...), which allows you to use plain Python files (not notebooks!) as Python modules.
Another alternative is to split notebooks into “library notebooks” that just define transformations and “orchestration notebooks” that use the library notebooks to execute the business logic.
In both approaches you can test the code, etc.
P.S. I have a demo of both approaches here: https://github.com/alexott/databricks-nutter-repos-demo
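The library/orchestration split described above can be sketched in plain Python. This is a minimal illustration of the idea, not code from the linked demo; the module layout and function names are hypothetical.

```python
# Sketch of the "library notebook" / "orchestration notebook" split,
# using plain Python functions. In Databricks, the library part would
# live in a repo file (e.g. lib/transforms.py) imported as a module.

# --- library code: just defines transformations ---
def clean(records):
    """Drop records with any missing values (a stand-in transformation)."""
    return [r for r in records if all(v is not None for v in r.values())]

def total_amount(records):
    """Aggregation step: sum the 'amount' field."""
    return sum(r["amount"] for r in records)

# --- orchestration code: wires the transformations together ---
def run_pipeline(records):
    return total_amount(clean(records))

# Because the transformations are plain functions, they can be
# unit-tested outside any notebook or cluster:
data = [{"amount": 10}, {"amount": None}, {"amount": 5}]
print(run_pipeline(data))  # 15
```

The key point is that only `run_pipeline` needs a notebook (or job) context; everything in the library part is testable with an ordinary test runner.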
- Why Databricks Is Winning
I’m sorry for the delay, will fix ASAP...
My point is that you can do that even without jars/wheels - you can do version control and testing of notebooks. For example, https://github.com/alexott/databricks-nutter-projects-demo
dask-gateway
- How to change the API version from v1alpha to v1 prior to upgrading the Kubernetes cluster?
How can we change the API versions of Kubernetes objects in GKE prior to a cluster upgrade?
Those two resource types use the traefik.containo.us/v1alpha1 API version, which is itself defined in https://github.com/dask/dask-gateway/blob/main/resources/helm/dask-gateway/crds/traefik.yaml and doesn't use the deprecated CRD API.
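At the manifest level, the migration the question asks about is an `apiVersion` edit: objects created under the old Traefik API group have to be re-applied under the group served by the newer CRDs before the old version stops being served. A hedged sketch follows; the `traefik.io/v1alpha1` target is an assumption based on newer Traefik releases (which renamed the API group), not something the dask-gateway chart guarantees, and the object name is hypothetical.

```yaml
# Before: object stored under the legacy Traefik API group
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: dask-gateway-route   # hypothetical name
spec: {}                     # spec elided
---
# After (assumption): newer Traefik releases serve the same kinds
# under the renamed traefik.io group
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: dask-gateway-route
spec: {}
```

Check which versions your cluster actually serves for the CRD before editing manifests, and re-apply the objects before upgrading.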
- Why Databricks Is Winning
I’ve had a lot of success with Dask lately. It’s comparable to Spark in some ways [0]. Being written in Python and built on top of pandas/NumPy, it allows much more flexibility. It also has great tools built on top of Kubernetes that make deployment quick and easy [1].
[0] https://docs.dask.org/en/latest/spark.html
[1] https://github.com/dask/dask-gateway
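The core pattern Dask applies to pandas/NumPy workloads - split the data into partitions, run the same operation on each, then aggregate - can be sketched with the standard library alone. This is an illustration of that pattern, not Dask's API; a real Dask version would use `dask.dataframe` or `dask.array` and `.compute()`.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative partition-then-aggregate pattern (what Dask automates
# and scales out across a cluster): split the data, reduce each chunk
# in parallel, then combine the partial results.
data = list(range(1_000))
partitions = [data[i:i + 250] for i in range(0, len(data), 250)]

with ThreadPoolExecutor() as pool:
    partial_sums = list(pool.map(sum, partitions))

print(sum(partial_sums))  # 499500
```

Dask's value is that the same high-level code works whether the partitions live in local threads, local processes, or workers launched on Kubernetes via dask-gateway.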
What are some alternatives?
flintrock - A command-line tool for launching Apache Spark clusters.
spark-snowflake - Snowflake Data Source for Apache Spark.
chispa - PySpark test helper methods with beautiful error messages
kube-no-trouble - Easily check your clusters for use of deprecated APIs
transformers - 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.