dagster
papermill
| | dagster | papermill |
|---|---|---|
| Mentions | 46 | 26 |
| Stars | 10,215 | 5,630 |
| Growth | 5.2% | 1.4% |
| Activity | 10.0 | 8.0 |
| Last commit | 3 days ago | 3 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dagster
- Experience with Dagster.io?
- Dagster tutorials
My recommendation is to continue with the tutorial, then look at one of the larger example projects, especially the ones whose names start with “project_”, and you should understand most of it. For anything you don't understand and are curious about, look up the relevant concept page in the docs.
- The Dagster Master Plan
I found this example that helped me - https://github.com/dagster-io/dagster/tree/master/examples/project_fully_featured/project_fully_featured
- What are some open-source ML pipeline managers that are easy to use?
I would recommend the following:
  - https://www.mage.ai/
  - https://dagster.io/
  - https://www.prefect.io/
  - https://metaflow.org/
  - https://zenml.io/home
- The Why and How of Dagster User Code Deployment Automation
In Helm terms, there are two charts: the system chart, dagster/dagster (values.yaml), and the user code chart, dagster/dagster-user-deployments (values.yaml). Note that you have to set dagster-user-deployments.enabled: true in the dagster/dagster values.yaml to enable this.
- Best Orchestration Tool to run dbt projects?
Dagster seemed really cool when I looked into it as an alternative to Airflow. I especially like the software-defined assets and built-in lineage, which I haven't seen in any other tool. However, it seems it does not support RBAC, which is a pretty big issue if you want a self-service type of architecture; see https://github.com/dagster-io/dagster/issues/2219. It does seem to be available in their hosted version, but I wanted to run it myself on k8s.
- dbt Cloud Alternatives?
Dagster? https://dagster.io
- What's the best thing/library you learned this year?
One that I haven't seen on here yet: dagster
- Anyone have an example of a project where a handful of the more popular Python tools are used? (E.g. airbyte, airflow, dbt, and pandas)
- Can we take a moment to appreciate how much of dataengineering is open source?
papermill
- Spreadsheet errors can have disastrous consequences – yet we keep making them
Pandas docs > Comparison with spreadsheets: https://pandas.pydata.org/docs/getting_started/comparison/co...
Pandas docs > I/O > Excel files: https://pandas.pydata.org/docs/user_guide/io.html#excel-file...
nteract/papermill: https://github.com/nteract/papermill :
> papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks. [...]
> This opens up new opportunities for how notebooks can be used. For example:
> - Perhaps you have a financial report that you wish to run with different values on the first or last day of a month or at the beginning or end of the year, using parameters makes this task easier.
"The World Excel Championship is being broadcast on ESPN" (2022) https://news.ycombinator.com/item?id=32420925 :
> Computational notebook speedrun ideas:
- Jupyter Kernel Architecture
There is Papermill ... https://github.com/nteract/papermill
- Git and Jupyter Notebooks Guide
https://github.com/jupyter/enhancement-proposals/pull/103#is...
Papermill is one tool for running Jupyter notebooks as reports, with the date in the filename. https://papermill.readthedocs.io/en/latest/
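The "date in the filename" pattern mentioned above can be sketched roughly as follows. This is a minimal illustration, not code from the quoted post: the helper names `dated_report_path` and `run_report`, the `reports` output directory, and the `run_date` parameter name are all hypothetical. It assumes papermill is installed and that the template notebook has a cell tagged `parameters` that defines `run_date`.

```python
from datetime import date
from pathlib import Path


def dated_report_path(template: str, run_date: date, out_dir: str = "reports") -> Path:
    """Build an output path like reports/<template stem>_<YYYY-MM-DD>.ipynb."""
    stem = Path(template).stem
    return Path(out_dir) / f"{stem}_{run_date.isoformat()}.ipynb"


def run_report(template: str, run_date: date, out_dir: str = "reports") -> Path:
    """Execute the template notebook once for run_date and return the output path."""
    # Imported lazily so the path helper above works without papermill installed.
    import papermill as pm

    output = dated_report_path(template, run_date, out_dir)
    output.parent.mkdir(parents=True, exist_ok=True)
    pm.execute_notebook(
        template,
        str(output),
        # "run_date" is an assumed parameter name; it must match the notebook.
        parameters={"run_date": run_date.isoformat()},
    )
    return output
```

Calling `run_report("sales.ipynb", date.today())` from a scheduler would then leave one executed notebook per run date, so earlier reports are not overwritten.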
- JupyterLab 4.0
You may be interested in papermill to address the parametrized analysis problem [1]. I think (but I'm not positive) this is what the data team at a previous job used to automate running notebooks for all sorts of nightly reports.
[1] https://papermill.readthedocs.io/en/latest/#
- Show HN: Mercury – convert Jupyter Notebooks to Web Apps without code rewriting
I'm using Papermill to operationalize notebooks (https://github.com/nteract/papermill); it also has Airflow support, for example. I'm really happy with Papermill for automatic notebook execution. In my field it's nice that we can go very quickly from analysis to operations while having super transparent "logging" in the executed notebooks.
- What's the best thing/library you learned this year?
papermill, bcpandas, fastapi
- Does the Jupyter API allow using Jupyter from the CL?
But you can execute your notebook using Jupyter-run or papermill.
- Running Jupyter notebooks in parallel
As a first option, we will use Papermill, which has a Python API that allows us to run different notebooks using some functions:
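The code from the quoted article is not reproduced on this page. One way such a parallel run could look is sketched below, as an illustration only. It combines papermill's `execute_notebook` with the standard library's `ProcessPoolExecutor`. The function names `output_path_for`, `run_one`, and `run_all` and the `executed` output directory are hypothetical, and papermill is assumed to be installed.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def output_path_for(notebook: str, out_dir: str = "executed") -> str:
    """Map an input notebook to an output path with the same file name in out_dir."""
    return str(Path(out_dir) / Path(notebook).name)


def run_one(notebook: str, out_dir: str = "executed") -> str:
    """Execute a single notebook with papermill and return the output path."""
    # Imported lazily so each worker process loads papermill only when it runs.
    import papermill as pm

    output = output_path_for(notebook, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    pm.execute_notebook(notebook, output)
    return output


def run_all(notebooks: list[str], max_workers: int = 4) -> list[str]:
    """Run the notebooks in separate processes; results keep the input order."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, notebooks))
```

A script would call something like `run_all(["first.ipynb", "second.ipynb"])` under an `if __name__ == "__main__":` guard, which process pools require on platforms that spawn worker processes. Separate processes are used here so that each notebook gets its own kernel and interpreter state.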
- Tips for using Jupyter Notebooks with GitHub
Papermill can also target cloud storage outputs for hosting rendered notebooks, execute notebooks from custom Python code, and even be used within distributed data pipelines like Dagster (see Dagstermill). For more information, see the papermill documentation.
- Three Tools for Executing Jupyter Notebooks
Papermill Source Code
What are some alternatives?
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
nbconvert - Jupyter Notebook Conversion
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
ploomber - The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
Mage - 🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
airflow-notebook - This repository is no longer maintained.
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
nbdev - Create delightful software with Jupyter Notebooks
MLflow - Open source platform for the machine learning lifecycle
voila - Voilà turns Jupyter notebooks into standalone web applications
meltano
jupytext - Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts