dagster-sklearn
dagster-example-pipeline
Metric | dagster-sklearn | dagster-example-pipeline
---|---|---
Mentions | 3 | 1
Stars | 40 | 64
Growth | - | -
Activity | 0.0 | 0.0
Last commit | about 1 year ago | about 2 years ago
Language | Python | Python
License | - | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dagster-sklearn
- Scheduling tools for ETL and ML flow
I would give dagster a look. It has a built-in native scheduler and is cross-platform. It is general purpose, so your team can grow with it and tackle a broader set of use cases if needed. If you struggle to get started after reading their docs/tutorials, you can take a look at my personal repo. I've gotten some feedback that my example has been very useful for getting started. I know they revamped their docs recently, but I haven't gone through their tutorial again or checked whether they now provide an intermediate-level full example, so I need to get back in there to see.
- Dagster Tutorials/Presentations
Hey! I've recently started to use dagster and it's been great with its 0.11.x releases. I am still a newbie with it and maybe only use 20% of its features and abstractions. Here's my work-in-progress personal GitHub repo. Not sure if you'll learn much from it.
- Is anyone trying to switch out of data science, and if so, what jobs are you applying for?
I have created a trivial, contrived scikit-learn example using dagster so that people have an idea of how it can be used.
dagster-example-pipeline
- Developing in Dagster
The associated code repo can be found here
What are some alternatives?
Dask - Parallel computing with task scheduling
mlrun - MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
dagster - An orchestration platform for the development, production, and observation of data assets.
Apache Superset - Apache Superset is a Data Visualization and Data Exploration Platform [Moved to: https://github.com/apache/superset]
ploomber - The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
AWS Data Wrangler - pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
yellowbrick - Visual analysis and diagnostic tools to facilitate machine learning model selection.
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
best-of-ml-python - 🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
canarypy - CanaryPy - A light and powerful canary release for Data Pipelines
portable-data-stack-dagster - A portable Datamart and Business Intelligence suite built with Docker, Dagster, dbt, DuckDB, PostgreSQL and Superset
aws-data-wrangler - pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL). [Moved to: https://github.com/aws/aws-sdk-pandas]