dagster-sklearn vs dagster-example-pipeline

dagster-sklearn

dagster scikit-learn pipeline example. (by pybokeh)

dagster-example-pipeline

Template Dagster repo using poetry and a single Docker container; works well with CICD (by MileTwo)

dagster Poetry Python Data Science data-engineering

Project	dagster-sklearn	dagster-example-pipeline
Mentions	3	1
Stars	40	64
Growth	-	-
Activity	0.0	0.0
Latest Commit	about 1 year ago	about 2 years ago
Language	Python	Python
License	-	Apache License 2.0

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub.
Growth - month-over-month growth in stars.
Activity - a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones. For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.
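
As a rough illustration of how a score like Activity could work, here is a minimal Python sketch. The page does not publish its formula, so everything here is an assumption: the exponential decay, the 90-day half-life, and the function names `activity_weight`, `raw_activity` and `relative_activity` are hypothetical. It only demonstrates the two ideas described above: recent commits count more than older ones, and the final number is relative to the other tracked projects.

```python
from datetime import date


def activity_weight(commit_day: date, today: date, half_life_days: float = 90.0) -> float:
    """Weight of a single commit. ASSUMPTION: the weight halves every
    `half_life_days`; the real decay used by the site is not documented."""
    age_days = max((today - commit_day).days, 0)  # treat future dates as "today"
    return 0.5 ** (age_days / half_life_days)


def raw_activity(commit_days: list[date], today: date, half_life_days: float = 90.0) -> float:
    """Sum of decayed commit weights, so recent commits contribute more."""
    return sum(activity_weight(day, today, half_life_days) for day in commit_days)


def relative_activity(raw_scores: dict[str, float]) -> dict[str, float]:
    """Map raw scores onto a 0-10 scale by rank (ASSUMPTION): a project scores
    10 * (share of tracked projects with a strictly lower raw score)."""
    total = len(raw_scores)
    return {
        name: round(10.0 * sum(1 for other in raw_scores.values() if other < score) / total, 1)
        for name, score in raw_scores.items()
    }
```

Under these assumptions, a project that outranks 9 of 10 tracked projects gets 9.0, and a project with no recent commits, like the two compared here, sits near 0.0.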

dagster-sklearn

Posts with mentions or reviews of dagster-sklearn. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-05-07.

Scheduling tools for ETL and ML flow
3 projects | /r/dataengineering | 7 May 2021

I would give dagster a look. It has a built-in native scheduler and is cross-platform. It is general purpose, so your team can grow with it and tackle a broader set of use cases if needed. If you struggle to get started after reading their docs/tutorials, you can take a look at my personal repo. I've gotten some feedback that my example has been very useful in getting started. I know they revamped their docs recently, but I haven't looked at their tutorial again or checked whether they have provided an intermediate-level full example yet, so I need to get back in there to see.

Dagster Tutorials/Presentations
1 project | /r/dataengineering | 4 Apr 2021

Hey! I've recently started to use dagster and it's been great with its 0.11.x releases. I am still a newbie with it and maybe only use 20% of its features and abstractions. Here's my work-in-progress personal GitHub repo. Not sure if you'll learn much from it.

Is anyone trying to switch out of data science, and if so, what jobs are you applying for?
2 projects | /r/datascience | 4 Apr 2021

I have created a trivial, contrived scikit-learn example using dagster so that people have an idea of how it can be used.

dagster-example-pipeline

Posts with mentions or reviews of dagster-example-pipeline. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-03-25.

Developing in Dagster
2 projects | dev.to | 25 Mar 2022

The associated code repo can be found here

What are some alternatives?

When comparing dagster-sklearn and dagster-example-pipeline you can also consider the following projects:

Dask - Parallel computing with task scheduling

mlrun - MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

dagster - An orchestration platform for the development, production, and observation of data assets.

Apache Superset - Apache Superset is a Data Visualization and Data Exploration Platform [Moved to: https://github.com/apache/superset]

ploomber - The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️

AWS Data Wrangler - pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

yellowbrick - Visual analysis and diagnostic tools to facilitate machine learning model selection.

Prefect - The easiest way to build, run, and monitor data pipelines at scale.

best-of-ml-python - 🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

canarypy - CanaryPy - A light and powerful canary release for Data Pipelines

portable-data-stack-dagster - A portable Datamart and Business Intelligence suite built with Docker, Dagster, dbt, DuckDB, PostgreSQL and Superset

aws-data-wrangler - pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL). [Moved to: https://github.com/aws/aws-sdk-pandas]