dagster

An orchestration platform for the development, production, and observation of data assets. (by dagster-io)

Dagster Alternatives

Similar projects and alternatives to dagster

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a better dagster alternative or higher similarity.


Reviews and mentions

Posts with mentions or reviews of dagster. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-10-11.
  • Airflow 2.0 vs Prefect
    1 project | reddit.com/r/dataengineering | 20 Oct 2021
    It has been such a pleasure to use dagster. The testability is nice. It was designed to be type aware, so you can leverage type checks, and it is also designed to be data aware when it comes to passing data between tasks. One thing I don't like is its handling of cases where a task does not produce output but still needs to be declared as a dependency of another task; for that you use its Nothing abstraction. The syntax for this situation is awkward IMO, and they've recognized that. Its UI, called dagit, is hands down the best, as it provides rich information on each task in your DAG. The developer experience is definitely better with dagster compared to Airflow. I briefly looked at Airflow 2.0 examples, and I still think dagster's API is better (with version 0.13.x). However, on the managed environment side, there is no third-party managed dagster provider; the only option is the cloud offering from Elementl, the creator of dagster, which is currently in beta. So there are no mature managed services for dagster yet. Again, this is due to dagster being a relatively new library, less than 3 years old.
  • MLOps project based template
    4 projects | reddit.com/r/mlops | 11 Oct 2021
    Data Pipeline - Dagster
  • Runflow - define and run workflows using HCL2
    1 project | reddit.com/r/datascience | 24 Jul 2021
    I feel like dagster is a hidden gem that is useful for a broader user base of Python data personas, as it is cross-platform and all of its key features (scheduler and web UI) work on Windows, unlike other major workflow orchestration frameworks. Airflow? Nope. Prefect? Nope. They effectively ignored all the small fry data folks in the corporate Windows world who are still critical to their respective organizations. So even a data analyst with Python coding experience can become immediately productive using dagster. Its web UI is optional: you can execute through the Python API, the CLI, or the web UI, so you are not forced to execute in one way. It is designed to be very general purpose and not domain specific, which I think is a good thing, as it means it can be used in a variety of use cases.
  • New to data orchestration? Start here.
    2 projects | dev.to | 2 Jun 2021
    Second-generation data orchestration tools like Dagster and Prefect are more focused on being data-driven. They’re able to detect the kinds of data within DAGs and improve data awareness by anticipating the actions triggered by each data type.
  • Is Airflow a passé? What replaces it?
    2 projects | reddit.com/r/dataengineering | 10 May 2021
    There's Prefect and Dagster as up and comers in the space.
  • Scheduling tools for ETL and ML flow
    3 projects | reddit.com/r/dataengineering | 7 May 2021
    I would give dagster a look. It has a built-in native scheduler and is cross-platform. It is general purpose, so your team can grow with it and tackle a broader set of use cases if needed. If you struggle to get started after reading their docs/tutorials, you can take a look at my personal repo. I've gotten some feedback that my example has been very useful in getting started. I know they revamped their docs recently, but I haven't looked at their tutorial again or checked whether they now provide an intermediate-level full example, so I need to get back in there to see.
  • API versioning has no “right way” (2017)
    2 projects | news.ycombinator.com | 26 Apr 2021
    Versioning is indeed a hard topic, especially for data science/engineering projects in production.

    When you have a pipeline defined as a complex DAG of operations, you can't just version the entire thing unless you have enough resources to re-compute from scratch with every change, which is wasteful. So you have to keep track of data dependencies and their versions if you would like to ensure reproducibility.

    Versioning code isn't enough when you have runtime parameters that affect output data, and you want to stay flexible by allowing experimenting and re-running computations with different parameters, to be able to iterate quickly. Which poses a lot of challenges.

    And there doesn't seem to be a framework that solves those issues out of the box. I'm closely watching Dagster (https://dagster.io), as they seem to be aware of those challenges (for example, for versioning: https://docs.dagster.io/guides/dagster/memoization), but I haven't tried it yet; it introduces a lot of concepts and has a steep learning curve.

  • Best technologies for a beginner DE.
    1 project | reddit.com/r/dataengineering | 21 Apr 2021
    dagster!
  • Hi, how can I do pipeline automation?
    2 projects | reddit.com/r/learnpython | 18 Apr 2021
    If you are just starting out or new to doing automation, I would look at just Python scripts executed with cron if on Linux/Mac or Windows Task Scheduler if on Windows. But you'll need bash knowledge (Linux/Mac) or DOS/batch knowledge (Windows). Then graduate to using frameworks. Since you didn't specify what types of jobs you want to automate, for general-purpose needs I would look at a class of frameworks called task orchestration frameworks or workflow management libraries. I would highly recommend dagster, as it comes with a native scheduler, so you would be free from having to use cron or Windows Task Scheduler. Other options include Prefect, but if you want its other features like its scheduler and web GUI, you'll have to mess with Docker. That's what's nice about dagster: it all works out of the box without the need for non-Python dependencies.
  • Open source contributions for a Data Engineer?
    17 projects | reddit.com/r/dataengineering | 16 Apr 2021
    It's a near crime that Dagster hasn't been mentioned already.
  • Is anyone trying to switch out of data science, and if so, what jobs are you applying for?
    2 projects | reddit.com/r/datascience | 4 Apr 2021
    I am currently using dagster as a workflow orchestration/ETL tool and it is such a fantastic framework. It allows DAs or DSs to be more full-stack. I think using something like dagster is the future. It is cross-platform and actually works well on Windows, so even a data analyst or data scientist in a small fry corporate Windows environment can become more full-stack with dagster. So I can see it having more widespread use by more people in various industries, not just tech-focused industries where they use an exclusively non-Windows stack. When they are teamed up with experienced software engineers, their projects can scale if needed.
  • How to schedule a program to run every day at a certain time?
    2 projects | reddit.com/r/pythontips | 15 Mar 2021
    The uncool, unsexy way is using Windows Task Scheduler as someone else has mentioned. The cool, over-engineered way is using a task orchestration library like dagster.
  • Looking for non-dev friendly batch job operation service
    2 projects | reddit.com/r/devops | 6 Mar 2021
    You can also try dagster: https://dagster.io/. We use it for ML, but it’s quite flexible.
  • Fastest way to open semi-large files and merge
    2 projects | reddit.com/r/datascience | 24 Feb 2021
    This can be done as a batch job. 350MB is not really that big, and it may even be smaller if you just need a subset of the columns. You would basically loop through, process each file individually, and append, if I understood you correctly. My initial implementation would use a combination of the zip, StringIO, and csv modules to process the zip file in memory, since it should fit comfortably in RAM. The issue would be having a fault-tolerant process to do this continuously and reliably. For that I would use a general-purpose scheduler. If you're stuck on Windows, I highly recommend dagster, as it now comes with an awesome general-purpose scheduler that works on Windows. Otherwise, I would look into Airflow or Prefect, with Prefect being easier to use than Airflow. Ideally you would use cloud resources, but it can be done locally with a VM. But more importantly, where do you intend the final resting place to be? I would recommend a database.
  • For those using Airflow for your ELT/Orchestration, how are you performing your EL?
    2 projects | reddit.com/r/dataengineering | 30 Jan 2021
    (T) : https://github.com/fishtown-analytics/dbt + https://github.com/great-expectations/great_expectations + https://github.com/dagster-io/dagster
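
The in-memory approach described in the "Fastest way to open semi-large files and merge" post above could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names `merge_csvs_from_zip` and `build_sample_zip`, the sample file names, and the column names are all hypothetical, and it uses `io.BytesIO` for the archive bytes alongside the `StringIO` and `csv` modules the post mentions.

```python
import csv
import io
import zipfile


def merge_csvs_from_zip(zip_bytes, columns):
    """Read every .csv member of an in-memory zip and keep only `columns`.

    Hypothetical helper: assumes UTF-8 files that all share a header row
    containing the requested columns.
    """
    merged = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".csv"):
                continue  # skip non-CSV members
            text = archive.read(name).decode("utf-8")
            reader = csv.DictReader(io.StringIO(text))
            for row in reader:
                # Append only the subset of columns that is needed.
                merged.append({col: row[col] for col in columns})
    return merged


def build_sample_zip():
    """Build a small zip archive in memory so the sketch is self-contained."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("a.csv", "id,name,extra\n1,alice,x\n2,bob,y\n")
        archive.writestr("b.csv", "id,name,extra\n3,carol,z\n")
        archive.writestr("notes.txt", "not a csv")
    return buffer.getvalue()


rows = merge_csvs_from_zip(build_sample_zip(), ["id", "name"])
for row in rows:
    print(row)
```

In a real job, the merged rows would more likely be written to a database than held in a list, as the post suggests, and the function would be invoked by whichever scheduler is in use.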

Stats

Basic dagster repo stats
Mentions: 16
Stars: 4,047
Activity: 9.9
Last commit: 2 days ago

dagster-io/dagster is an open source project licensed under Apache License 2.0 which is an OSI approved license.
