hamilton
dagster
hamilton | dagster | |
---|---|---|
24 | 51 | |
2,007 | 12,418 | |
3.1% | 2.1% | |
9.7 | 10.0 | |
4 days ago | 7 days ago | |
Jupyter Notebook | Python | |
BSD 3-clause Clear License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
hamilton
-
Show HN: I built an open-source data pipeline tool in Go
I always thought Hamilton [1] does a good job of giving enough visual hooks that draw you in.
I also noticed this pattern where library authors sometimes do a bit extra in terms of discussing and even promoting their competitors, and it makes me trust them more. A “heres why ours is better and everyone else sucks …” section always comes across as the infomercial character who is having quite a hard time peeling an apple to the point you wonder if this the first time they’ve used hands.
One thing wish for is a tool that’s essentially just Celery that doesn’t require a message broker (and can just use a database), and which is supported on Windows. There’s always a handful of edge cases where we’re pulling data from an old 32-bit system on Windows. And basically every system has some not-quite-ergonomic workaround that’s as much work as if you’d just built it yourself.
It seems like it’s just sending a JSON message over a queue or HTTP API and the worker receives it and runs the task. Maybe it’s way harder than I’m envisioning (but I don’t think so because I’ve already written most of it).
I guess that’s one thing I’m not clear on with Bruin, can I run workers if different physical locations and have them carry out the tasks in the right order? Or is this more of a centralized thing (meaning even if its K8s or Dask or Ray, those are all run in a cluster which happens to be distributed, but they’re all machines sitting in the same subnet, which isn’t the definition of a “distributed task” I’m going for.
[1] https://github.com/DAGWorks-Inc/hamilton
-
Greppability is an underrated code metric
Yep. When I was designing https://github.com/dagworks-inc/hamilton part of the idea was to make it easy to understand what and where. That is, enable one to grep for function definitions and their downstream use easily, and where people can't screw this up. You'd be surprised how easy it is to make a code base where grep doesn't help you all that much (at least in the python data transform world) ...
-
Ask HN: What are you working on (August 2024)?
Graph-based libraries for building ML/AI systems:
- Burr -- build AI applications/agents as state machines https://github.com/dagworks-inc/burr
- Hamilton -- build dataflows as DAGs: https://github.com/dagworks-inc/hamilton
Looking for feedback -- we had some good initial traction on HN, and are looking for OS users/contributors/people who are building complimentary tooling!
- Show HN: Hamilton's UI – observability, lineage, and catalog for data pipelines
-
Building an Email Assistant Application with Burr
Note that this uses simple OpenAI calls — you can replace this with Langchain, LlamaIndex, Hamilton (or something else) if you prefer more abstraction, and delegate to whatever LLM you like to use. And, you should probably use something a little more concrete (E.G. instructor) to guarantee output shape.
-
Using IPython Jupyter Magic commands to improve the notebook experience
In this post, we’ll show how your team can turn any utility function(s) into reusable IPython Jupyter magics for a better notebook experience. As an example, we’ll use Hamilton, my open source library, to motivate the creation of a magic that facilitates better development ergonomics for using it. You needn’t know what Hamilton is to understand this post.
-
FastUI: Build Better UIs Faster
We built an app with it -- https://blog.dagworks.io/p/building-a-lightweight-experiment. You can see the code here https://github.com/DAGWorks-Inc/hamilton/blob/main/hamilton/....
Usually we've been prototyping with streamlit, but found that at times to be clunky. FastUI still has rough edges, but we made it work for our lightweight app.
- Show HN: On Garbage Collection and Memory Optimization in Hamilton
-
Facebook Prophet: library for generating forecasts from any time series data
This library is old news? Is there anything new that they've added that's noteworthy to take it for another spin?
[disclaimer I'm a maintainer of Hamilton] Otherwise FYI Prophet gels well with https://github.com/DAGWorks-Inc/hamilton for setting up your features and dataset for fitting & prediction[/disclaimer].
- Show HN: Declarative Spark Transformations with Hamilton
dagster
-
Data Orchestration Tool Analysis: Airflow, Dagster, Flyte
Data orchestration tools are key for managing data pipelines in modern workflows. When it comes to tools, Apache Airflow, Dagster, and Flyte are popular tools serving this need, but they serve different purposes and follow different philosophies. Choosing the right tool for your requirements is essential for scalability and efficiency. In this blog, I will compare Apache Airflow, Dagster, and Flyte, exploring their evolution, features, and unique strengths, while sharing insights from my hands-on experience with these tools in a weather data pipeline project.
-
Data Engineering with DLT and REST
This article demonstrates how to work with near real-time and historical data using the dlt package. Whether you need to scale data access across the enterprise or provide historical data for post-event analysis, you can use the same framework to provide customer data. In a future article, I'll demonstrate how to use dlt with a workflow orchestrator such as Apache Airflow or Dagster.``
-
Top 10 MLOps Tools for 2025
4. Dagster
-
How I've implemented the Medallion architecture using Apache Spark and Apache Hdoop
Instead of the custom orchestrator I used, a proper orchestration tool should replace it like Apache Airflow, Dagster, ..., etc.
-
AI Strategy Guide: How to Scale AI Across Your Business
Level 1 of MLOps is when you've put each lifecycle stage and their intefaces in an automated pipeline. The pipeline could be a python or bash script, or it could be a directed acyclic graph run by some orchestration framework like Airflow, dagster or one of the cloud-provider offerings. AI- or data-specific platforms like MLflow, ClearML and dvc also feature pipeline capabilities.
- Experience with Dagster.io?
-
Dagster tutorials
My recommendation is to continue on with the tutorial, then look at one of the larger example projects especially the ones named “project_”, and you should understand most of it. Of what you don't understand and you're curious about, look into the relevant concept page for the functions in the docs.
-
The Dagster Master Plan
I found this example that helped me - https://github.com/dagster-io/dagster/tree/master/examples/project_fully_featured/project_fully_featured
-
What are some open-source ML pipeline managers that are easy to use?
I would recommend the following: - https://www.mage.ai/ - https://dagster.io/ - https://www.prefect.io/ - https://metaflow.org/ - https://zenml.io/home
-
The Why and How of Dagster User Code Deployment Automation
In Helm terms: there are 2 charts, namely the system: dagster/dagster (values.yaml), and the user code: dagster/dagster-user-deployments (values.yaml). Note that you have to set dagster-user-deployments.enabled: true in the dagster/dagster values-yaml to enable this.
What are some alternatives?
phidata - Agno is a lightweight framework for building multi-modal Agents [Moved to: https://github.com/agno-agi/agno]
Mage - 🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
tree-of-thought-llm - [NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
awesome-pipeline - A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
modelfusion - The TypeScript library for building AI applications.
MLflow - Open source platform for the machine learning lifecycle
aipl - Array-Inspired Pipeline Language
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
snowpark-python - Snowflake Snowpark Python API
fastapi - FastAPI framework, high performance, easy to learn, fast to code, ready for production