dagster
grouparoo
DISCONTINUED
Metric | dagster | grouparoo
---|---|---
Mentions | 46 | 27
Stars | 9,939 | 607
Growth | 4.7% | -
Activity | 10.0 | 9.9
Last commit | 6 days ago | almost 2 years ago
Language | Python | JavaScript
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dagster
-
The Dagster Master Plan
I found this example that helped me - https://github.com/dagster-io/dagster/tree/master/examples/project_fully_featured/project_fully_featured
In the meantime, we're collecting solutions and use cases in our GitHub Discussions, and you're welcome to ask any specific questions in there!
-
What are some open-source ML pipeline managers that are easy to use?
I would recommend the following: https://www.mage.ai/, https://dagster.io/, https://www.prefect.io/, https://metaflow.org/, https://zenml.io/home
-
Best Orchestration Tool to run dbt projects?
Dagster seemed really cool when I looked into it as an alternative to Airflow. I especially like the software-defined assets and built-in lineage, which I haven't seen in any other tool. However, it does not seem to support RBAC, which is a pretty big issue if you want a self-service type of architecture; see https://github.com/dagster-io/dagster/issues/2219. RBAC does seem to be available in their hosted version, but I wanted to run it myself on k8s.
-
dbt Cloud Alternatives?
Dagster? https://dagster.io
-
What's the best thing/library you learned this year ?
One that I haven't seen on here yet: dagster
-
Can we take a moment to appreciate how much of dataengineering is open source?
-
Dagger Python SDK: Develop Your CI/CD Pipelines as Code
I wondered how it related to https://dagster.io/
-
Data Engineer Github Profile?
You can find all current, closed, and resolved issues in the “Issues” section and explore them using filters, e.g., issues for dagster. Look into some of the issues and feel free to ask a question or post your idea: it’s much less toxic here (compared to SO, for example).
-
[D] Should I go with Prefect, Argo or Flyte for Model Training and ML workflow orchestration?
You could also consider Dagster, which aims to improve Apache Airflow's shortcomings. Also, take a look at MyMLOps, where you can get a quick overview of open-source orchestration tools.
grouparoo
-
Reference Data Stack for Data-Driven Startups
There are other tools that we will have to adopt in the future but haven’t yet due to lack of necessity. Specifically, one category that is popular in modern data stacks is Reverse ETL (Hightouch, Census, or Grouparoo). We currently don’t have a use case for piping data back into 3rd party tools, but it will definitely come up in the future.
-
Data pipeline suggestions
Reverse ETL: Grouparoo, Castled
-
Where can I find free data engineering ( big data) projects online?
Ingestion / ETL: Airbyte, Singer, Jitsu
Transformation: dbt
Orchestration: Airflow, Dagster
Testing: Great Expectations
Observability: Monosi
Reverse ETL: Grouparoo, Castled
Visualization: Lightdash, Superset
-
Ask HN: Who is hiring? (December 2021)
Grouparoo | Remote (US) | Remote-OK | https://www.grouparoo.com
Grouparoo is a venture-backed software company building open source data tools that make data reliable, accessible, and actionable. We’re empowering teams to make great customer experiences, driven by data. While engineering teams have gotten good at storing and generating data about their customers, it’s rare that this data is used to its full potential in external applications. Grouparoo makes these integrations easy by providing a framework for defining your customer data and reliably syncing it to external tools.
To learn more about who we are, our engineering culture, and whether this is the right place for you, read our Key Values profile: https://www.keyvalues.com/grouparoo
Here are our open roles:
- Senior Backend / Lead Engineer: https://jobs.lever.co/grouparoo/6ba485d1-a5a4-41f0-9fa5-920a...
- Developer Advocate: https://jobs.lever.co/grouparoo/5e1531b4-7ec8-4c10-8e52-fc23...
Tech Stack: TypeScript / JavaScript / Node.js, ActionHero, React + Next.js, Postgres & Redis, and a whole lot of third-party APIs!
-
Launch HN: Hightouch (YC S19) – Sync data from data warehouses to SaaS tools
Congrats on the launch! Hightouch looks great and this need is real. Things seem to be going well, so I don't think I'm taking too much away by mentioning that we have been working on Grouparoo, an open source alternative that solves similar pain points.
A few differences: git developer workflow focused (branches, CI, PRs, etc), ability to self host, segmentation in destinations (tagging people in mailchimp based on rules, for example)
-
Ask HN: Who is hiring? (August 2021)
Grouparoo | Remote (US) | Remote-OK | https://www.grouparoo.com
Grouparoo is a venture-backed software company building the open-source reverse-ETL framework that makes it easy to have meaningful, data-driven conversations with customers. Do you want to keep product data in-sync with tools like Hubspot, Marketo or Zendesk? Do you want to be able to build, test, and deploy data sync code just like the rest of your tech stack? That’s the kind of thing Grouparoo does.
We started Grouparoo because we are done saying “no” to marketing teams asking for data and want to make it easy (and safe!) for everyone to use the data available at work. We are looking for a seasoned back-end engineer to join our US-based, fully remote team. The main components of our stack are TypeScript/JavaScript, Actionhero, Next.js, and React. Learn more about the position @ https://www.grouparoo.com/jobs and https://www.keyvalues.com/grouparoo. Check out our open-source framework (and see what you will be working on) @ https://github.com/grouparoo/grouparoo
-
Ask HN: Who is hiring? (July 2021)
Grouparoo | Remote (US) | Remote-OK | https://www.grouparoo.com
Grouparoo is a venture-backed software company building the open-source reverse-ETL framework that makes it easy to have meaningful, data-driven conversations with customers. Do you want to keep product data in-sync with tools like Hubspot, Marketo or Zendesk? Do you want to be able to build, test, and deploy data sync code just like the rest of your stack? That’s the kind of thing Grouparoo does.
We started Grouparoo because we are done saying “no” to marketing teams asking for data and want to make it easy (and safe!) for everyone to use the data available at work. We are looking for 2 seasoned engineers to join our US-based, fully remote team. The main components of our stack are TypeScript/JavaScript, Actionhero, Next.js, and React. Learn more about the positions @ https://www.grouparoo.com/jobs and https://www.keyvalues.com/grouparoo. Check out our open-source framework (and see what you will be working on) @ https://github.com/grouparoo/grouparoo
Here are our open roles:
* Senior Backend / Founding Engineer: https://jobs.lever.co/grouparoo/6ba485d1-a5a4-41f0-9fa5-920a...
* Senior Full Stack / Lead Engineer: https://jobs.lever.co/grouparoo/946e3407-6101-45f1-84a8-135d...
* Founding Community Manager / Developer Advocate: https://jobs.lever.co/grouparoo/19ef1a6b-6ad9-49f6-8512-90e3...
Tech Stack: TypeScript / JavaScript / Node.js, ActionHero, React + Next.js, Postgres & Redis, and a whole lot of third-party APIs!
-
Bundling and Distributing Next.js Sites via NPM
The final thing we learned is that while the contents of the .next directory are needed to serve your visitors, not everything in it is needed. We saw that we were shipping 300 MB packages to NPM for our Next.js UIs. We dug into the .next folder and learned that if you opt in to Webpack v5 for your Next.js site, large .next/cache/*.pack files are created to speed up how Webpack works. This is normal behavior, but we were inadvertently publishing these large files to NPM! We added .next/cache/* to our .npmignore and our package sizes went down to a more reasonable 20 MB.
-
Using Typescript to create a Robust API between your frontend and backend
The Grouparoo application is stored in a monorepo, which means that the frontend and backend code always exist side-by-side. This means that we can reference the API code from our frontend code and make a helper to check our response types. We don't need our API code at run-time, but we can import the types from it as we develop and compile the app to JavaScript.
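As a rough illustration of the idea in this excerpt, the sketch below derives a frontend response type from a backend function so the two cannot drift apart. It is not Grouparoo's actual helper: `listProfiles`, `ActionResponse`, `UnwrapPromise`, and `summarize` are hypothetical names, and both "sides" are kept in one file so the example is self-contained. In a real monorepo the frontend would use a type-only import of the backend function, which is erased when compiling to JavaScript.

```typescript
// --- "Backend" side: a hypothetical API action with an inferred return type ---
async function listProfiles(params: { limit: number }) {
  const profiles = [
    { id: "p1", email: "a@example.com" },
    { id: "p2", email: "b@example.com" },
  ];
  return { total: profiles.length, profiles: profiles.slice(0, params.limit) };
}

// --- "Frontend" side: derive the response type from the action itself ---
// Unwraps Promise<T> to T (hypothetical helper name).
type UnwrapPromise<T> = T extends Promise<infer U> ? U : T;

// The resolved return type of any async action (hypothetical helper name).
type ActionResponse<A extends (...args: any[]) => unknown> = UnwrapPromise<ReturnType<A>>;

type ListProfilesResponse = ActionResponse<typeof listProfiles>;

// The compiler now checks every property access against the backend's shape.
function summarize(response: ListProfilesResponse): string {
  return `${response.profiles.length} of ${response.total} profiles`;
}

// A hand-built response, standing in for the parsed JSON of an HTTP call.
const sample: ListProfilesResponse = {
  total: 2,
  profiles: [{ id: "p1", email: "a@example.com" }],
};
const summary = summarize(sample); // "1 of 2 profiles"
```

If the backend renames `total` or changes the shape of `profiles`, `summarize` stops compiling, which is the kind of check the excerpt describes. The types exist only at compile time, so nothing from the backend is bundled into the frontend.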
-
Deferring Side-Effects in Node.js until the End of a Transaction
Looking deeper into how cls-hooked works, we can see that it is possible to tell if you are currently in a namespace, and to set and get values from the namespace. Think of this like a session... but for the callback or promise your code is within! With this in mind, we can write our run method to be transaction-aware. This means that we can use a pattern that knows to run a function in-line if we aren’t within a transaction, but if we are, defer it until the end. We’ve wrapped utilities to do this within Grouparoo’s CLS module.
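The pattern described here can be sketched as follows. This is a simplified stand-in, not Grouparoo's CLS module: a module-level variable plays the role of the cls-hooked namespace, so the sketch is only safe for synchronous work, whereas cls-hooked tracks the context across callbacks and promises. The names `afterCommit` and `runInTransaction` are hypothetical.

```typescript
type SideEffect = () => void;

// Queue of deferred side effects for the active "transaction", or null when
// no transaction is running. Assumption: this replaces the CLS namespace.
let deferred: SideEffect[] | null = null;

// Run the side effect in-line when outside a transaction; otherwise defer it.
function afterCommit(effect: SideEffect): void {
  if (deferred === null) {
    effect();
  } else {
    deferred.push(effect);
  }
}

// Run `work` inside a transaction context. Deferred effects fire only after
// `work` returns; if it throws, they are dropped and the error is rethrown.
function runInTransaction<T>(work: () => T): T {
  if (deferred !== null) {
    return work(); // nested call: join the outer transaction
  }
  const queue: SideEffect[] = [];
  deferred = queue;
  let result: T;
  try {
    result = work();
  } catch (err) {
    deferred = null; // "rollback": discard the queued effects
    throw err;
  }
  deferred = null; // leave the context before flushing
  for (const effect of queue) effect();
  return result;
}

const log: string[] = [];
afterCommit(() => log.push("inline")); // no transaction: runs immediately
runInTransaction(() => {
  afterCommit(() => log.push("deferred")); // queued until the end
  log.push("work");
});
// log is now ["inline", "work", "deferred"]
```

Callers use `afterCommit` the same way in both situations, which is the point of making the helper transaction-aware: code that sends an email or enqueues a job does not need to know whether it was invoked inside a transaction.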
What are some alternatives?
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Mage - 🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
MLflow - Open source platform for the machine learning lifecycle
meltano
OpenLineage - An Open Standard for lineage metadata collection
streamlit - Streamlit — A faster way to build and share data apps.
ploomber - The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
superset - Apache Superset is a Data Visualization and Data Exploration Platform
fastapi - FastAPI framework, high performance, easy to learn, fast to code, ready for production
hashi-ui - A modern user interface for @hashicorp Consul & Nomad