getting-started
windmill
| | getting-started | windmill |
|---|---|---|
| Mentions | 16 | 86 |
| Stars | 1,220 | 8,518 |
| Growth | 0.1% | 4.8% |
| Activity | 0.0 | 10.0 |
| Last commit | about 1 year ago | 5 days ago |
| Language | Makefile | Svelte |
| License | - | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
getting-started
-
Why do companies still build data ingestion tooling instead of using a third-party tool like Airbyte?
Coincidentally, I saw a presentation today on a nice halfway-house solution: using embeddable Python libraries like Sling and dlt, both open-source. See https://www.youtube.com/watch?v=gAqOLgG2iYY There is also singer.io, which is more of a protocol than a library but can also be installed, although it looks like a true community effort and is not so well maintained.
-
Data sources episode 2: AWS S3 to Postgres Data Sync using Singer
Singer is an open-source framework for data ingestion, which provides a standardized way to move data between various data sources and destinations (such as databases, APIs, and data warehouses). Singer offers a modular approach to data extraction and loading by leveraging two main components: Taps (data extractors) and Targets (data loaders). This design makes it an attractive option for data ingestion for several reasons:
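The tap/target split described above can be sketched in a few lines of Python. This is a minimal illustration, not the official Singer SDK: it assumes only the JSON-lines message types from the Singer spec (`SCHEMA`, `RECORD`, `STATE`). The function names `run_tap` and `run_target`, the `users` stream, and the `last_id` bookmark are hypothetical names chosen for the example.

```python
import io
import json
import sys


def run_tap(rows, stream="users", out=sys.stdout):
    """Hypothetical tap: write SCHEMA, RECORD and STATE messages as JSON lines."""
    schema = {
        "type": "object",
        "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    }
    # A SCHEMA message describes the stream before any of its records are sent.
    out.write(json.dumps({
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": ["id"],
    }) + "\n")
    last_id = None
    for row in rows:
        out.write(json.dumps({"type": "RECORD", "stream": stream, "record": row}) + "\n")
        last_id = row["id"]
    # A STATE message carries a bookmark so a later run can resume (assumed shape).
    out.write(json.dumps({"type": "STATE", "value": {stream: {"last_id": last_id}}}) + "\n")


def run_target(lines):
    """Hypothetical target: collect records per stream and return the last state."""
    loaded = {}
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        kind = message["type"]
        if kind == "SCHEMA":
            loaded.setdefault(message["stream"], [])
        elif kind == "RECORD":
            loaded.setdefault(message["stream"], []).append(message["record"])
        elif kind == "STATE":
            state = message["value"]
    return loaded, state


if __name__ == "__main__":
    # In real use a tap's stdout is piped into a target's stdin; a buffer stands in here.
    buffer = io.StringIO()
    run_tap([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}], out=buffer)
    loaded, state = run_target(buffer.getvalue().splitlines())
    print(loaded)
    print(state)
```

Because the two sides only share a line-oriented text format, any tap can in principle be paired with any target, which is the modularity the excerpt refers to.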
- Design pattern for Python ETL
-
Launch HN: Patterns (YC S21) – A much faster way to build and deploy data apps
Thanks for chipping in.
I’ve been leaning towards this direction. I think I/O is the biggest part that in the case of plain code steps still needs fixing. Input being data/stream and parameterization/config and output being some sort of typed data/stream.
My “let’s not reinvent the wheel” alarm is going off when I write that, though. Examples that come to mind are text-based (Unix / https://scale.com/blog/text-universal-interface) but also the Singer tap protocol (https://github.com/singer-io/getting-started/blob/master/doc...). And config obviously has many standard forms, like ini, yaml, json, environment key-value pairs and more.
At the same time, text feels horribly inefficient as encoding for some of the data objects being passed around in these flows. More specialized and optimized binary formats come to mind (Arrow, HDF5, Protobuf).
Plenty of directions to explore, each with their own advantages and disadvantages. I wonder which direction is favored by users of tools like ours. Will be good to poll (do they even care?).
PS Windmill looks equally impressive! Nice job
-
After Airflow. Where next for DE?
Mage uses the Singer Spec (https://github.com/singer-io/getting-started/blob/master/docs/SPEC.md), the data engineering community standard for building data integrations. This was created by Stitch and is widely adopted.
-
Basic data engineering question.
I like the Singer Protocol, and the various tools that use it. These include meltano, airbyte, stitch, pipelinewise, and a few others
-
I have hundreds of API data endpoints with different schemas. How do I organize?
Have you looked into using a dedicated data integration tool? Have you heard of Singer and the Singer Spec? https://github.com/singer-io/getting-started/blob/master/docs/SPEC.md
-
CDC (Change Data Capture) with 3rd party APIs
Or you could build your own such system and run it on Airflow, Prefect, Dagster, etc. Check out the Singer project for a suite of Python packages designed for such a task. Quality varies greatly, though.
-
Questions about Integration Singer Specification with AWS Glue
Our team is building out a data platform on AWS Glue, and we pull from a variety of data sources, including application databases and third-party SaaS APIs. I have been looking into ways to standardize pulling data from different sources. The other day I came across the [Singer Specification](https://github.com/singer-io/getting-started) and was interested in learning more about it. If anyone has experience working with Singer specifications, I would love to hear more about:
-
Anybody have experience creating singer taps and targets?
I just read the readme of the Singer getting started repo and am excited to write my first tap! I’m thinking instead of writing a new Airflow DAG whenever I want to pipe API data into our data warehouse I could write a singer tap and use Stitch instead. Is that a stupid idea?
windmill
-
Show HN: Strada – Cloud IDE for Connecting SaaS APIs
Looks very similar to the script builder portion of https://github.com/windmill-labs/windmill, but not open-source, not self-hostable, and without open-source integrations (https://hub.windmill.dev/)
disclaimer: I'm founder of ^
- Ask HN: Is There a Zapier for APIs?
-
Postgres as Queue
If you need a job queue on Postgres, https://windmill.dev provides an all-integrated developer platform with a Pg queue at its core that supports jobs defined in Python/TypeScript/SQL
-
A list of SaaS, PaaS and IaaS offerings that have free tiers of interest to devops and infradev
windmill.dev - Windmill is an open-source developer platform to quickly build production-grade multi-step automation and internal apps from minimal Python and Typescript scripts. As a free user, you can create and be a member of at most three non-premium workspaces.
-
Airplane acquired by Airtable and is shutting down
For an alternative to airplane.dev, you can check out Windmill.
https://github.com/windmill-labs/windmill
"Open-source developer infrastructure for internal tools (APIs, background jobs, workflows and UIs). Self-hostable alternative to Airplane, Pipedream, Superblocks and a simplified Temporal with autogenerated UIs and custom UIs to trigger workflows and scripts as internal apps.
Scripts are turned into sharable UIs automatically, and can be composed together into flows or used in richer apps built with low-code. Supported script languages are: Python, TypeScript, Go, Bash, SQL, and GraphQL."
If you search HN, you'll find the creator of Windmill comment on comparisons to airplane.dev:
https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
-
Pipe Dreams: The life and times of Yahoo Pipes
https://windmill.dev is a self-hostable OSS alternative to pipedream
(disclaimer: I'm founder)
-
Looking for an e-commerce multivendor platform for 10million+ products
I'm genuinely curious what server-side stuff on BC you are referring to. That may have been something added after our assessment. The way I'd generally approach something like that for any of the platforms would be using an external low/no code solution to process webhook data. But it would depend heavily on the use case. For a more developer friendly option I've been really impressed by windmill.dev. We use a mix of n8n and windmill for various needs.
- Deno Cron
-
Show HN: Windmill – fastest open-source workflow engine – the how
Yes it goes in that direction, however note that you can already do this in a not too hard way.
Our openflow spec is both open-source and has a full openapi definition: https://github.com/windmill-labs/windmill/blob/main/openflow...
You can use that to generate client SDKs in any language and build your own DAG with it. That's what one of our customers did, building a reactflow-to-openflow library: https://github.com/Devessier/reactflow-to-windmill
It's not as good as the decorator way, but we move fast, and if you still have interest in it we could prioritize it (and ask for feedback :))
-
GitHub Actions Are a Problem
We have built an open-source generic workflow engine to run arbitrary scripts (https://windmill.dev) with a vscode extension to build the yaml using a low-code builder and each individual script in their dedicated python/ts files so you get your full editor assistants https://youtu.be/aSOF6AzyDr8?t=116
One of the areas we are expanding next is a GitHub app, so you get exactly the same UX as GitHub Actions but running Windmill workflows on your Windmill workers.
What are some alternatives?
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
automatisch - The open source Zapier alternative. Build workflow automation without spending time and money.
AWS Data Wrangler - pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
plasmic - Visual builder for React. Build apps, websites, and content. Integrate with your codebase.
meltano
budibase - Budibase is an open-source low code platform that helps you build internal tools in minutes 🚀
tap-hubspot
supabase - The open source Firebase alternative.
Mage - 🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
pg_jsonschema - PostgreSQL extension providing JSON Schema validation
tap-spreadsheets-anywhere
llvm-project - The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.