Top 23 Python Pipeline Projects
-
airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
-
Kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
-
Mage
🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
-
towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
-
mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
-
toil
A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
-
NeumAI
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
-
pypyr automation task runner
pypyr task-runner cli & api for automation pipelines. Automate anything by combining commands, different scripts in different languages & applications into one pipeline process.
-
aws-lambda-handler-cookbook
This repository provides a working, deployable, open source-based, serverless service template with an AWS Lambda function and AWS CDK Python code with all the best practices and a complete CI/CD pipeline.
Project mention: Prefect: A workflow orchestration tool for data pipelines | news.ycombinator.com | 2024-03-13
Project mention: Launch HN: Bracket (YC W22) – Two-Way Sync Between Salesforce and Postgres | news.ycombinator.com | 2023-12-12
I'll also give a shout-out to Airbyte (https://airbyte.com/), with which I've had some limited success integrating Salesforce with a local database. The particular pull for Airbyte is that we can self-host the open source version, rather than pay Fivetran a significant sum to do this for us.
It's an immature tool, so I don't yet know that I can claim we've spent _less_ than Fivetran on the additional engineering and ops time, but it feels like it has potential to do so once stabilized.
Interesting, thanks for sharing. I'll definitely take a look, although at this point I am so comfortable with Snakemake that it is a bit hard to imagine what would convince me to move to another tool. But I like the idea of composable pipelines: I am building a tool (too early to share) that would allow Snakemake pipelines to be layered on top of each other using semi-automatic data annotations, similar to how it is done in kedro (https://github.com/kedro-org/kedro).
I’ve been working in tech for more than five years. I started as a Data Scientist, and now I’m exploring and loving the DevRel 🥑 role for Taipy. Needless to say, evolving in the tech scene has been a ride full of ups, downs, and everything in between.
Project mention: Spreadsheet errors can have disastrous consequences – yet we keep making them | news.ycombinator.com | 2024-01-25
Pandas docs > Comparison with spreadsheets: https://pandas.pydata.org/docs/getting_started/comparison/co...
Pandas docs > I/O > Excel files: https://pandas.pydata.org/docs/user_guide/io.html#excel-file...
nteract/papermill: https://github.com/nteract/papermill :
> papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks. [...]
> This opens up new opportunities for how notebooks can be used. For example:
> - Perhaps you have a financial report that you wish to run with different values on the first or last day of a month or at the beginning or end of the year, using parameters makes this task easier.
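The financial-report scenario in the quote above could be sketched roughly as follows. This is a minimal illustration, not papermill's documented recipe: the notebook name `report.ipynb`, the parameter name `as_of`, and the helper functions are all hypothetical, and the notebook is assumed to contain a cell tagged `parameters`. Only `papermill.execute_notebook` is the library's own call, and it is imported lazily so the date logic can be tried without papermill installed.

```python
import calendar
from datetime import date


def month_boundaries(year: int, month: int) -> tuple[date, date]:
    """Return the first and last calendar day of the given month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def build_runs(year: int, month: int, template: str = "report.ipynb") -> list[dict]:
    """Describe one notebook run per boundary date (all paths are hypothetical)."""
    runs = []
    for day in month_boundaries(year, month):
        runs.append({
            "input_path": template,
            "output_path": f"report_{day.isoformat()}.ipynb",
            # Values injected into the notebook cell tagged "parameters".
            "parameters": {"as_of": day.isoformat()},
        })
    return runs


def execute_runs(runs: list[dict]) -> None:
    """Execute each described run with papermill (requires papermill to be installed)."""
    import papermill as pm  # imported lazily; not needed to build the run list

    for run in runs:
        pm.execute_notebook(
            run["input_path"],
            run["output_path"],
            parameters=run["parameters"],
        )


if __name__ == "__main__":
    # Preview the planned runs without executing any notebook.
    for run in build_runs(2024, 2):
        print(run["output_path"], run["parameters"])
```

Calling `execute_runs(build_runs(2024, 2))` would then write one executed copy of the notebook per boundary date, leaving the template itself untouched.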
"The World Excel Championship is being broadcast on ESPN" (2022) https://news.ycombinator.com/item?id=32420925 :
> Computational notebook speedrun ideas:
Project mention: Python: Uncovering the Overlooked Core Functionalities | news.ycombinator.com | 2023-07-24
If you actually think this code is better there's a real library that does this: https://github.com/EntilZha/PyFunctional.
Maybe it would help you to look at the galaxy project: GitHub main site
Project mention: Show HN: Hatchet – Open-source distributed task queue | news.ycombinator.com | 2024-03-08
A little late now, but I wonder if https://github.com/DataBiosphere/toil might meet your requirements.
Project mention: Show HN: Neum AI – Open-source large-scale RAG framework | news.ycombinator.com | 2023-11-21
Interesting to see that the semantic chunking in the tools library is a wrapper around GPT-4. Asks GPT for the python code and executes it: https://github.com/NeumTry/NeumAI/blob/main/neumai-tools/neu...
Project mention: Serverless APIs: Auto-Generate OpenAPI Docs & CI/CD Protections | dev.to | 2024-03-04
In case you didn’t know, the Cookbook is a template project that allows you to get started with serverless with three clicks, and it has all the best practices and utilities that a production-grade serverless service requires.
Here's the project: https://github.com/vmware/versatile-data-kit
Python Pipeline related posts
- Launch HN: Bracket (YC W22) – Two-Way Sync Between Salesforce and Postgres
- Simple task runner for automation pipelines
- 25 million Creative Commons image dataset released!
- Nextflow: Data-Driven Computational Pipelines
- Airbyte API and Terraform Provider – available in open source
- Need help moving 16gb of mongodb data to tableau
- Python: Uncovering the Overlooked Core Functionalities
Index
What are some of the best open-source Pipeline projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | jina | 20,009 |
2 | Prefect | 14,586 |
3 | airbyte | 13,923 |
4 | great_expectations | 9,440 |
5 | Kedro | 9,353 |
6 | Taipy | 8,371 |
7 | Mage | 7,001 |
8 | papermill | 5,623 |
9 | pipelines | 3,436 |
10 | towhee | 2,970 |
11 | PyFunctional | 2,332 |
12 | mara-pipelines | 2,054 |
13 | pytorch-toolbelt | 1,483 |
14 | MLBox | 1,475 |
15 | galaxy | 1,313 |
16 | sematic | 941 |
17 | toil | 869 |
18 | NeumAI | 774 |
19 | pypyr automation task runner | 568 |
20 | aws-lambda-handler-cookbook | 450 |
21 | versatile-data-kit | 410 |
22 | karton | 366 |
23 | fluids | 335 |