Top 23 Python Pipeline Projects
jina
Project mention: Cross data type search that wasn’t supported well using Elasticsearch | /r/learnprogramming | 2023-04-11
Jina mainly because of their use of neural networks and AI.
airbyte
Data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes.
Project mention: airbyte VS cloudquery - a user suggested alternative | libhunt.com/r/airbyte | 2023-06-02
great_expectations
Project mention: Data Quality at Scale with Great Expectations, Spark, and Airflow on EMR | dev.to | 2023-04-24
Great Expectations (GE) is an open-source data validation tool that helps ensure data quality.
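Great Expectations expresses data-quality rules as declarative "expectations" that are checked against a batch of data. The sketch below illustrates that idea in plain Python only; `ExpectationResult`, `expect_not_null`, `expect_between` and `validate` are hypothetical names, not the Great Expectations API, which provides its own expectation suites and integrations (such as the Spark and Airflow setup in the mention above).

```python
# Hypothetical, minimal "expectation" checks in plain Python.
# These are NOT Great Expectations APIs; they only illustrate declarative data checks.
from dataclasses import dataclass, field
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class ExpectationResult:
    name: str
    success: bool
    unexpected: list[Row] = field(default_factory=list)  # rows that failed the check


def expect_not_null(rows: list[Row], column: str) -> ExpectationResult:
    """Every row must have a non-None value in `column`."""
    bad = [r for r in rows if r.get(column) is None]
    return ExpectationResult(f"{column} not null", not bad, bad)


def expect_between(rows: list[Row], column: str, low: float, high: float) -> ExpectationResult:
    """Every non-None value in `column` must satisfy low <= value <= high."""
    bad = [
        r for r in rows
        if r.get(column) is not None and not (low <= r[column] <= high)
    ]
    return ExpectationResult(f"{column} between {low} and {high}", not bad, bad)


def validate(rows: list[Row], checks: list[Callable[[list[Row]], ExpectationResult]]) -> bool:
    """Run all checks, report each one, and return True only if every check passed."""
    results = [check(rows) for check in checks]
    for res in results:
        status = "PASS" if res.success else f"FAIL ({len(res.unexpected)} rows)"
        print(f"{res.name}: {status}")
    return all(res.success for res in results)


if __name__ == "__main__":
    batch = [{"id": 1, "age": 34}, {"id": 2, "age": None}, {"id": 3, "age": 230}]
    ok = validate(batch, [
        lambda rows: expect_not_null(rows, "age"),
        lambda rows: expect_between(rows, "age", 0, 120),
    ])
    print("batch accepted" if ok else "batch rejected")
```

Keeping the failing rows in each result, rather than just a pass/fail flag, is what makes this style of check useful for debugging a rejected batch.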
Kedro
```toml
# pyproject.toml
[project]
dependencies = [
    "kedro @ git+https://github.com/kedro-org/[email protected]",
    "kedro-datasets[pandas.CSVDataSet,polars.CSVDataSet] @ git+https://github.com/kedro-org/[email protected]#subdirectory=kedro-datasets",
]
```
papermill
Project mention: Show HN: Mercury – convert Jupyter Notebooks to Web Apps without code rewriting | news.ycombinator.com | 2023-06-02
I'm using Papermill to operationalize Notebooks (https://github.com/nteract/papermill), it e.g. also has airflow support. I'm really happy with papermill for automatic notebook execution, in my field it's nice that we can go very quickly from analysis to operations -- while having super transparent "logging" in the executed notebooks.
towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
Project mention: Welcome to generate your embeddings with Towhee | news.ycombinator.com | 2023-04-20
mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
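A framework in this space typically models an ETL job as a set of tasks with declared dependencies that must run in order. A rough, standard-library-only sketch of that idea follows, assuming Python 3.9+ for `graphlib`; `run_pipeline` and the `extract`/`transform`/`load` tasks are hypothetical and do not reflect mara-pipelines' actual classes.

```python
# Dependency-ordered ETL sketch using only the standard library.
# Not the mara-pipelines API; all task and helper names are hypothetical.
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(
    tasks: dict[str, Callable[[dict], None]],
    deps: dict[str, set[str]],
) -> tuple[list[str], dict]:
    """Run every task after all of its upstream tasks.

    Returns the execution order and the shared context the tasks wrote to.
    """
    context: dict = {}  # shared state that tasks read from and write to
    # static_order() yields each node only after all of its predecessors.
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        tasks[name](context)
    return order, context


# Hypothetical extract -> transform -> load tasks.
def extract(ctx: dict) -> None:
    ctx["raw"] = [" a ", "b ", " c"]


def transform(ctx: dict) -> None:
    ctx["clean"] = [s.strip().upper() for s in ctx["raw"]]


def load(ctx: dict) -> None:
    ctx["loaded"] = ",".join(ctx["clean"])


if __name__ == "__main__":
    tasks = {"extract": extract, "transform": transform, "load": load}
    deps = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
    order, ctx = run_pipeline(tasks, deps)
    print(order, ctx["loaded"])
```

`static_order()` raises `graphlib.CycleError` when the dependencies contain a cycle, which catches a misconfigured pipeline before any task runs.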
Project mention: Surrender cards are being distributed to Russian forces: 'Your ticket to a peaceful life. Show this card to a Ukrainian soldier - it will save your life and help you get back home' On the back: a telegram chat & phone number they can contact 'to receive detailed support'. | /r/ukraine | 2022-09-12
For what it's worth, the QR code points to a URI shortener domain "qrfy.mobi" which redirects to a cloudflare hosted page that contains telegram information relating to "nikolay_bodenko" and "chaos_admin" ("For advertising and cooperation"? Whatever that is supposed to mean), which led me to "rf200_now" and "rf200_nooow" (looks to get spammed around a bit), which is purportedly supportive of Ukraine.
toil
A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
pypyr automation task runner
pypyr task-runner cli & api for automation pipelines. Automate anything by combining commands, different scripts in different languages & applications into one pipeline process.
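Combining commands and scripts written in different languages into one pipeline can be approximated by a runner that executes external commands in sequence and stops at the first failure. This is a hypothetical standard-library sketch, not pypyr's CLI or API; the `run_steps` helper is made up for illustration, and the example steps call the current Python interpreter only so that they run on any platform.

```python
# Hypothetical sequential command runner; NOT pypyr's API.
import subprocess
import sys


def run_steps(steps: list[list[str]]) -> list[str]:
    """Run each command in order and return the stripped stdout of each.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit code, so later steps never run after a failure.
    """
    outputs = []
    for cmd in steps:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        outputs.append(result.stdout.strip())
    return outputs


if __name__ == "__main__":
    # Each step is an external command; any executable (a shell script, a
    # Node.js script, a compiled binary) could appear here. The current Python
    # interpreter is used only to keep the example portable.
    pipeline = [
        [sys.executable, "-c", "print('step 1: extract')"],
        [sys.executable, "-c", "import json; print(json.dumps({'step': 2}))"],
    ]
    for out in run_steps(pipeline):
        print(out)
```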
versatile-data-kit
Here's the project: https://github.com/vmware/versatile-data-kit
fluids
Project mention: AbaCalc - Calculator for engineers, students & technicians | /r/react | 2022-08-02
I’m doing similar things using an api with some Python tools in the background to do the heavy lifting with units and thermo/chemical calcs using the Caleb Bell’s libraries for thermo and fluids
aws-lambda-handler-cookbook
This repository provides a working, deployable, open source based, AWS Lambda handler and CDK Python code. This handler embodies Serverless best practices and has all the bells and whistles for a proper production ready handler.
Project mention: AWS Lambda Cookbook — Elevate your handler’s code — Part 4 — Environment Variables | dev.to | 2023-04-03
This AWS CDK code defines the variables of the schema ‘MyHandlerEnvVars’ and sets their values. Look specifically at the ‘__add_get_lambda_integration’ function.
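The mention above describes a schema (`MyHandlerEnvVars`) for the handler's environment variables, whose values the CDK code sets. On the handler side, one minimal way to read and validate such variables is sketched below with a frozen dataclass; the `HandlerEnvVars` fields and the `parse_env` helper are hypothetical, and the cookbook's own implementation may rely on a dedicated validation library instead.

```python
# Hypothetical env-var schema and parser; NOT the cookbook's actual code.
import os
from dataclasses import dataclass, fields
from typing import Mapping, Type, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class HandlerEnvVars:
    # Hypothetical variables; a real handler would list whatever its CDK stack sets.
    TABLE_NAME: str
    LOG_LEVEL: str
    MAX_RETRIES: int


def parse_env(schema: Type[T], env: Mapping[str, str] = os.environ) -> T:
    """Build `schema` from environment variables, failing fast on bad config.

    Each field's annotation (str, int, ...) is called on the raw string value,
    so int("abc") raises ValueError, just as a missing variable does.
    """
    values = {}
    for f in fields(schema):
        if f.name not in env:
            raise ValueError(f"missing environment variable: {f.name}")
        values[f.name] = f.type(env[f.name])
    return schema(**values)


if __name__ == "__main__":
    # Simulate what the deployed function would see in os.environ.
    fake_env = {"TABLE_NAME": "orders", "LOG_LEVEL": "INFO", "MAX_RETRIES": "3"}
    config = parse_env(HandlerEnvVars, fake_env)
    print(config)
```

Parsing once at module import time, outside the handler function, means a misconfigured deployment fails on cold start rather than partway through a request.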
forte
Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
Project mention: How would I start to recode an old python library only compatible with an older version of python (3.6) to become compatible with a newer python version (3.8+)? | /r/learnpython | 2022-11-06
Python Pipeline related posts
- Postgres to Postgres incremental (timestamp) sync etl tool
- How useful is Airbytes in production pipelines?
- Fondant: Easily build and share datasets for foundation model fine-tuning
- Do we need Spark if we're just loading data from source to target? Trying to move away from Informatica
- Recommendation for Pipeline TD Projects for Junior Role.
A note from our sponsor - InfluxDB
www.influxdata.com | 7 Jun 2023
Index
What are some of the best open-source Pipeline projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | jina | 18,477 |
| 2 | Prefect | 12,078 |
| 3 | airbyte | 10,796 |
| 4 | great_expectations | 8,410 |
| 5 | Kedro | 8,408 |
| 6 | papermill | 5,251 |
| 7 | pipelines | 3,205 |
| 8 | towhee | 2,250 |
| 9 | PyFunctional | 2,180 |
| 10 | mara-pipelines | 2,005 |
| 11 | MLBox | 1,425 |
| 12 | pytorch-toolbelt | 1,395 |
| 13 | galaxy | 1,109 |
| 14 | toil | 839 |
| 15 | pypyr automation task runner | 523 |
| 16 | whispers | 436 |
| 17 | bodywork | 421 |
| 18 | versatile-data-kit | 338 |
| 19 | karton | 320 |
| 20 | fluids | 275 |
| 21 | aws-lambda-handler-cookbook | 259 |
| 22 | forte | 220 |
| 23 | pipeline-live | 199 |