Python Pipeline

Open-source Python projects categorized as Pipeline

Top 23 Python Pipeline Projects

  • jina

    ☁️ Build multimodal AI applications with cloud-native stack

  • Project mention: Jina.ai: Self-host Multimodal models | news.ycombinator.com | 2024-01-26
  • Prefect

    The easiest way to build, run, and monitor data pipelines at scale.

  • Project mention: Prefect: A workflow orchestration tool for data pipelines | news.ycombinator.com | 2024-03-13
  • airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

  • Project mention: Launch HN: Bracket (YC W22) – Two-Way Sync Between Salesforce and Postgres | news.ycombinator.com | 2023-12-12

    I'll also give a shout-out to Airbyte (https://airbyte.com/), with which I've had some limited success integrating Salesforce with a local database. The particular pull for Airbyte is that we can self-host the open source version, rather than pay Fivetran a significant sum to do this for us.

    It's an immature tool, so I don't yet know that I can claim we've spent _less_ than Fivetran on the additional engineering and ops time, but it feels like it has potential to do so once stabilized.

  • great_expectations

    Always know what to expect from your data.

  • Kedro

    Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

  • Project mention: Nextflow: Data-Driven Computational Pipelines | news.ycombinator.com | 2023-08-10

    Interesting, thanks for sharing. I'll definitely take a look, although at this point I am so comfortable with Snakemake that it is a bit hard to imagine what would convince me to move to another tool. But I like the idea of composable pipelines: I am building a tool (too early to share) that would allow layering Snakemake pipelines on top of each other using semi-automatic data annotations, similar to how it is done in kedro (https://github.com/kedro-org/kedro).

  • Taipy

    Turns Data and AI algorithms into production-ready web applications in no time.

  • Project mention: +10 Resources to Empower Women in Technology | dev.to | 2024-03-06

    I’ve been working in tech for more than five years. I started as a Data Scientist, and now I’m exploring and loving the DevRel 🥑 role for Taipy. Needless to say, evolving in the tech scene has been a ride full of ups, downs, and everything in between.

  • Mage

    🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai

  • Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22
  • papermill

    📚 Parameterize, execute, and analyze notebooks

  • Project mention: Spreadsheet errors can have disastrous consequences – yet we keep making them | news.ycombinator.com | 2024-01-25

    Pandas docs > Comparison with spreadsheets: https://pandas.pydata.org/docs/getting_started/comparison/co...

    Pandas docs > I/O > Excel files: https://pandas.pydata.org/docs/user_guide/io.html#excel-file...

    nteract/papermill: https://github.com/nteract/papermill :

    > papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks. [...]

    > This opens up new opportunities for how notebooks can be used. For example:

    > - Perhaps you have a financial report that you wish to run with different values on the first or last day of a month or at the beginning or end of the year, using parameters makes this task easier.

    "The World Excel Championship is being broadcast on ESPN" (2022) https://news.ycombinator.com/item?id=32420925 :

    > Computational notebook speedrun ideas:

  • pipelines

    Machine Learning Pipelines for Kubeflow

  • towhee

    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

  • Project mention: FLaNK Stack Weekly for 14 Aug 2023 | dev.to | 2023-08-14
  • PyFunctional

    Python library for creating data pipelines with chain functional programming

  • Project mention: Python: Uncovering the Overlooked Core Functionalities | news.ycombinator.com | 2023-07-24

    If you actually think this code is better there's a real library that does this: https://github.com/EntilZha/PyFunctional.

  • mara-pipelines

    A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

  • pytorch-toolbelt

    PyTorch extensions for fast R&D prototyping and Kaggle farming

  • MLBox

    MLBox is a powerful Automated Machine Learning Python library.

  • galaxy

    Data intensive science for everyone.

  • Project mention: Need for GUIs for bioinformatic tools? | /r/bioinformatics | 2023-06-17

    Maybe it would help you to look at the galaxy project: GitHub main site

  • sematic

    An open-source ML pipeline development platform

  • toil

    A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.

  • Project mention: Show HN: Hatchet – Open-source distributed task queue | news.ycombinator.com | 2024-03-08

    a little late now, but I wonder if https://github.com/DataBiosphere/toil might meet your requirements

  • NeumAI

    Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

  • Project mention: Show HN: Neum AI – Open-source large-scale RAG framework | news.ycombinator.com | 2023-11-21

    Interesting to see that the semantic chunking in the tools library is a wrapper around GPT-4. Asks GPT for the python code and executes it: https://github.com/NeumTry/NeumAI/blob/main/neumai-tools/neu...

  • pypyr automation task runner

    pypyr task-runner cli & api for automation pipelines. Automate anything by combining commands, different scripts in different languages & applications into one pipeline process.

  • Project mention: Simple task runner for automation pipelines | news.ycombinator.com | 2023-11-03
  • aws-lambda-handler-cookbook

    This repository provides a working, deployable, open source-based, serverless service template with an AWS Lambda function and AWS CDK Python code with all the best practices and a complete CI/CD pipeline.

  • Project mention: Serverless APIs: Auto-Generate OpenAPI Docs & CI/CD Protections | dev.to | 2024-03-04

    In case you didn’t know, the Cookbook is a template project that allows you to get started with serverless with three clicks, and it has all the best practices and utilities that a production-grade serverless service requires.

  • versatile-data-kit

    One framework to develop, deploy and operate data workflows with Python and SQL.

  • Project mention: Looking for a data blogger | /r/opensource | 2023-05-19

    Here's the project: https://github.com/vmware/versatile-data-kit

  • karton

    Distributed malware processing framework based on Python, Redis and S3.

  • Project mention: Advices for an automated malware analysis lab project | /r/Malware | 2023-07-11
  • fluids

    Fluid dynamics component of Chemical Engineering Design Library (ChEDL)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Pipeline projects in Python? This list will help you:

 #  Project                        Stars
 1  jina                          20,009
 2  Prefect                       14,586
 3  airbyte                       13,923
 4  great_expectations             9,440
 5  Kedro                          9,353
 6  Taipy                          8,371
 7  Mage                           7,001
 8  papermill                      5,623
 9  pipelines                      3,436
10  towhee                         2,970
11  PyFunctional                   2,332
12  mara-pipelines                 2,054
13  pytorch-toolbelt               1,483
14  MLBox                          1,475
15  galaxy                         1,313
16  sematic                          941
17  toil                             869
18  NeumAI                           774
19  pypyr automation task runner     568
20  aws-lambda-handler-cookbook      450
21  versatile-data-kit               410
22  karton                           366
23  fluids                           335
