Python data-pipelines

Open-source Python projects categorized as data-pipelines

Top 5 Python data-pipeline Projects

  • dagster

    An orchestration platform for the development, production, and observation of data assets.

    Project mention: ETL advice appreciated | reddit.com/r/ETL | 2022-06-19

    If you want to schedule your ETL, you can do something basic using Windows Task Scheduler or something fancier, such as a Python orchestration library like dagster. Dagster works on Windows, which probably makes it your best bet, as most or all other orchestration libraries with a scheduler don't work on Windows.

  • Activeloop Hub

    Dataset format for AI. Build, manage, query & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai (by activeloopai)

    Project mention: [Q] where to host 50GB dataset (for free?) | reddit.com/r/datasets | 2022-06-25

    Hey u/platoTheSloth, as u/gopietz mentioned (thanks a lot for the shout-out!!!), you can share them with the general public by uploading them to Activeloop Platform (for researchers, we offer special terms, but even as a member of the general public you get up to 300 GB of free storage!). Thanks to Hub, our open source dataset format for AI, anyone can load the dataset in under 3 seconds with one line of code and stream it while training in PyTorch/TensorFlow.

  • orchest

    Build data pipelines, the easy way 🛠️

    Project mention: How are you guys validating your data? | reddit.com/r/dataengineering | 2022-06-09

    +1 on a lightweight version of GE to more easily make part of an existing pipeline. Would like it for internal use (our data pipelines), but also for our open source users (https://github.com/orchest/orchest).

  • patterns-devkit

    Data pipelines from re-usable components

  • dbt-data-reliability

    Data anomalies monitoring as dbt tests and dbt artifacts uploader.

    Project mention: Launch HN: Elementary (YC W22) – Open-source data observability | news.ycombinator.com | 2022-03-04

    For any dbt users, their reliability package has the best and most comprehensive way to upload artifacts directly to the warehouse after a dbt invocation.

    https://github.com/elementary-data/dbt-data-reliability

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-06-25.

Index

What are some of the best open-source data-pipeline projects in Python? This list will help you:

#  Project               Stars
1  dagster               4,908
2  Activeloop Hub        4,633
3  orchest               3,042
4  patterns-devkit          74
5  dbt-data-reliability     36