Top 5 Python data-pipeline Projects
An orchestration platform for the development, production, and observation of data assets.
Project mention: ETL advice appreciated | reddit.com/r/ETL | 2022-06-19
If you want to schedule your ETL, you can do something basic using Windows Task Scheduler or use something fancier, such as a Python orchestration library like Dagster. Dagster works on Windows, which probably makes it your best bet, since most (if not all) other orchestration libraries with a scheduler don't work on Windows.
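With the basic Task Scheduler route, the thing being scheduled is just a script. Below is a minimal sketch of such a script. It assumes a CSV source and a local SQLite file as the target. The file names, the `sales` table, and the `name`/`amount` columns are hypothetical placeholders that do not come from the post.

```python
"""Minimal ETL script suitable for running on a timer.

Hypothetical example: sales.csv, warehouse.db, the `sales` table and the
name/amount columns are placeholders, not taken from the original post.
"""
import csv
import sqlite3
from pathlib import Path


def extract(csv_path: Path) -> list[dict[str, str]]:
    # Read every row of the source CSV as a dict keyed by header name.
    with csv_path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def transform(rows: list[dict[str, str]]) -> list[tuple[str, float]]:
    # Skip rows with an empty amount, tidy names, and cast amounts to float.
    cleaned = []
    for row in rows:
        amount = row.get("amount", "").strip()
        if amount:
            cleaned.append((row["name"].strip().title(), float(amount)))
    return cleaned


def load(records: list[tuple[str, float]], db_path: Path) -> int:
    # Append records to a SQLite table, creating the table on the first run.
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
            conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    finally:
        conn.close()
    return len(records)


def run(csv_path: Path, db_path: Path) -> int:
    return load(transform(extract(csv_path)), db_path)


if __name__ == "__main__":
    count = run(Path("sales.csv"), Path("warehouse.db"))
    print(f"Loaded {count} rows")
```

Task Scheduler (or cron on Linux) then only has to run `python etl.py` on a timer. With an orchestrator such as Dagster, the same extract/transform/load functions could instead be wrapped as ops, which brings retries and run history into the picture.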
Dataset format for AI. Build, manage, query & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai (by activeloopai)
Project mention: [Q] where to host 50GB dataset (for free?) | reddit.com/r/datasets | 2022-06-25
Hey u/platoTheSloth, as u/gopietz mentioned (thanks a lot for the shout-out!), you can share them with the general public by uploading them to Activeloop Platform. We offer special terms for researchers, but even as a member of the general public you get up to 300 GB of free storage. Thanks to our open-source dataset format for AI, Hub, anyone can load the dataset in under 3 seconds with one line of code and stream it while training in PyTorch/TensorFlow.
Build data pipelines, the easy way 🛠️
Project mention: How are you guys validating your data? | reddit.com/r/dataengineering | 2022-06-09
+1 on a lightweight version of GE (Great Expectations) that is easier to make part of an existing pipeline. We would like it for internal use (our data pipelines), but also for our open-source users (https://github.com/orchest/orchest).
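A plain-Python sketch can show the core of what a "lightweight" validation step needs. It applies named checks to each batch and counts failures per check, so a pipeline step can halt or alert when a check fails. The helper names (`not_null`, `in_range`, `validate`) are hypothetical and are not the Great Expectations API.

```python
"""A tiny, dependency-free stand-in for expectation-style data checks.

Hypothetical sketch: check names and the row format are illustrative,
not the Great Expectations API.
"""
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class Check:
    name: str
    predicate: Callable[[Row], bool]  # returns True when the row passes


def not_null(column: str) -> Check:
    return Check(f"{column} is not null", lambda row: row.get(column) is not None)


def in_range(column: str, low: float, high: float) -> Check:
    # Passes only when the value is present and within [low, high].
    return Check(
        f"{column} in [{low}, {high}]",
        lambda row: row.get(column) is not None and low <= row[column] <= high,
    )


def validate(rows: list[Row], checks: list[Check]) -> dict[str, int]:
    # Count failing rows per check; an empty dict means the batch passed.
    failures: dict[str, int] = {}
    for check in checks:
        bad = sum(1 for row in rows if not check.predicate(row))
        if bad:
            failures[check.name] = bad
    return failures


if __name__ == "__main__":
    batch = [{"id": 1, "age": 34}, {"id": None, "age": 150}]
    print(validate(batch, [not_null("id"), in_range("age", 0, 120)]))
    # prints {'id is not null': 1, 'age in [0, 120]': 1}
```

Because `validate` returns a plain dict, the calling step decides what a failure means. It might raise, log, or quarantine the batch.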
Data pipelines from re-usable components
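Building pipelines "from re-usable components" comes down to writing small steps that share a common input/output shape and then chaining them. Below is a generic sketch of that pattern with hypothetical step names. It is not modeled on any particular project's API.

```python
"""Composing a pipeline from small, reusable step functions.

Hypothetical sketch: step names are illustrative and do not come from
any particular framework.
"""
from functools import reduce
from typing import Callable

Step = Callable[[list[dict]], list[dict]]


def drop_missing(field: str) -> Step:
    # Reusable step: remove records where `field` is missing or None.
    return lambda records: [r for r in records if r.get(field) is not None]


def rename(old: str, new: str) -> Step:
    # Reusable step: rename one key in every record, keeping the others.
    return lambda records: [
        {(new if k == old else k): v for k, v in r.items()} for r in records
    ]


def pipeline(*steps: Step) -> Step:
    # Chain steps left to right into a single callable.
    return lambda records: reduce(lambda acc, step: step(acc), steps, records)


if __name__ == "__main__":
    clean = pipeline(drop_missing("email"), rename("email", "contact"))
    print(clean([{"email": "a@x.io"}, {"email": None}]))
    # prints [{'contact': 'a@x.io'}]
```

Every step takes and returns a list of records, so steps can be reordered or reused across pipelines without changes.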
Data anomalies monitoring as dbt tests and dbt artifacts uploader.
Project mention: Launch HN: Elementary (YC W22) – Open-source data observability | news.ycombinator.com | 2022-03-04
For any dbt users, their reliability package has the best and most comprehensive way to upload artifacts directly to the warehouse after a dbt invocation.
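To illustrate what "uploading artifacts to the warehouse after a dbt invocation" involves, the sketch below reads dbt's `run_results.json` artifact and inserts one row per executed node. It is not Elementary's implementation. SQLite stands in for the warehouse, the `dbt_run_results` table name is hypothetical, and only the `unique_id`, `status`, and `execution_time` fields of each `results` entry are used.

```python
"""Upload a dbt run_results.json artifact into a warehouse table.

Minimal sketch: SQLite stands in for the warehouse and the table name
`dbt_run_results` is hypothetical.
"""
import json
import sqlite3
from pathlib import Path


def parse_run_results(raw: str) -> list[tuple[str, str, float]]:
    # Each entry in "results" describes one executed node (model, test, ...).
    artifact = json.loads(raw)
    return [
        (r["unique_id"], r["status"], float(r["execution_time"]))
        for r in artifact.get("results", [])
    ]


def upload(rows: list[tuple[str, str, float]], conn: sqlite3.Connection) -> int:
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS dbt_run_results "
            "(unique_id TEXT, status TEXT, execution_time REAL)"
        )
        conn.executemany("INSERT INTO dbt_run_results VALUES (?, ?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    # dbt writes run_results.json to the target/ directory by default.
    raw = Path("target/run_results.json").read_text(encoding="utf-8")
    conn = sqlite3.connect("warehouse.db")
    try:
        print(f"Uploaded {upload(parse_run_results(raw), conn)} results")
    finally:
        conn.close()
```

Storing one row per node and per invocation lets test failures and run times be queried over time with ordinary SQL.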
Python data-pipelines related posts
[Q] where to host 50GB dataset (for free?)
1 project | reddit.com/r/datasets | 25 Jun 2022
ETL advice appreciated
1 project | reddit.com/r/ETL | 19 Jun 2022
Workflow automation for smaller use-cases
3 projects | reddit.com/r/learnpython | 22 Apr 2022
[N] [P] Access 100+ image, video & audio datasets in seconds with one line of code & stream them while training ML models with Activeloop Hub (more at docs.activeloop.ai, description & links in the comments below)
4 projects | reddit.com/r/MachineLearning | 17 Apr 2022
Thinking of making a switch from actuarial science to data engineering
1 project | reddit.com/r/dataengineering | 15 Apr 2022
Easy way to load, create, version, query and visualize computer vision datasets
1 project | news.ycombinator.com | 28 Mar 2022
Easy way to load, create, version, query & visualize machine learning datasets
1 project | reddit.com/r/learnmachinelearning | 28 Mar 2022