Top 17 Python Pipeline Projects
Always know what to expect from your data.
Project mention: Just starting to get into automated testing, should I be looking for a dedicated tool or library for data engineering specifically? | reddit.com/r/dataengineering | 2021-05-14
You can also extend existing frameworks if necessary. For example, both pytest and Great Expectations are extensible.
📚 Parameterize, execute, and analyze notebooks
Project mention: Team with no data science infrastructure/knowledge (crawl/walk/run) | reddit.com/r/datascience | 2021-06-03
You can look into papermill, for example.
A Python framework for creating reproducible, maintainable and modular data science code.
Project mention: I Started Streaming on Twitch | dev.to | 2021-06-12
It all started with kedro/issues/606, where Yetu called on users of kedro to record themselves doing a walkthrough of their tutorials. I wanted to do this, but was really stuck on the fact that recording or editing a somewhat polished video is quite time-consuming for me.
Python library for creating data pipelines with chain functional programming
Project mention: PyFunctional makes creating data pipelines easy by using chained functional operators | reddit.com/r/Python | 2021-03-31
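To illustrate what "chained functional operators" means in practice, here is a minimal standard-library sketch. The `Chain` class and its method names are hypothetical and written only for this example; they are not PyFunctional's API, which should be checked in that project's own documentation.

```python
from functools import reduce as _reduce
from typing import Any, Callable, Iterable


class Chain:
    """Hypothetical helper: wraps an iterable so operations can be chained."""

    def __init__(self, items: Iterable[Any]) -> None:
        self._items = items

    def map(self, fn: Callable[[Any], Any]) -> "Chain":
        # Lazily apply fn to every element and return a new Chain.
        return Chain(fn(x) for x in self._items)

    def filter(self, pred: Callable[[Any], bool]) -> "Chain":
        # Lazily keep only the elements for which pred is true.
        return Chain(x for x in self._items if pred(x))

    def reduce(self, fn: Callable[[Any, Any], Any], initial: Any) -> Any:
        # Fold the elements into a single value, starting from initial.
        return _reduce(fn, self._items, initial)

    def to_list(self) -> list:
        # Materialize the wrapped iterable.
        return list(self._items)


result = (
    Chain(range(1, 6))
    .map(lambda x: x * 2)        # 2, 4, 6, 8, 10
    .filter(lambda x: x > 4)     # 6, 8, 10
    .reduce(lambda acc, x: acc + x, 0)
)
print(result)  # 24
```

Each step returns a new wrapper, so the pipeline reads top to bottom in the order the data flows.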
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Project mention: Using PostgreSQL as a Data Warehouse | news.ycombinator.com | 2021-05-10
The tooling behind the approach has been built as a set of Python packages named Mara. It is available on GitHub:
And additional packages can be found at the Mara org:
Data intensive science for everyone. (by galaxyproject)
Project mention: Developed a new kind of dual extruder system on fully custom built 3D printer | reddit.com/r/3Dprinting | 2021-03-01
Easy pipelines for pandas DataFrames.
MLOps tool for deploying machine learning projects to Kubernetes.
Project mention: Deployment automation for ML projects of all shapes and sizes | news.ycombinator.com | 2021-06-09
pypyr task-runner cli & api for automation pipelines. Automate anything by combining commands, different scripts in different languages & applications into one pipeline process.
Forte is a flexible and powerful NLP builder FOR TExt. This is part of the CASL project: http://casl-project.ai/
Project mention: Building Modular and Re-purposable NLP Pipelines | reddit.com/r/learnmachinelearning | 2021-03-02
Introducing Forte, from the CASL open-source project at Petuum. Forte combines multiple NLP tools to construct an entire NLP pipeline with a few lines of Python and extend them to different domains.
image and animation processing framework
Project mention: pierogis/pierogis a framework for image and animation processing | reddit.com/r/Python | 2021-02-22
Make Python code cooler. Less is more. (by abersheeran)
Project mention: Simple, efficient and pure Python implementation of Python pipeline operations | reddit.com/r/Python | 2021-05-17
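One common way to get pipeline operations in pure Python is to overload the `|` operator. The sketch below is an assumption about the general technique, not this project's actual API: the `Pipe` class and the helper functions are hypothetical names made up for the example.

```python
from typing import Any, Callable, Iterable, List


class Pipe:
    """Hypothetical wrapper: `value | Pipe(func, *args)` calls func(value, *args)."""

    def __init__(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __ror__(self, value: Any) -> Any:
        # Python falls back to the right operand's __ror__ when the left
        # operand does not handle `|` for this type, so the value on the
        # left is passed in as the first argument of the wrapped function.
        return self.func(value, *self.args, **self.kwargs)


def keep_even(items: Iterable[int]) -> List[int]:
    return [x for x in items if x % 2 == 0]


def square_all(items: Iterable[int]) -> List[int]:
    return [x * x for x in items]


# keep_even -> [0, 2, 4]; square_all -> [0, 4, 16]; sum -> 20
result = range(6) | Pipe(keep_even) | Pipe(square_all) | Pipe(sum)
print(result)  # 20
```

The expression is evaluated left to right, so each stage receives the output of the previous one.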
Pythonic task automation
Project mention: Alkymi – Data/Task Automation in Python | reddit.com/r/programming | 2021-03-23
Spline is a tool that is capable of running locally as well as part of well known pipelines like Jenkins (Jenkinsfile), Travis CI (.travis.yml) or similar ones. (by Nachtfeuer)
BIDS application for processing functional MRI data, robust to scanner, acquisition and age variability.
Project mention: Siemens output from ABCD T1 and T2 sequences. | reddit.com/r/neuroscience | 2021-02-08
Who provided the sequence? They're usually the point of contact for this kind of question. Alternatively, you can bug one of the processing groups for ABCD (link), and they might point you in the right direction. Getting one of the ABCD or ABIDE/HCP sequence designers to see this on reddit is unlikely, but good luck.
Library for building Modular and Asynchronous Graphs with Directed and Acyclic edges (MAGDA)
Project mention: MAGDA – our open-source solution for spaghetti code | dev.to | 2021-04-14
We would like to introduce you to our latest open-source library: MAGDA. The name is an abbreviation for “Modular Asynchronous Graphs with Directed and Acyclic edges”, which fully describes the idea behind it. The library enables building modular data pipelines with asynchronous processing in e.g. machine learning and data science projects. It is dedicated for Python projects and is available on the NeuroSYS GitHub, as well as on the PyPI repository. It aids our R&D teams not only by introducing some abstraction (classes and functions) but also by imposing an architectural pattern onto the project.
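The idea of modular steps wired into a directed acyclic graph and run asynchronously can be sketched with the standard library alone. The code below is not MAGDA's API; `run_graph`, the node functions, and the graph layout are hypothetical names chosen for illustration, and the sketch assumes the dependency graph really is acyclic (a cycle would wait forever).

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# A node is an async function that receives the results of its dependencies.
Node = Callable[[Dict[str, Any]], Awaitable[Any]]


async def run_graph(nodes: Dict[str, Node], deps: Dict[str, List[str]]) -> Dict[str, Any]:
    """Run every node as soon as all of its dependencies have finished."""
    tasks: Dict[str, "asyncio.Task[Any]"] = {}

    async def run(name: str) -> Any:
        inputs: Dict[str, Any] = {}
        for dep in deps.get(name, []):
            # Awaiting a finished task simply returns its stored result.
            inputs[dep] = await tasks[dep]
        return await nodes[name](inputs)

    # Create all tasks first; none starts running until this coroutine
    # yields control, so every entry exists before any lookup happens.
    for name in nodes:
        tasks[name] = asyncio.create_task(run(name))

    return {name: await task for name, task in tasks.items()}


async def load(_: Dict[str, Any]) -> List[int]:
    await asyncio.sleep(0.01)  # stand-in for real I/O
    return [1, 2, 3]


async def double(inputs: Dict[str, Any]) -> List[int]:
    return [x * 2 for x in inputs["load"]]


async def total(inputs: Dict[str, Any]) -> int:
    return sum(inputs["load"]) + sum(inputs["double"])


nodes = {"load": load, "double": double, "total": total}
deps = {"double": ["load"], "total": ["load", "double"]}

results = asyncio.run(run_graph(nodes, deps))
print(results["total"])  # 6 + 12 = 18
```

Keeping each step as a small function with declared dependencies is what imposes the architectural pattern: steps can be swapped or reordered without touching the scheduling code.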
A framework for the translation of AI tools to the radiology environment
Project mention: Any interesting open projects to join? Or anyone want with some good ideas want to start one? | reddit.com/r/Python | 2021-02-05
What are some of the best open-source Pipeline projects in Python? This list will help you:
| 9 | pypyr automation task runner | 189 |