fides vs mara-pipelines
Metric | fides | mara-pipelines
---|---|---
Mentions | 2 | 3
Stars | 328 | 2,053
Growth | 0.6% | 0.1%
Activity | 9.8 | 6.0
Latest commit | 6 days ago | 5 months ago
Language | Python | Python
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
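Assuming "growth" uses the standard month-over-month definition, with $S_t$ the star count at the end of month $t$, the growth rate and the implied number of new stars are:

$$
g_t = \frac{S_t - S_{t-1}}{S_{t-1}}, \qquad \Delta S_t = S_t - S_{t-1} = S_t \, \frac{g_t}{1 + g_t}
$$

With the table's rounded figures this gives roughly $328 \times 0.006 / 1.006 \approx 2.0$ new stars for fides and $2{,}053 \times 0.001 / 1.001 \approx 2.1$ for mara-pipelines. Because the percentages are rounded, these are only rough estimates, but they show that the very different growth rates come from very different bases rather than very different absolute gains.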
fides
-
What data governance tool are you folks using?
I’ve also been impressed with the approach of Fides, an open source privacy management framework that ties into CI/CD, though I haven’t used it myself yet. The thing about it that stood out was Fideslang, their language and taxonomy for representing data privacy primitives.
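To make the idea of a privacy taxonomy concrete, here is a minimal Python sketch of hierarchical, dot-separated data categories, where a label such as `user.contact.email` counts as falling under `user.contact`. The category names and the helpers `is_subcategory` and `matches_any` are hypothetical illustrations, not Fideslang's actual taxonomy or API.

```python
# Illustrative only: the category labels below are hypothetical examples in a
# dot-separated style; they are not copied from the real Fideslang taxonomy.


def is_subcategory(label: str, parent: str) -> bool:
    """Return True if `label` equals `parent` or sits beneath it in the hierarchy."""
    # The trailing "." prevents "user.contactx" from matching "user.contact".
    return label == parent or label.startswith(parent + ".")


def matches_any(label: str, parents: list[str]) -> bool:
    """Return True if `label` falls under any of the given parent categories."""
    return any(is_subcategory(label, p) for p in parents)


# A dataset's fields annotated with (hypothetical) data categories.
field_categories = {
    "email": "user.contact.email",
    "page_views": "user.behavior.browsing",
    "sku": "system.operations",
}

# Fields that carry any kind of user contact information.
contact_fields = [f for f, c in field_categories.items() if is_subcategory(c, "user.contact")]
print(contact_fields)  # ['email']
```

Encoding the hierarchy in the label itself keeps the check a simple prefix comparison, so a rule written against a broad category automatically covers every more specific category beneath it.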
-
Privacy-as-Code: Preventing Facebook’s $5B violation using Fides Open-Source
Fides is built to solve for problems like this. In its current release, you can already draft a policy in YAML using fideslang and enforce that policy to ensure engineers across a team can’t accidentally or intentionally misuse data in a way that deviates from the promises a business or application makes to its users.
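The enforcement step described above can be sketched in plain Python under loud assumptions: the policy is shown as a dict standing in for the YAML file, and its keys (`rules`, `data_uses`, `data_categories`), the system declarations, and the functions `under`, `violations`, and `check` are all hypothetical, not Fides' real schema or CLI. The sketch only shows the mechanism: compare each declared use of data against the policy's rules and produce a non-zero exit code so a CI job fails.

```python
# Hypothetical policy: a Python dict standing in for the YAML file the text
# describes. The keys are illustrative, not the real fideslang schema.
POLICY = {
    "rules": [
        {
            "name": "no_contact_data_for_advertising",
            "data_uses": ["advertising"],
            "data_categories": ["user.contact"],
        },
    ],
}

# Hypothetical declarations of what each system does with which data.
SYSTEMS = [
    {
        "name": "ad_targeting",
        "data_use": "advertising.third_party",
        "data_categories": ["user.contact.email", "user.behavior.browsing"],
    },
    {
        "name": "order_service",
        "data_use": "provide.service",
        "data_categories": ["user.contact.postal_address"],
    },
]


def under(label: str, parent: str) -> bool:
    """True if `label` equals `parent` or is nested beneath it (dot-separated)."""
    return label == parent or label.startswith(parent + ".")


def violations(policy: dict, systems: list[dict]) -> list[tuple[str, str, str]]:
    """List (system, rule, category) triples where a system breaks a rule."""
    found = []
    for system in systems:
        for rule in policy["rules"]:
            # The rule only applies if the system's data use falls under it.
            if not any(under(system["data_use"], use) for use in rule["data_uses"]):
                continue
            for category in system["data_categories"]:
                if any(under(category, c) for c in rule["data_categories"]):
                    found.append((system["name"], rule["name"], category))
    return found


def check(policy: dict, systems: list[dict]) -> int:
    """Print any violations and return a process exit code (1 = fail CI)."""
    problems = violations(policy, systems)
    for system, rule, category in problems:
        print(f"{system}: {category} breaches rule {rule!r}")
    return 1 if problems else 0


exit_code = check(POLICY, SYSTEMS)
# prints: ad_targeting: user.contact.email breaches rule 'no_contact_data_for_advertising'
```

Returning an exit code instead of calling `sys.exit` inside `check` keeps the logic testable; a CI entry point would wrap it as `sys.exit(check(POLICY, SYSTEMS))`.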
mara-pipelines
-
How to keep track of the different Transformations done in an ETL pipeline?
The closest I've found is Mara but not what I'm after.
-
Using PostgreSQL as a Data Warehouse
The tooling behind the approach has been built as a set of Python packages named Mara. It is available on GitHub:
https://github.com/mara/mara-pipelines
Additional packages can be found in the Mara organization:
https://github.com/mara
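As a rough illustration of what a pipeline tool of this kind does, the sketch below runs tasks in dependency order with the standard library's `graphlib.TopologicalSorter` (Python 3.9+) and records which steps ran, which also bears on the question above about keeping track of transformations. The `Pipeline` class and its `add`/`run` methods are hypothetical and are not the mara-pipelines API.

```python
from graphlib import TopologicalSorter
from typing import Callable


class Pipeline:
    """Hypothetical minimal pipeline: named tasks plus their upstream tasks."""

    def __init__(self) -> None:
        self.tasks: dict[str, Callable[[], None]] = {}
        self.upstreams: dict[str, set[str]] = {}
        self.log: list[str] = []  # order in which tasks ran, for traceability

    def add(self, task_id: str, fn: Callable[[], None], upstreams: tuple[str, ...] = ()) -> None:
        self.tasks[task_id] = fn
        self.upstreams[task_id] = set(upstreams)

    def run(self) -> list[str]:
        missing = {u for deps in self.upstreams.values() for u in deps} - set(self.tasks)
        if missing:
            raise ValueError(f"unknown upstream tasks: {sorted(missing)}")
        self.log = []
        # static_order() yields every task after all of its upstreams and
        # raises graphlib.CycleError if the dependencies contain a cycle.
        for task_id in TopologicalSorter(self.upstreams).static_order():
            self.tasks[task_id]()
            self.log.append(task_id)
        return self.log


# Usage: a tiny extract -> transform -> load chain over in-memory data.
rows: list[dict] = []
totals: list[int] = []


def extract() -> None:
    rows.extend([{"amount": "3"}, {"amount": "4"}])


def transform() -> None:
    for row in rows:
        row["amount"] = int(row["amount"])  # cast text values to integers


def load() -> None:
    totals.append(sum(row["amount"] for row in rows))


pipeline = Pipeline()
pipeline.add("load", load, upstreams=("transform",))
pipeline.add("extract", extract)
pipeline.add("transform", transform, upstreams=("extract",))
print(pipeline.run())  # ['extract', 'transform', 'load']
print(totals)          # [7]
```

Tasks can be registered in any order; the dependency graph, not the order of `add` calls, decides execution order, and the `log` gives a simple record of which transformations ran.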
-
Build your own “data lake” for reporting purposes
MinIO and NiFi each require machines of their own. You're better off with pure Python, and if one wants something lightweight and visually pleasing, Mara [0] or Dagster with Dagit [1] will do the job.
[0] https://github.com/mara/mara-pipelines
[1] https://docs.dagster.io/tutorial/execute
What are some alternatives?
fiftyone - The open-source tool for building high-quality datasets and computer vision models
abcd-hcp-pipeline - BIDS application for processing functional MRI data, robust to scanner, acquisition and age variability.
differential-privacy-library - Diffprivlib: The IBM Differential Privacy Library
kuwala - Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times.
dvc - 🦉 ML Experiments and Data Management with Git
pybaseball - Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
datahub - The Metadata Platform for your Data Stack
dbt-core - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
awesome-machine-unlearning - Awesome Machine Unlearning (A Survey of Machine Unlearning)
etl-markup-toolkit - ETL Markup Toolkit is a spark-native tool for expressing ETL transformations as configuration
pandas-datareader - Extract data from a wide range of Internet sources into a pandas DataFrame.
dremio-oss - Dremio - the missing link in modern data