ghostpii_client vs dagster
Metric | ghostpii_client | dagster
---|---|---
Mentions | 3 | 46
Stars | 23 | 10,215
Growth | - | 2.1%
Activity | 1.1 | 10.0
Last commit | about 1 year ago | 6 days ago
Language | Python | Python
License | MIT License | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
ghostpii_client
- Help me spread the word, or at least play with a free toy
I am an entrepreneur trying to get a movement going to really start using this tech at big corporations to keep them out of trouble. I am guessing the conversation in here is a little more abstract than my usual day-to-day (although I am a reformed mathematician) but I wanted to introduce myself nonetheless.
If anybody is interested we maintain a software library, implemented in Python, that is designed to let relatively everyday people (software engineers, data scientists, etc.) use these privacy-enhancing techniques in a familiar interface without a rocket science course. If you go to the GitHub page I link below there is a Binder server where you can play with it right now via a Jupyter notebook over the web with basically no work or commitment.
https://github.com/capnion/ghostpii_client
I also put a ton of content out on LinkedIn, mostly oriented towards why businesses should adopt these things, what to do with them, and how they relate to other trends.
https://www.linkedin.com/in/alexander-c-mueller-phd-0272a6108/
I would greatly appreciate engagement of any kind: test-drivers, early adopters, complainers, design feedback, likes, reshares, stars, emails. I am a true believer trying to get this tech out where it can do some good, and I need to spread the word.
dagster
- Experience with Dagster.io?
- Dagster tutorials
My recommendation is to continue with the tutorial, then look at one of the larger example projects, especially the ones named “project_”, and you should understand most of it. For anything you don't understand and are curious about, look up the concept page for the relevant functions in the docs.
- The Dagster Master Plan
I found this example that helped me - https://github.com/dagster-io/dagster/tree/master/examples/project_fully_featured/project_fully_featured
- What are some open-source ML pipeline managers that are easy to use?
I would recommend the following: https://www.mage.ai/, https://dagster.io/, https://www.prefect.io/, https://metaflow.org/, https://zenml.io/home
- The Why and How of Dagster User Code Deployment Automation
In Helm terms, there are two charts: the system chart, dagster/dagster (values.yaml), and the user code chart, dagster/dagster-user-deployments (values.yaml). Note that you have to set dagster-user-deployments.enabled: true in the dagster/dagster values.yaml to enable this.
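As a rough illustration of the setting mentioned above, here is a minimal Python sketch that turns a nested values override into `--set` arguments for a Helm command. The helper name `to_set_flags` and the release name `dagster` are assumptions made for this example; only the chart name `dagster/dagster` and the key `dagster-user-deployments.enabled` come from the comment itself. The script only prints the command; it does not run Helm.

```python
def to_set_flags(values, prefix=""):
    """Flatten a nested dict of Helm values into a list of --set arguments."""
    flags = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted path.
            flags.extend(to_set_flags(value, path))
        else:
            # Helm expects lowercase true/false for booleans.
            rendered = str(value).lower() if isinstance(value, bool) else str(value)
            flags.extend(["--set", f"{path}={rendered}"])
    return flags


# Override for the system chart (dagster/dagster), as described in the comment.
system_overrides = {"dagster-user-deployments": {"enabled": True}}

# "dagster" as the release name is an assumption for this sketch.
command = ["helm", "upgrade", "--install", "dagster", "dagster/dagster"]
command += to_set_flags(system_overrides)

print(" ".join(command))
```

The same override could equally be written directly in the system chart's values.yaml; the flattened form is just convenient when the flag is set from a script or CI job.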
- Best Orchestration Tool to run dbt projects?
Dagster seemed really cool when I looked into it as an alternative to Airflow. I especially like the software-defined assets and built-in lineage, which I haven't seen in any other tool. However, it seems it does not support RBAC, which is a pretty big issue if you want a self-service type of architecture; see https://github.com/dagster-io/dagster/issues/2219. It does seem like it's available in their hosted version, but I wanted to run it myself on k8s.
- dbt Cloud Alternatives?
Dagster? https://dagster.io
- What's the best thing/library you learned this year?
One that I haven't seen on here yet: dagster
- Anyone have an example of a project where a handful of the more popular Python tools are used? (E.g. airbyte, airflow, dbt, and pandas)
- Can we take a moment to appreciate how much of dataengineering is open source?
What are some alternatives?
python-fpe - FPE - Format Preserving Encryption with FF3 in Python
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
sayn - Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
linkedin-visualizer - The missing feature in LinkedIn
Mage - 🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
Apache Superset - Apache Superset is a Data Visualization and Data Exploration Platform [Moved to: https://github.com/apache/superset]
airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
modin - Modin: Scale your Pandas workflows by changing a single line of code
MLflow - Open source platform for the machine learning lifecycle
versatile-data-kit - One framework to develop, deploy and operate data workflows with Python and SQL.
meltano