|4 days ago||2 days ago|
|Apache License 2.0||GNU General Public License v3.0 or later|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
How to Serve Massive Computations Using Python Web Apps.
1 project | dev.to | 23 Nov 2021
In this demo, we use the request itself as the trigger and begin computation immediately. But it may vary according to the nature of your application. Often, you might have to use a separate pipeline as well. In such scenarios, you may need technologies such as Apache Airflow or Prefect.
Apache Airflow In EKS Cluster
1 project | dev.to | 10 Nov 2021
Airflow is one of the most popular tools for running workflows, especially data pipelines.
Distributed computing in python??
2 projects | reddit.com/r/learnpython | 9 Nov 2021
AWS MWAA and AWS SES integration
1 project | dev.to | 2 Nov 2021
This problem was already reported in a few Airflow issues and PRs. The fix didn't make the cut for Airflow 2.2 and will probably land in version 2.3, but because we are talking about MWAA (version 2.0.2), we don't really know when this will be fixed on AWS.
Noobie who is trying to use K8s needs confirmation to know if this is the way or he is overestimating Kubernetes.
3 projects | reddit.com/r/kubernetes | 20 Oct 2021
The Data Engineer Roadmap 🗺
12 projects | dev.to | 19 Oct 2021
Anything Comparable to power automate or flow for Linux?
2 projects | reddit.com/r/sysadmin | 17 Oct 2021
I never used Power Automate, but it looks like a workflow orchestrator. So check out https://airflow.apache.org/
Airflow with different conda environments
1 project | reddit.com/r/dataengineering | 13 Oct 2021
If Airflow is the way to go then try DockerOperators (https://github.com/apache/airflow/blob/main/airflow/providers/docker/example_dags/example_docker.py). It's not the easiest setup, but from what I gather from your question it will do what you need.
Databricks jobs and Airflow on Kubernetes
1 project | reddit.com/r/dataengineering | 2 Oct 2021
I have not used Databricks, but it is something we are looking into integrating into our infrastructure in the future. Since Databricks is a service that does not run locally, I would use the Databricks Operators/Hooks that come with Airflow, rather than trying to build out anything of my own. https://github.com/apache/airflow/blob/main/airflow/providers/databricks/hooks/databricks.py
what do you think about airflow?
2 projects | reddit.com/r/dataengineering | 2 Oct 2021
I think one of the main design problems I have with Airflow is the fact that it tends to tightly couple processing/transform code with data movement code, which makes debugging tricky. The way I have solved this is by building a command line interface to all the processing code so I can debug the processing code outside of any Airflow infrastructure (which can be painful to get running locally if one does not use Airflow Breeze).
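A rough sketch of that approach, assuming the processing logic is a plain Python function with a thin `argparse` wrapper around it. The `transform` function, its `multiplier` parameter, and the JSON record layout are all hypothetical. Nothing in the module imports Airflow.

```python
import argparse
import json


def transform(records, multiplier=1):
    """Hypothetical processing step: scale each record's 'value' field."""
    return [{**r, "value": r["value"] * multiplier} for r in records]


def build_parser():
    """Command line interface for running the transform without Airflow."""
    parser = argparse.ArgumentParser(description="Run the transform outside Airflow.")
    parser.add_argument("--input", required=True,
                        help="Path to a JSON file containing a list of records")
    parser.add_argument("--multiplier", type=int, default=1)
    return parser


def main(argv=None):
    """Parse arguments, load the records, run the transform, print the result."""
    args = build_parser().parse_args(argv)
    with open(args.input) as fh:
        records = json.load(fh)
    result = transform(records, args.multiplier)
    print(json.dumps(result))
    return result
```

Calling `main(["--input", "records.json", "--multiplier", "3"])` from a script entry point exercises the processing code on its own, so it can be debugged locally. An Airflow task would then import and call the same `transform` function, keeping the DAG file limited to data movement and scheduling.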
ETL Library for Python
1 project | reddit.com/r/Python | 27 Sep 2021
"On the simpler side". Do you mean with a graphical interface? Then Orange would be a nice solution. https://orangedatamining.com/
[D] Why Hasn't FOSS Drag-and-Drop ML tools taken off yet?
2 projects | reddit.com/r/MachineLearning | 8 Sep 2021
I am currently looking at modules for KNIME and Orange, and I realized that they do not have enough tools in their toolkits (e.g. text data analysis, network analysis, image classification).
Orange: Open-source component-based machine learning and data visualization
1 project | news.ycombinator.com | 23 Jun 2021
No-code vs Visual Programming
1 project | reddit.com/r/nocode | 12 Mar 2021
I am using visual programming tools that overlap with the no-code concept such as: KNIME and Orange. To visualize the results, I use connectors with platforms like DataStudio or Google AppSheet.
Computing for SCIENCE, for someone who knows nothing about the subject.
1 project | reddit.com/r/ItalyInformatica | 28 Feb 2021
What are some alternatives?
Kedro - A Python framework for creating reproducible, maintainable and modular data science code.
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
dagster - An orchestration platform for the development, production, and observation of data assets.
Dask - Parallel computing with task scheduling
glue - Linked Data Visualizations Across Multiple Files
Apache Camel - Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.
Numba - NumPy aware dynamic Python compiler using LLVM
RDKit - The official sources for the RDKit library
n8n - Free and open fair-code licensed node based Workflow Automation Tool. Easily automate tasks across different services.