How to Serve Massive Computations Using Python Web Apps.
1 project | dev.to | 23 Nov 2021
In this demo, we use the request itself as the trigger and begin computation immediately, but this may vary with the nature of your application. Often you will need a separate pipeline instead; in such scenarios, technologies such as Apache Airflow or Prefect can help.
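The request-as-trigger pattern described above can be sketched with the standard library alone: the handler submits the heavy computation to an executor and returns immediately, and a separate endpoint polls for the result. This is a minimal illustration, not the demo's actual code; the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)
jobs = {}  # job_id -> Future; a real app would use a persistent store

def expensive_computation(n):
    # Stand-in for the heavy work that would otherwise block the web request.
    return sum(i * i for i in range(n))

def handle_request(job_id, n):
    # The incoming request triggers the computation, and we respond
    # immediately instead of waiting for it to finish.
    jobs[job_id] = executor.submit(expensive_computation, n)
    return {"status": "accepted", "job_id": job_id}

def poll_result(job_id):
    # A second endpoint lets the client check on (and collect) the result.
    future = jobs[job_id]
    if future.done():
        return {"status": "done", "result": future.result()}
    return {"status": "pending"}
```

Once the pipeline grows beyond one process, this is where an external orchestrator such as Airflow or Prefect takes over the role of the executor.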
Apache Airflow In EKS Cluster
1 project | dev.to | 10 Nov 2021
Airflow is one of the most popular tools for running workflows, especially data pipelines.
Distributed computing in python??
2 projects | reddit.com/r/learnpython | 9 Nov 2021
AWS MWAA and AWS SES integration
1 project | dev.to | 2 Nov 2021
This problem was already reported in a few Airflow issues and PRs. The fix didn't make the cut for Airflow 2.2 and will probably land in version 2.3, but because we are talking about MWAA (which runs 2.0.2), we don't really know when this will be fixed on AWS.
Noobie who is trying to use K8s needs confirmation to know if this is the way or he is overestimating Kubernetes.
3 projects | reddit.com/r/kubernetes | 20 Oct 2021
The Data Engineer Roadmap 🗺
12 projects | dev.to | 19 Oct 2021
Anything Comparable to power automate or flow for Linux?
2 projects | reddit.com/r/sysadmin | 17 Oct 2021
I've never used Power Automate, but it looks like a workflow orchestrator, so check out https://airflow.apache.org/
Airflow with different conda environments
1 project | reddit.com/r/dataengineering | 13 Oct 2021
If Airflow is the way to go, then try the DockerOperator (https://github.com/apache/airflow/blob/main/airflow/providers/docker/example_dags/example_docker.py). It's not the easiest setup, but from what I gather from your question, it will do what you need.
Databricks jobs and Airflow on Kubernetes
1 project | reddit.com/r/dataengineering | 2 Oct 2021
I have not used Databricks, but it is something we are looking into integrating into our infrastructure in the future. Since Databricks is a service that does not run locally, I would use the Databricks operators/hooks that come with Airflow rather than trying to build anything of my own: https://github.com/apache/airflow/blob/main/airflow/providers/databricks/hooks/databricks.py
what do you think about airflow?
2 projects | reddit.com/r/dataengineering | 2 Oct 2021
I think one of the main design problems I have with Airflow is that it tends to tightly couple processing/transform code with data-movement code, which makes debugging tricky. The way I have solved this is by building a command-line interface to all the processing code, so I can debug it outside of any Airflow infrastructure (which can be painful to get running locally if one does not use Airflow Breeze).
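The decoupling pattern described above can be sketched as follows: keep the transform in a plain function behind a CLI entry point, so it runs and debugs standalone, and have Airflow call the same entry point. The function and flag names here are illustrative, not from the commenter's codebase:

```python
import argparse
import json

def transform(records, multiplier=2):
    # Pure processing logic; knows nothing about Airflow or data movement.
    return [{**r, "value": r["value"] * multiplier} for r in records]

def main(argv=None):
    # Thin CLI wrapper, so the transform can be run and debugged standalone:
    #   python transform.py --input records.json --multiplier 3
    parser = argparse.ArgumentParser(description="Run the transform standalone.")
    parser.add_argument("--input", required=True, help="Path to a JSON file of records")
    parser.add_argument("--multiplier", type=int, default=2)
    args = parser.parse_args(argv)
    with open(args.input) as f:
        records = json.load(f)
    print(json.dumps(transform(records, args.multiplier)))

if __name__ == "__main__":
    main()
```

Inside Airflow, a BashOperator (or PythonOperator) would then invoke the same entry point, so the DAG only handles scheduling and data movement.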
What tools to use for small in house applications?
2 projects | reddit.com/r/AskProgramming | 21 May 2021
If you need to stick with spreadsheets, I agree with the suggestion of Python, or Java; I used to automate a bunch of business processes with Apache Camel.
Im making a FME open source clone
4 projects | reddit.com/r/gis | 5 May 2021
Or if you are looking for something lighter-weight, you can always use Camel: camel.apache.org
Steps to upgrade spring-boot 1.x to 2.x
2 projects | dev.to | 13 Apr 2021
Apache Camel dependencies on Spring Boot, Kafka, etc., plus other library dependencies. Note:
Kafka 1.1: https://mvnrepository.com/artifact/org.apache.camel/camel-kafka/2.22.4
Kafka 2.0: https://mvnrepository.com/artifact/org.apache.camel/camel-kafka/2.23.1
https://github.com/apache/camel/blob/master/components/camel-kafka/src/main/docs/kafka-component.adoc
What are some alternatives?
Kedro - A Python framework for creating reproducible, maintainable and modular data science code.
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
dagster - An orchestration platform for the development, production, and observation of data assets.
Apache Kafka - Mirror of Apache Kafka
Dask - Parallel computing with task scheduling
Apache Pulsar - Apache Pulsar - distributed pub-sub messaging system
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Numba - NumPy aware dynamic Python compiler using LLVM
Embedded RabbitMQ - A JVM library to use RabbitMQ as an embedded service
Apache ActiveMQ Artemis - Mirror of Apache ActiveMQ Artemis