|3 days ago||4 days ago|
|Apache License 2.0||MIT License|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
How to Serve Massive Computations Using Python Web Apps.
1 project | dev.to | 23 Nov 2021
In this demo, we use the request itself as the trigger and begin computation immediately. But it may vary according to the nature of your application. Often, you might have to use a separate pipeline as well. In such scenarios, you may need technologies such as Apache Airflow or Prefect.
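As a minimal sketch of the "request as trigger" idea, assuming a thread pool and an in-memory job registry stand in for whatever the demo actually uses, a handler can start the computation and return a job id at once. The names `handle_request`, `job_status`, and `heavy_computation` are hypothetical and not taken from the article.

```python
from concurrent.futures import ThreadPoolExecutor
import uuid

# Hypothetical in-memory registry; a real app would more likely use a queue or a database.
executor = ThreadPoolExecutor(max_workers=2)
jobs = {}


def heavy_computation(n):
    """Stand-in for a long-running task."""
    return sum(i * i for i in range(n))


def handle_request(n):
    """Start the work in the background and return a job id immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(heavy_computation, n)
    return job_id


def job_status(job_id):
    """Report whether the job has finished and, if so, its result."""
    future = jobs[job_id]
    if future.done():
        return {"state": "done", "result": future.result()}
    return {"state": "running"}
```

A web framework route would call `handle_request` on the incoming request and expose `job_status` on a second endpoint that the client polls.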
Apache Airflow In EKS Cluster
1 project | dev.to | 10 Nov 2021
Airflow is one of the most popular tools for running workflows, especially data pipelines.
Distributed computing in python??
2 projects | reddit.com/r/learnpython | 9 Nov 2021
AWS MWAA and AWS SES integration
1 project | dev.to | 2 Nov 2021
This problem was already reported in a few Airflow issues and PRs. The fix didn't make the cut for Airflow 2.2 and will probably be there in version 2.3, but because we are talking about MWAA (version 2.0.2), we don't really know when this will be fixed on AWS.
Noobie who is trying to use K8s needs confirmation to know if this is the way or he is overestimating Kubernetes.
3 projects | reddit.com/r/kubernetes | 20 Oct 2021
The Data Engineer Roadmap 🗺
12 projects | dev.to | 19 Oct 2021
Anything Comparable to power automate or flow for Linux?
2 projects | reddit.com/r/sysadmin | 17 Oct 2021
I never used Power Automate, but it looks like a workflow orchestrator. So check out https://airflow.apache.org/
Airflow with different conda environments
1 project | reddit.com/r/dataengineering | 13 Oct 2021
If Airflow is the way to go, then try DockerOperators (https://github.com/apache/airflow/blob/main/airflow/providers/docker/example_dags/example_docker.py). It's not the easiest setup, but from what I gather from your question, it will do what you need.
Databricks jobs and Airflow on Kubernetes
1 project | reddit.com/r/dataengineering | 2 Oct 2021
I have not used Databricks, but it is something we are looking into integrating into our infrastructure in the future. Since Databricks is a service that does not run locally, I would use the Databricks Operators/Hooks that come with Airflow, rather than trying to build out anything of my own. https://github.com/apache/airflow/blob/main/airflow/providers/databricks/hooks/databricks.py
what do you think about airflow?
2 projects | reddit.com/r/dataengineering | 2 Oct 2021
I think one of the main design problems I have with Airflow is the fact that it tends to tightly couple processing/transform code with data movement code, which makes debugging tricky. The way I have solved this is by building a command line interface to all the processing code so I can debug the processing code outside of any Airflow infrastructure (which can be painful to get running locally if one does not use Airflow Breeze).
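One way to picture that separation, as a sketch only: the processing step lives in a plain function with no Airflow imports, and a thin `argparse` wrapper exposes it on the command line. The `transform` logic (upper-casing non-empty lines) and the argument names are assumptions made for illustration, not the commenter's code.

```python
import argparse


def transform(rows):
    """Pure processing step with no Airflow imports, so it can be debugged alone."""
    return [row.strip().upper() for row in rows if row.strip()]


def build_parser():
    """Command line interface for running the transform outside Airflow."""
    parser = argparse.ArgumentParser(description="Run the transform step outside Airflow.")
    parser.add_argument("input_path")
    parser.add_argument("output_path")
    return parser


def main(argv=None):
    """Read the input file, apply the transform, and write the output file."""
    args = build_parser().parse_args(argv)
    with open(args.input_path, encoding="utf-8") as src:
        rows = src.readlines()
    with open(args.output_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(transform(rows)) + "\n")
    return 0
```

An Airflow task could import and call `transform` directly, while a script entry point that calls `main()` lets the same code run and be stepped through in a debugger without a scheduler.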
JetBrains DataSpell: The IDE for Data Scientists 1.0 Release
2 projects | reddit.com/r/Python | 2 Dec 2021
You should store only clean notebooks in git, not ones with the result of the calculation. You may use pre-commit hooks for this: https://github.com/roy-ht/pre-commit-jupyter, https://pre-commit.com/
Give examples of really cool software made by a single developer?
14 projects | reddit.com/r/learnprogramming | 28 Nov 2021
Not exactly as high-stakes as some of the others in here, but Anthony Sottile wrote pre-commit and pre-commit.ci by himself and it’s one of the most useful tools in my arsenal because it triggers any number of code checks and stuff like that every time you do a big commit. I’ve got a number of his hooks installed as well as a linter, formatter, and type checker for my code, and it basically makes sure my code is of much higher quality.
I think I just had my first "coder moment" in Python. My program wasn't working, so I look, and look and look. Until finally....
2 projects | reddit.com/r/Python | 26 Nov 2021
I've found the magic that is pre-commit. You pip install pre-commit, create a file called .pre-commit-config.yaml, put some repos in there (examples on the site).
What are some Django SECURITY Dos and Don't you often see?
1 project | reddit.com/r/django | 12 Nov 2021
I'd also recommend using pre-commit checks as you develop, specifically the hook for bandit, which statically analyzes your code for common security problems. If you trip a flag when bandit is checking your work, time to slow down and take a look.
Git hooks for Conventional Commits system wide ?
3 projects | reddit.com/r/git | 12 Nov 2021
I would recommend you configure the pre-commit framework for your repositories. It's not system-wide, but is a configuration written to the repo itself, so you can be sure that the same config exists on all clones. And there is already a hook for commitizen (commitlint may have one, haven't checked).
10 Ways To Level Up Your Testing with Python
2 projects | dev.to | 10 Nov 2021
At ZenML we use pre-commit hooks that kick into action whenever you try to commit code. (Check out our pyproject.toml configuration and our scripts/ directory to see how we handle this!) It ensures a level of consistency throughout our codebase, ensuring that all our functions have docstrings, for example, or implementing a standard order for import statements.
2 Static Analysis Tools to Enhance Your Productivity
5 projects | dev.to | 5 Nov 2021
If you don't want to manually run Black and Flake8 before committing your changes, you can automate it with pre-commit.
Lab 7 Use Static Analysis tooling to manage project
1 project | dev.to | 5 Nov 2021
Git hook scripts are useful for identifying simple issues before submission to code review.
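To make "simple issues" concrete, here is a small sketch of the kind of check a hook script might run on each file. The two checks (merge conflict markers and trailing whitespace) and the function name `find_issues` are illustrative assumptions, not the implementation of any particular hook.

```python
import re

# A line that starts with seven '<', '=', or '>' characters, as git writes for conflicts.
CONFLICT_MARKER = re.compile(r"^(<{7}|={7}|>{7})( |$)")


def find_issues(text):
    """Return (line_number, message) pairs for simple problems in one file's text."""
    issues = []
    for number, line in enumerate(text.splitlines(), start=1):
        if CONFLICT_MARKER.match(line):
            issues.append((number, "merge conflict marker"))
        elif line != line.rstrip():
            issues.append((number, "trailing whitespace"))
    return issues
```

A git pre-commit hook aborts the commit when it exits with a non-zero status, so a script built around this function would exit non-zero whenever `find_issues` returns anything.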
How can a small open-source project be improved?
3 projects | reddit.com/r/Python | 30 Oct 2021
2 projects | reddit.com/r/devops | 27 Oct 2021
A pre-commit hook is even better, but you keep that in your back pocket when the problems don't stop.
What are some alternatives?
Kedro - A Python framework for creating reproducible, maintainable and modular data science code.
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
dagster - An orchestration platform for the development, production, and observation of data assets.
Dask - Parallel computing with task scheduling
Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Apache Camel - Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.
Numba - NumPy aware dynamic Python compiler using LLVM
markdownlint-cli - MarkdownLint Command Line Interface
n8n - Free and open fair-code licensed node based Workflow Automation Tool. Easily automate tasks across different services.
Poetry - Python dependency management and packaging made easy.