License: Airflow is under the Apache License 2.0; pandas is under the BSD 3-clause "New" or "Revised" License.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
How to Serve Massive Computations Using Python Web Apps.
1 project | dev.to | 23 Nov 2021
In this demo, we use the request itself as the trigger and begin computation immediately. But it may vary according to the nature of your application. Often, you might have to use a separate pipeline as well. In such scenarios, you may need technologies such as Apache Airflow or Prefect.
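The decoupling described above can be sketched with the standard library alone: the web handler only enqueues work and returns immediately, while a worker thread stands in for the separate pipeline that Airflow or Prefect would own. The handler and job names are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
results = []

def worker():
    # Stands in for the separate pipeline; Airflow or Prefect
    # would own this loop in a real deployment.
    while True:
        n = jobs.get()
        if n is None:
            break
        results.append(n * n)  # the "massive computation"
        jobs.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

def handle_request(n: int) -> str:
    # The web handler only enqueues the job and returns immediately,
    # instead of computing inside the request.
    jobs.put(n)
    return "accepted"

handle_request(7)
jobs.join()   # wait until the worker has processed everything
jobs.put(None)
t.join()
print(results)  # [49]
```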
Apache Airflow In EKS Cluster
1 project | dev.to | 10 Nov 2021
Airflow is one of the most popular tools for running workflows, especially data pipelines.
Distributed computing in python??
2 projects | reddit.com/r/learnpython | 9 Nov 2021
AWS MWAA and AWS SES integration
1 project | dev.to | 2 Nov 2021
This problem was already reported in a few Airflow issues and PRs. The fix didn't make the cut for Airflow 2.2 and will probably land in version 2.3, but because we are talking about MWAA (version 2.0.2), we don't really know when this will be fixed on AWS.
Noobie who is trying to use K8s needs confirmation to know if this is the way or he is overestimating Kubernetes.
3 projects | reddit.com/r/kubernetes | 20 Oct 2021
The Data Engineer Roadmap 🗺
12 projects | dev.to | 19 Oct 2021
Anything Comparable to power automate or flow for Linux?
2 projects | reddit.com/r/sysadmin | 17 Oct 2021
I never used Power Automate, but it looks like a workflow orchestrator. So check out https://airflow.apache.org/
Airflow with different conda environments
1 project | reddit.com/r/dataengineering | 13 Oct 2021
If Airflow is the way to go, then try the DockerOperator (https://github.com/apache/airflow/blob/main/airflow/providers/docker/example_dags/example_docker.py). It's not the easiest setup, but from what I gather from your question, it will do what you need.
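An untested sketch of the idea for the conda-environments question: bake each environment into its own image and give each task its own DockerOperator. The DAG id, image names, and commands are hypothetical; this assumes the Docker provider package is installed alongside Airflow.

```python
# Hypothetical DAG: one task per conda environment, each baked into its own image.
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

with DAG(
    dag_id="per_env_pipeline",          # hypothetical name
    start_date=datetime(2021, 10, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    etl = DockerOperator(
        task_id="etl",
        image="myorg/etl-env:latest",    # image built from conda env A
        command="python etl.py",
        auto_remove=True,
    )
    train = DockerOperator(
        task_id="train",
        image="myorg/train-env:latest",  # image built from conda env B
        command="python train.py",
        auto_remove=True,
    )
    etl >> train  # run the training task after the ETL task
```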
Databricks jobs and Airflow on Kubernetes
1 project | reddit.com/r/dataengineering | 2 Oct 2021
I have not used Databricks, but it is something we are looking into integrating into our infrastructure in the future. Since Databricks is a service that does not run locally, I would use the Databricks operators/hooks that come with Airflow rather than trying to build out anything of my own. https://github.com/apache/airflow/blob/main/airflow/providers/databricks/hooks/databricks.py
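An untested sketch of what using the bundled Databricks provider looks like, rather than hand-rolling API calls. The DAG id, cluster settings, and notebook path are hypothetical; this assumes the Databricks provider package and an `databricks_default` Airflow connection are configured.

```python
# Hypothetical DAG using the Databricks provider that ships with Airflow.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import (
    DatabricksSubmitRunOperator,
)

with DAG(
    dag_id="databricks_job",            # hypothetical name
    start_date=datetime(2021, 10, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_notebook = DatabricksSubmitRunOperator(
        task_id="run_notebook",
        databricks_conn_id="databricks_default",
        new_cluster={                    # hypothetical cluster spec
            "spark_version": "9.1.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
        notebook_task={"notebook_path": "/Shared/etl"},  # hypothetical path
    )
```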
what do you think about airflow?
2 projects | reddit.com/r/dataengineering | 2 Oct 2021
I think one of the main design problems I have with Airflow is that it tends to tightly couple processing/transform code with data-movement code, which makes debugging tricky. The way I have solved this is by building a command-line interface to all the processing code, so I can debug the processing code outside of any Airflow infrastructure (which can be painful to get running locally if one does not use Airflow Breeze).
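A minimal sketch of that decoupling pattern, assuming a hypothetical `transform` function: the processing logic lives in a plain function with no Airflow imports, the CLI wraps it for standalone debugging, and an Airflow task would simply call `transform` directly.

```python
import argparse

def transform(values):
    # Pure processing logic, no Airflow imports: easy to test and debug.
    return [v * 2 for v in values]

def main(argv=None):
    # Thin CLI wrapper so the same logic runs outside any orchestrator.
    parser = argparse.ArgumentParser(description="run the transform standalone")
    parser.add_argument("values", nargs="+", type=int)
    args = parser.parse_args(argv)
    result = transform(args.values)
    print(result)
    return result

if __name__ == "__main__":
    # Simulate: python process.py 1 2 3
    main(["1", "2", "3"])
```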
Why df.drop_duplicates() doesn't work for me?
2 projects | reddit.com/r/learnpython | 2 Dec 2021
The primary reason is that it actually doesn't mutate the DataFrame, contrary to popular opinion. According to one of the contributors to the pandas repo, it makes a copy and then re-assigns the pointer.
There is discussion of the inplace argument becoming deprecated across the entire pandas API.
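The copy-not-mutate behaviour described above is easy to see directly; a small example (hypothetical data):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2]})
deduped = df.drop_duplicates()  # returns a new DataFrame...
assert len(df) == 3             # ...the original is untouched
assert len(deduped) == 2
# Keep the result by rebinding the name, rather than passing inplace=True.
df = df.drop_duplicates()
```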
Trying to create a loop and print the values on a table. (Beginner)
1 project | reddit.com/r/learnpython | 2 Dec 2021
OK, so you should be using the right tool for this kind of analysis, which is definitely pandas. I'm far from an expert in that library, unfortunately, but it should definitely be able to do the bucketing for you, as well as the frequency analysis.
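Bucketing and frequency counting in pandas can be done with `pd.cut` and `value_counts`; a minimal sketch with hypothetical data and bucket labels:

```python
import pandas as pd

ages = pd.Series([3, 17, 25, 42, 41, 68])
# Bucket the values into ranges, then count how many land in each bucket.
buckets = pd.cut(
    ages,
    bins=[0, 18, 40, 65, 100],
    labels=["child", "young", "middle", "senior"],
)
freq = buckets.value_counts().sort_index()
print(freq)
```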
How to automate financial data collection and storage in CrateDB with Python and pandas
1 project | dev.to | 25 Nov 2021
pandas is a popular Python package, often used for data science. It shortens the process of handling data, has complete yet straightforward data-representation forms, and makes tasks like filtering data easy.
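For instance, the filtering mentioned above is a one-line boolean mask; the tickers and prices here are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["AAPL", "MSFT", "TSLA"],
    "price": [150.0, 300.0, 1000.0],
})
cheap = df[df["price"] < 500]  # boolean-mask filtering keeps matching rows
print(cheap["name"].tolist())  # ['AAPL', 'MSFT']
```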
It annoys me how people blame students for majoring in the wrong majors
1 project | reddit.com/r/lostgeneration | 22 Nov 2021
Should I do a CompSci course or just keep practicing my Python?
1 project | reddit.com/r/learnpython | 21 Nov 2021
Okay, if you don't need persistent storage, it will be MUCH easier to use pandas to access the dataset you need. I suggest getting familiar with it; just do it for practice here. Here's a guide.
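Accessing a dataset this way is a few lines; a sketch using an in-memory CSV (hypothetical columns) in place of a file on disk:

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk (hypothetical data).
csv_data = io.StringIO("id,score\n1,0.5\n2,0.9\n")
df = pd.read_csv(csv_data)
print(df.shape)            # (2, 2)
print(df["score"].mean())  # simple aggregate without persistent storage
```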
[Pandas] Struggling to see what these lines achieve, any help appreciated.
1 project | reddit.com/r/Cython | 18 Nov 2021
It is a lot older; if you trace the git blame, it was first introduced in this commit and apparently came from scikits.timeseries. I've yet to go look in that package to see.
New to pandas trying to figure out datasets and best place to learn?
1 project | reddit.com/r/learnpython | 11 Nov 2021
I installed pandas using this site: https://pandas.pydata.org/.
Learning Python on the Job
2 projects | dev.to | 11 Nov 2021
A fast and easy-to-use customer website feedback analytics toolkit and workflow using pandas, NumPy, and SQLite that replaced a gigantic Excel workbook that crashed if you looked at it funny. (Another thing I picked up on the job was SQL, which was a snap with Python.)
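The core of a pandas-plus-SQLite workflow like that fits in a few lines: load the data into a table, then let SQL do the aggregation the workbook's pivot tables used to. The table name and columns here are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical feedback data that would have lived in the Excel workbook.
feedback = pd.DataFrame({
    "page": ["/home", "/home", "/pricing"],
    "rating": [4, 5, 2],
})

conn = sqlite3.connect(":memory:")       # a file path would persist it
feedback.to_sql("feedback", conn, index=False)

# SQL replaces the workbook's pivot tables.
avg = pd.read_sql_query(
    "SELECT page, AVG(rating) AS avg_rating FROM feedback GROUP BY page",
    conn,
)
print(avg)
```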
Analyzing Kenya Power Planned Interruption Data
3 projects | dev.to | 9 Nov 2021
Cleaning, manipulating and analysing the extracted data using Pandas.
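A typical cleaning pass of that kind in pandas: drop rows with missing values, normalise strings, and parse dates. The column names and values are hypothetical stand-ins for the extracted interruption data.

```python
import pandas as pd

raw = pd.DataFrame({
    "area": [" Nairobi ", "Mombasa", None],
    "date": ["2021-11-01", "2021-11-02", "2021-11-03"],
})

clean = raw.dropna(subset=["area"]).copy()        # drop rows missing an area
clean["area"] = clean["area"].str.strip().str.upper()  # normalise the strings
clean["date"] = pd.to_datetime(clean["date"])     # parse dates for analysis
print(clean["area"].tolist())  # ['NAIROBI', 'MOMBASA']
```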
What are some alternatives?
Kedro - A Python framework for creating reproducible, maintainable and modular data science code.
Cubes - Light-weight Python OLAP framework for multi-dimensional data analysis
orange - 🍊 Orange: Interactive data analysis
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
dagster - An orchestration platform for the development, production, and observation of data assets.
Dask - Parallel computing with task scheduling
NumPy - The fundamental package for scientific computing with Python.
SymPy - A computer algebra system written in pure Python
blaze - NumPy and Pandas interface to Big Data
pyexcel - Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files
Apache Camel - Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.