|2 days ago||6 days ago|
|BSD 3-clause "New" or "Revised" License||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Best Data Structure for this?
1 project | reddit.com/r/learnpython | 17 Jan 2022
If you really want to store it all (labels included) in one data structure, you should look up pandas.
SEC Speed is a myth.
1 project | reddit.com/r/CFB | 15 Jan 2022
Another question you may be asking is: "What about skill players?" Well, what about them? Skill players are defined as players that consistently tote the rock. I was able to filter out skill players' performance in different combine events using pandas. For our purposes, the following positions (as listed on PRF) were considered 'skill players': WR, RB, QB, TE, DB, LB. I included linebackers, but if you want to exclude them, knock yourself out. It kind of only helps my case that the likes of Roquan Smith and Nakobe Dean don't count for the SEC. When only considering skill players, the SEC ranks 2nd to the Big 12 in 40-yard dash times. In the other combine events for which there is data, the SEC ranks first in none of them.
Open source projects that are good to read to learn best practices?
2 projects | reddit.com/r/cscareerquestions | 14 Jan 2022
5 Useful Pandas Methods You May Not Know Existed (Part 2)
1 project | reddit.com/r/Python | 9 Jan 2022
You glossed over the fact that `.pct_change` isn't actually "percent change" as documented. More fun reading: https://github.com/pandas-dev/pandas/issues/20752
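A minimal sketch of the behavior this comment points at: `Series.pct_change` returns the fractional change between consecutive elements, so a 10% increase shows up as roughly 0.1 rather than 10. The sample prices below are made up for illustration.

```python
import pandas as pd

# Hypothetical prices, chosen only for illustration.
prices = pd.Series([100.0, 110.0, 99.0])

# pct_change returns the fractional change (current / previous - 1),
# not a value scaled to percent. The first element has no predecessor,
# so it is NaN.
fractional = prices.pct_change()

# Multiply by 100 when an actual percentage is wanted.
percent = fractional * 100
```

Scaling explicitly, as in the last line, keeps the unit visible to whoever reads the result.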
Career change - data analysis
1 project | reddit.com/r/AusFinance | 9 Jan 2022
I suggest pandas might be a great tool for you, as you will be able to read and write Excel/CSV files, process them, and see how you get on.
Trading Algos - 5 Key Metrics and How to Implement Them in Python
4 projects | dev.to | 8 Jan 2022
Now to implement this one, we'll have to do some manipulation of our account values. Let's use the power of numpy to help us out here (oh, and it's the same in pandas too). We'll be using np.diff to take the returns of our account values and resampling them.
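As a rough sketch of the `np.diff` step mentioned here, simple per-period returns can be computed from a series of account values as shown below. The account values are hypothetical, and the excerpt does not show the article's exact metric or resampling step, so neither is reproduced.

```python
import numpy as np

# Hypothetical account values, one per period (assumption for illustration).
account_values = np.array([1000.0, 1010.0, 1005.0, 1020.0])

# np.diff gives the change between consecutive values; dividing by the
# previous value turns each change into a simple period return.
returns = np.diff(account_values) / account_values[:-1]
```

The result has one element fewer than the input, because the first value has no previous period to compare against.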
What does it mean to scale your python powered pipeline?
4 projects | dev.to | 3 Jan 2022
Increase code efficiency: Python is designed for ease of use and easy extension, but not performance. As a developer, the onus is on you to do more work so that the application executes less code. Whenever possible use vectorized library functions instead of loops. Python is successful in data science because of the pre-compiled code offered by data-appropriate libraries in the pydata stack such as pandas and numpy.
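To illustrate the advice about vectorized library functions, here is a small sketch comparing a Python loop with a NumPy equivalent. The function names and the input array are made up for this example; both functions compute the same sum of squares, but the vectorized one runs in pre-compiled code.

```python
import numpy as np

# Hypothetical input data for illustration.
values = np.arange(100_000, dtype=np.float64)

def total_of_squares_loop(xs):
    """Sum of squares using an explicit Python loop (interpreted per element)."""
    total = 0.0
    for x in xs:
        total += x * x
    return total

def total_of_squares_vectorized(xs):
    """Same result, with the whole computation delegated to NumPy."""
    return float(np.dot(xs, xs))

loop_result = total_of_squares_loop(values)
vectorized_result = total_of_squares_vectorized(values)
```

Timing the two calls, for example with the standard `timeit` module, is a simple way to see how much the loop costs on a given machine.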
How do I combine two lists together to form a x y coordinate reference point?
3 projects | reddit.com/r/learnpython | 2 Jan 2022
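The thread's answers are not included in this excerpt. One common way to pair two lists into (x, y) points is the built-in `zip`, sketched here with made-up sample lists.

```python
# Hypothetical coordinate lists for illustration.
xs = [1, 2, 3]
ys = [4, 5, 6]

# zip pairs elements by position; list() materializes the pairs as tuples.
points = list(zip(xs, ys))
# points is [(1, 4), (2, 5), (3, 6)]
```

Note that `zip` stops at the shorter list, so lists of unequal length silently lose their trailing elements.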
Top 7 Dev Tools for AI Startups
4 projects | dev.to | 30 Dec 2021
Built on top of Python, pandas is an open source data analysis and manipulation tool, similar to NumPy. While it relies on NumPy arrays for much of its manipulation and computation, pandas makes it easier to visualize and explore data, helping our team make better sense of the large amounts of data we work with on a daily basis.
Appending Data to DataFrames
1 project | reddit.com/r/learnpython | 24 Dec 2021
DataFrames are not meant to be as flexible as lists in terms of extending the data they hold; DataFrames are much more "deliberate". Ideally, if you're trying to dynamically add data to a DataFrame, you should first collect all the data and then initialize the DataFrame once, or collect separate DataFrames and concat them once. The pandas developers are even thinking about deprecating append.
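A minimal sketch of both approaches described in this answer, using made-up column names and data: collect rows in a plain list and build the DataFrame once, or collect separate DataFrames and combine them with a single `pd.concat` call.

```python
import pandas as pd

# Approach 1: collect rows in a list, then build the DataFrame once.
rows = []
for i in range(3):
    rows.append({"id": i, "square": i * i})  # hypothetical columns
df = pd.DataFrame(rows)

# Approach 2: collect separate DataFrames, then concatenate them once.
chunks = [pd.DataFrame({"id": [i], "square": [i * i]}) for i in range(3, 6)]
combined = pd.concat([df, *chunks], ignore_index=True)
```

Either way, the cost of allocating a new DataFrame is paid once rather than on every added row.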
Air flow maximization.
1 project | reddit.com/r/AirflowJobs | 12 Jan 2022
LOL, not sure if you're joking or not, but ... this sub is for a software package called Airflow (https://airflow.apache.org/), not physical airflow.
Getting ahead of low/no code platforms as a developer
2 projects | reddit.com/r/cscareerquestions | 3 Jan 2022
Figuring out how to migrate our existing Airflow deployment and DAGs to something more easily deployable to k8s
Anyone tried Apache Airflow to automate?
2 projects | reddit.com/r/selfhosted | 29 Dec 2021
Anyone used Airflow (https://airflow.apache.org/) to automate all the things yet?
What are some ways to execute a function at a certain time of day?
3 projects | reddit.com/r/learnpython | 29 Dec 2021
Just to offer one more, Apache Airflow is another program that offers job scheduling functionality: https://airflow.apache.org/
Data pipelines with Luigi
4 projects | dev.to | 22 Dec 2021
Moreover, configuring and deploying Luigi's scheduler on a server or pod for production use is easy, while it might not be for other similar tools like Apache Airflow.
Migrating to Snowflake, Redshift, or BigQuery? Use Datafold to Avoid these Common Pitfalls
2 projects | dev.to | 15 Dec 2021
Be automated with tools like dbt or Apache Airflow or integrated into a continuous integration (CI) process.
Jinja2 not formatting my text correctly. Any advice?
11 projects | reddit.com/r/learnpython | 10 Dec 2021
ListItem(name='Apache Airflow', website='https://airflow.apache.org/', category='Workflow Engine', short_description='Apache Airflow is an open-source workflow management platform for data engineering pipelines.'),
Taking on the ML pipeline challenge: why data scientists need to own their ML workflows in production
4 projects | dev.to | 6 Dec 2021
So, if you even want to use MLFlow to track your experiments, run the pipeline on Airflow, and then deploy a model to a Neptune Model Registry, ZenML will facilitate this MLOps Stack for you. This decision can be made jointly by the data scientists and engineers. As ZenML is a framework, custom pieces of the puzzle can also be added here to accommodate legacy infrastructure.
Why ML should be written as pipelines from the get-go
3 projects | dev.to | 6 Dec 2021
ZenML is an exercise in finding the right layer of abstraction for ML. Here, we treat pipelines as first-class citizens. This means that data scientists are exposed to pipelines directly in the framework, but not in the same manner as the data pipelines from the ETL space (Prefect, Airflow et al.). Pipelines are treated as experiments, meaning they can be compared and analyzed directly. Only when it is time to flip over to productionalization can they be converted to classical data pipelines.
How to Serve Massive Computations Using Python Web Apps.
1 project | dev.to | 23 Nov 2021
In this demo, we use the request itself as the trigger and begin computation immediately. But it may vary according to the nature of your application. Often, you might have to use a separate pipeline as well. In such scenarios, you may need technologies such as Apache Airflow or Prefect.
What are some alternatives?
Kedro - A Python framework for creating reproducible, maintainable and modular data science code.
Cubes - Light-weight Python OLAP framework for multi-dimensional data analysis
dagster - An orchestration platform for the development, production, and observation of data assets.
orange - Orange: Interactive data analysis
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Dask - Parallel computing with task scheduling
NumPy - The fundamental package for scientific computing with Python.
SymPy - A computer algebra system written in pure Python
Apache Camel - Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.
pyexcel - Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files
blaze - NumPy and Pandas interface to Big Data