Top 9 Python pydata Projects
-
From what I've seen, there are two main paths. I'll give a well-known example of each.
1. Language-specific distributed task library
For example, in Python, Celery is a popular task queue. If you (the dev) are writing all the code and running the workflows yourself, it might work well for you. You write the core functions, and it handles task execution and resource management with a little configuration.
* https://github.com/celery/celery
Or, lower level:
* https://github.com/dask/dask
2. DAG workflow systems
There are also whole systems built for what you're describing. They've become especially popular in the MLOps and data engineering world. A common one is Airflow:
* https://github.com/apache/airflow
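To make the DAG idea concrete, here is a minimal sketch that uses only Python's standard-library `graphlib` module (Python 3.9+). It is not the Airflow, Celery, or Dask API; the task names (`extract`, `transform`, `load`) and the `run_pipeline` helper are hypothetical and exist only to illustrate how a workflow system can order tasks by their dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical three-step pipeline: each function is one task.
def extract():
    return [3, 1, 2]

def transform(rows):
    return sorted(rows)

def load(rows):
    return f"loaded {len(rows)} rows"

# Map each task name to the set of task names it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

tasks = {"extract": extract, "transform": transform, "load": load}

def run_pipeline(dependencies, tasks):
    """Run tasks in dependency order, passing each task its upstream results."""
    results = {}
    # static_order() yields every task after all of its dependencies.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        upstream = [results[dep] for dep in sorted(dependencies[name])]
        results[name] = tasks[name](*upstream)
    return order, results

order, results = run_pipeline(dependencies, tasks)
print(order)
print(results["load"])
```

A real workflow system adds scheduling, retries, and distributed execution on top of this ordering step, which is the part the libraries above take care of.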
-
Project mention: Powerful and scalable Python library for modern time series analysis | news.ycombinator.com | 2024-08-01
GitHub URL: https://github.com/bodo-ai/Bodo
-
pyvtreat
vtreat is a data frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. Distributed under a BSD-3-Clause license.
-
Python pydata discussion
Python pydata related posts
-
Shuffling large data at constant memory in Dask
-
My new company uses Pyspark. I want to learn it before my starting date. Any advice?
-
Great forward progress on squashing cluster deadlocks
-
Is Numpy always more efficient than Pandas? And how much should we rely on Python anyway?
-
Ask HN: Is PySPark a Dead-End?
-
How to load 85.6 GB of XML data into a dataframe
Index
What are some of the best open-source pydata projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | Dask | 12,857 |
| 2 | stumpy | 3,734 |
| 3 | koalas | 3,346 |
| 4 | pandas-datareader | 2,993 |
| 5 | distributed | 1,589 |
| 6 | pyjanitor | 1,385 |
| 7 | Bodo | 176 |
| 8 | pyvtreat | 122 |
| 9 | graphblas-algorithms | 79 |