Top 23 Python Dask Projects
-
From what I've seen, there are roughly two paths. I'll give a well-known example of each.
1. Language-specific distributed task libraries
For example, in Python, Celery is a popular task system. If you (the dev) are writing all the code and running the workflows yourself, it might work well for you. You build the core code and functions, and it handles the processing and resource management with a little config.
* https://github.com/celery/celery
Or lower level:
* https://github.com/dask/dask
2. DAG workflow systems
There are also whole systems for what you're describing. They've become especially popular in the MLOps and data engineering world. A common one is Airflow:
* https://github.com/apache/airflow
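Both paths share one core idea: you write ordinary functions, declare which ones depend on which (a directed acyclic graph, or DAG), and a scheduler runs each task as soon as its inputs are ready. The sketch below illustrates that idea with only the standard library, assuming Python 3.9+ for `graphlib`. `run_dag` and the example tasks are hypothetical; this is not how Celery, Dask, or Airflow is implemented, and a real system adds retries, persistence, and remote workers.

```python
"""Hypothetical DAG runner: plain functions as tasks, dependencies as edges."""
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Any, Callable


def run_dag(
    tasks: dict[str, Callable[..., Any]],
    deps: dict[str, list[str]],
    max_workers: int = 4,
) -> dict[str, Any]:
    """Run each task once all of its dependencies have finished.

    A task receives its dependencies' results as positional arguments,
    in the order they are listed in `deps`. Every dependency must itself
    be a key of `tasks`.
    """
    sorter = TopologicalSorter({name: deps.get(name, []) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results: dict[str, Any] = {}
    running: dict = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies are all done.
            for name in sorter.get_ready():
                args = [results[d] for d in deps.get(name, [])]
                running[pool.submit(tasks[name], *args)] = name
            # Block until at least one running task finishes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                results[name] = fut.result()  # re-raises a task's exception
                sorter.done(name)  # may unlock downstream tasks
    return results


# Usage: "clean" and "total" both depend on "load" and can run in parallel.
out = run_dag(
    {
        "load": lambda: [3, 1, 2],
        "clean": sorted,
        "total": sum,
        "report": lambda cleaned, tot: f"{cleaned} sum={tot}",
    },
    {"clean": ["load"], "total": ["load"], "report": ["clean", "total"]},
)
print(out["report"])  # [1, 2, 3] sum=6
```

The scheduler never needs to know what the tasks do, which is the same separation that lets task libraries and workflow systems handle execution "with a little config".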
-
There is still much to do, especially on large table formats (Iceberg/Delta) and on memory management when running on bigger boxes in the cloud. E.g., the elusive "Failed to allocate ..." bug[1] undermines the claim that big data is dead[2]. As it stands, we tried and abandoned DuckDB as a cheaper replacement for some Databricks batch jobs.
[0] https://github.com/ibis-project/ibis
-
Project mention: Powerful and scalable Python library for modern time series analysis | news.ycombinator.com | 2024-08-01
-
mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
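Frameworks in this space scale array and dataframe work by splitting the data into chunks (Dask arrays, for example, are made of chunks), processing the chunks independently, and combining the partial results. A minimal stdlib sketch of that chunk-and-combine pattern follows. `chunks`, `partial_stats`, and `chunked_mean` are hypothetical names, not Mars's API, and the local thread pool only stands in for the process or cluster workers a real framework would use.

```python
"""Hypothetical chunk-and-combine sketch (not the Mars or Dask API)."""
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, Sequence


def chunks(data: Sequence[float], size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def partial_stats(block: Sequence[float]) -> tuple[float, int]:
    """Per-chunk work: return (sum, count), which combine exactly."""
    return sum(block), len(block)


def chunked_mean(data: Sequence[float], chunk_size: int = 1000) -> float:
    """Compute the mean chunk by chunk, then combine the partial results."""
    if not data:
        raise ValueError("mean of empty data")
    # Map step: each chunk is processed independently.
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(partial_stats, chunks(data, chunk_size)))
    # Reduce step: add up the partial sums and counts.
    total = sum(s for s, _ in parts)
    count = sum(n for _, n in parts)
    return total / count


print(chunked_mean(list(range(10)), chunk_size=3))  # 4.5
```

Each chunk reports `(sum, count)` rather than its own mean: averaging the four chunk means in the example would give 5.25 instead of 4.5, because the last chunk holds a single item.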
-
swifter
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner (by jmcarpenter2)
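One way to apply a function "in the fastest available manner" is to time it on a small sample, extrapolate to the full input, and only pay for parallel workers when the job is expensive enough to justify their overhead. The sketch below is a hypothetical illustration of that heuristic on a plain Python list, not swifter's implementation. `smart_apply`, `sample_size`, and `threshold_s` are assumed names, and because of the GIL a thread pool mainly helps I/O-bound functions.

```python
"""Hypothetical 'time a sample, then choose' apply (not swifter's code)."""
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Sequence


def smart_apply(
    func: Callable[[Any], Any],
    items: Sequence[Any],
    sample_size: int = 100,
    threshold_s: float = 1.0,
) -> list:
    """Apply `func` to every item, choosing serial or pooled execution."""
    # Time the function on a small sample (these results are kept).
    sample = items[:sample_size]
    start = time.perf_counter()
    sample_out = [func(x) for x in sample]
    elapsed = time.perf_counter() - start
    # Extrapolate the per-item cost to the whole input.
    estimated = elapsed / max(len(sample), 1) * len(items)

    rest = items[len(sample):]
    if estimated < threshold_s:
        # Cheap job: a plain loop avoids pool start-up cost.
        return sample_out + [func(x) for x in rest]
    # Expensive job: hand the remaining items to a worker pool.
    with ThreadPoolExecutor() as pool:
        return sample_out + list(pool.map(func, rest))


print(smart_apply(lambda x: x * x, list(range(5))))  # [0, 1, 4, 9, 16]
```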
-
fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
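The core promise here is that the same logic runs on different backends without rewrites. The sketch below illustrates that separation in plain Python: the logic (`word_counts`) knows nothing about how it is executed, and a swappable engine decides. `Engine`, `SerialEngine`, `ThreadEngine`, and `run` are hypothetical names; this is not Fugue's API, and its real backends are Spark, Dask, and Ray rather than local threads.

```python
"""Hypothetical engine-agnostic pipeline (not the Fugue API)."""
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Protocol, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class Engine(Protocol):
    """Anything that can map a function over data partitions."""

    def map_partitions(self, func: Callable[[T], R], parts: Iterable[T]) -> list: ...


class SerialEngine:
    """Runs partitions one after another in this process."""

    def map_partitions(self, func, parts):
        return [func(p) for p in parts]


class ThreadEngine:
    """Runs partitions concurrently on a local thread pool."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def map_partitions(self, func, parts):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(func, parts))


def word_counts(lines: list[str]) -> dict[str, int]:
    """Business logic, written once with no reference to any engine."""
    counts: dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def run(engine: Engine, partitions: list[list[str]]) -> dict[str, int]:
    """Same pipeline on any engine: count per partition, then merge."""
    merged: dict[str, int] = {}
    for part in engine.map_partitions(word_counts, partitions):
        for word, n in part.items():
            merged[word] = merged.get(word, 0) + n
    return merged


parts = [["a b", "b c"], ["c c a"]]
print(run(SerialEngine(), parts))  # {'a': 2, 'b': 2, 'c': 3}
print(run(ThreadEngine(), parts))  # {'a': 2, 'b': 2, 'c': 3}
```

Switching backends means passing a different engine object; `word_counts` and `run` stay unchanged.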
-
Optimus
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark (by ironmussa)
-
Project mention: Narwhals: Lightweight and extensible compatibility layer between dataframe libs | news.ycombinator.com | 2024-08-29
-
Project mention: Debugging Python Code in Amazon SageMaker Locally Using Visual Studio Code and PyCharm: A Step-by-Step Guide | dev.to | 2023-11-15
git clone https://github.com/aws-samples/amazon-sagemaker-local-mode/
cd amazon-sagemaker-local-mode/general_pipeline_local_debug
python3 -m venv .venv
source .venv/bin/activate
pip install jupyter
jupyter lab
-
Python Dask related posts
-
Farewell Pandas, and thanks for all the fish
-
Powerful and scalable Python library for modern time series analysis
-
TDAmeritrade: Timeseries Analysis with Stumpy
-
Stumpy: Matrix profile time series analysis
-
Shuffling large data at constant memory in Dask
-
Fugue: A unified interface for distributed computing
-
[Discussion] Open Source beats Google's AutoML for Time series
-
A note from our sponsor - SaaSHub
www.saashub.com | 9 Sep 2024
Index
What are some of the best open-source Dask projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | Dask | 12,376 |
2 | ibis | 5,021 |
3 | stumpy | 3,599 |
4 | xarray | 3,554 |
5 | mars | 2,692 |
6 | swifter | 2,514 |
7 | fugue | 1,954 |
8 | distributed | 1,565 |
9 | Optimus | 1,472 |
10 | Eliot | 1,101 |
11 | mlforecast | 839 |
12 | pystore | 554 |
13 | datacompy | 459 |
14 | dask-sql | 383 |
15 | narwhals | 344 |
16 | nebari | 274 |
17 | amazon-sagemaker-local-mode | 242 |
18 | stackstac | 238 |
19 | aicsimageio | 201 |
20 | xgboost_ray | 137 |
21 | dask-awkward | 60 |
22 | bytehub | 58 |
23 | dask-memusage | 24 |