Python Dask

Open-source Python projects categorized as Dask

Top 23 Python Dask Projects

  1. Dask

    Parallel computing with task scheduling

    Project mention: Ask HN: What's the right tool for this job? | news.ycombinator.com | 2024-07-20

    From what I've seen, there are sort of two paths. I'll provide a well known example from each.

    1. Language-specific distributed task library

    For example, in Python, Celery is a pretty popular task system. If you (the dev) are the one writing all the code and running the workflows, it might work well for you. You build the core code and functions, and it handles the processing and resource management with a little config.

    * https://github.com/celery/celery

    Or lower level:

    * https://github.com/dask/dask

    2. DAG Workflow systems

    There are also whole systems for what you're describing. They've gotten especially popular in the MLOps and data engineering world. A common one is Airflow:

    * https://github.com/apache/airflow
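    To make the "DAG workflow" idea in the comment above concrete, here is a minimal sketch using only the Python standard library (`graphlib`, Python 3.9+). It is not how Airflow, Celery, or Dask are implemented or used; the `run_dag` helper and the three-step pipeline are hypothetical names chosen for illustration. It only shows the core idea these tools share: tasks declare their dependencies, and a scheduler runs them in a valid order.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, deps):
    """Run each task after its dependencies; return results keyed by task name.

    tasks: dict mapping task name -> callable
    deps:  dict mapping task name -> list of upstream task names
    (Hypothetical helper for illustration; real schedulers also handle
    parallelism, retries, and distribution across workers.)
    """
    results = {}
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(deps).static_order():
        upstream = [results[d] for d in deps.get(name, ())]
        results[name] = tasks[name](*upstream)
    return results


# Hypothetical three-step pipeline: extract -> transform -> load
tasks = {
    "extract": lambda: [1, 2, 3],
    "transform": lambda rows: [r * 10 for r in rows],
    "load": lambda rows: sum(rows),
}
deps = {"extract": [], "transform": ["extract"], "load": ["transform"]}

results = run_dag(tasks, deps)
print(results["load"])
```

    A task library such as Celery or Dask would typically take over the "run this function somewhere" part, while a workflow system such as Airflow adds scheduling, monitoring, and retries around the graph.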

  3. stumpy

    STUMPY is a powerful and scalable Python library for modern time series analysis

    Project mention: Stumpy: Python library to computing matrix profiles on timeseries | news.ycombinator.com | 2025-03-19
  4. xarray

    N-D labeled arrays and datasets in Python

    Project mention: I Don't Like NumPy | news.ycombinator.com | 2025-05-15
  5. mars

    Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.

  6. swifter

    A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner (by jmcarpenter2)

  7. fugue

    A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

  8. distributed

    A distributed task scheduler for Dask

  10. Optimus

    🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark (by ironmussa)

  11. Eliot

    Eliot: the logging system that tells you *why* it happened

  12. mlforecast

    Scalable machine 🤖 learning for time series forecasting.

  13. narwhals

    Lightweight and extensible compatibility layer between dataframe libraries!

    Project mention: Narwhals: Lightweight and extensible compatibility layer between dataframe libs | news.ycombinator.com | 2024-08-29
  14. pystore

    Fast data store for Pandas time-series data

  15. datacompy

    Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!

  16. dask-sql

    Distributed SQL Engine in Python using Dask

  17. nebari

    🪴 Nebari - your open source data science platform (by nebari-dev)

  18. stackstac

    Turn a STAC catalog into a dask-based xarray

  19. amazon-sagemaker-local-mode

    Amazon SageMaker Local Mode Examples

  20. aicsimageio

    Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python

  21. xgboost_ray

    Distributed XGBoost on Ray

  22. dask-awkward

    Native Dask collection for awkward arrays, and the library to use it.

  23. bytehub

    ByteHub: making feature stores simple

  24. dask-memusage

    A low-impact profiler to figure out how much memory each task in Dask is using

  25. steam-data-engineering

    A data engineering project with Airflow, dbt, Terraform, GCP and much more!

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).


Index

What are some of the best open-source Dask projects in Python? This list will help you:

#   Project                       Stars
1   Dask                          13,186
2   stumpy                        3,912
3   xarray                        3,788
4   mars                          2,718
5   swifter                       2,585
6   fugue                         2,081
7   distributed                   1,627
8   Optimus                       1,511
9   Eliot                         1,144
10  mlforecast                    1,018
11  narwhals                      984
12  pystore                       577
13  datacompy                     566
14  dask-sql                      404
15  nebari                        294
16  stackstac                     256
17  amazon-sagemaker-local-mode   256
18  aicsimageio                   213
19  xgboost_ray                   148
20  dask-awkward                  64
21  bytehub                       60
22  dask-memusage                 24
23  steam-data-engineering        24
