Python Dask

Open-source Python projects categorized as Dask

Top 23 Python Dask Projects

  • Dask

    Parallel computing with task scheduling

    Project mention: The Distributed Tensor Algebra Compiler (2022) | news.ycombinator.com | 2023-06-15

  • xarray

    N-D labeled arrays and datasets in Python

    Project mention: Request for Startups: Climate Tech | news.ycombinator.com | 2022-12-15

    PyTorch and JAX are used heavily in climate science on the ML side. For more general analytics, not so much. Many of our users like to use Xarray as a high-level API. There has been some work to integrate Xarray with PyTorch (https://github.com/pydata/xarray/issues/3232) but we're not there yet.

    The Python Array API standard should help align these different back-ends: https://data-apis.org/array-api/latest/

  • ibis

    The flexibility of Python with the scale and performance of modern SQL.

    Project mention: A LLM+OLAP Solution | news.ycombinator.com | 2023-09-11

    Ibis could also be a target. It compiles queries written in Python to multiple dataframe libraries and SQL targets.

    https://ibis-project.org/

  • stumpy

    STUMPY is a powerful and scalable Python library for modern time series analysis

  • mars

    Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.

  • swifter

    A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner (by jmcarpenter2)

  • fugue

    A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

    Project mention: Daft: A High-Performance Distributed Dataframe Library for Multimodal Data | news.ycombinator.com | 2023-06-07

    Please integrate it with Fugue.

    https://github.com/fugue-project/fugue

  • distributed

    A distributed task scheduler for Dask

    Project mention: Shuffling large data at constant memory in Dask | /r/Python | 2023-04-17

    Thanks, if you give it a try, you can share your experience in this GitHub issue, where developers are collecting info for further improvements. https://github.com/dask/distributed/discussions/7509

  • Optimus

    Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark (by ironmussa)

  • Eliot

    Eliot: the logging system that tells you *why* it happened

    Project mention: Logging code mess | /r/Python | 2023-04-14

    Maybe something like eliot could work for you

  • mlforecast

    Scalable machine 🤖 learning for time series forecasting.

    Project mention: Sales forecast for next two years | /r/datascience | 2023-06-25

    MLForecast

  • pystore

    Fast data store for Pandas time-series data

  • dask-sql

    Distributed SQL Engine in Python using Dask

    Project mention: FLaNK Stack Weekly for 20 June 2023 | dev.to | 2023-06-20

  • nebari

    🪴 Nebari - your open source data science platform (by nebari-dev)

    Project mention: I re-implemented JupyterHub the Kubernetes way | /r/Python | 2023-04-05

    Have you seen Nebari?

  • stackstac

    Turn a STAC catalog into a dask-based xarray

    Project mention: Can you replace Geoserver with COG and MVT from a bucket? | /r/geospatial | 2023-03-12

    Like they're doing here to access sentinel 2 images https://github.com/gjoseph92/stackstac

  • aicsimageio

    Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python

  • xgboost_ray

    Distributed XGBoost on Ray

  • bytehub

    ByteHub: making feature stores simple

  • dask-awkward

    Native Dask collection for awkward arrays, and the library to use it.

  • dask-memusage

    A low-impact profiler to figure out how much memory each task in Dask is using

  • pangeo-binder

    Pangeo + Binder (dev repo for a binder/pangeo fusion concept)

  • steam-data-engineering

    A data engineering project with Airflow, dbt, Terraform, GCP and much more!

    Project mention: Feedback for my project about Steam games data, featuring Terraform, Airflow, dbt, spark, dataproc, Bigquery, S3, etc | /r/dataengineering | 2022-09-30

    Here is the GH repo: https://github.com/VicenteYago/steam-data-engineering with more detailed info.

  • pythonic

    Examples of the Python programming language (by wigging)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-09-11.

Index

What are some of the best open-source Dask projects in Python? This list will help you:

Rank  Project                 Stars
1     Dask                    11,398
2     xarray                  3,155
3     ibis                    3,110
4     stumpy                  2,781
5     mars                    2,642
6     swifter                 2,343
7     fugue                   1,723
8     distributed             1,489
9     Optimus                 1,406
10    Eliot                   1,046
11    mlforecast              511
12    pystore                 497
13    dask-sql                326
14    nebari                  226
15    stackstac               195
16    aicsimageio             168
17    xgboost_ray             116
18    bytehub                 56
19    dask-awkward            48
20    dask-memusage           24
21    pangeo-binder           18
22    steam-data-engineering  15
23    pythonic                9