Python Dask

Open-source Python projects categorized as Dask

Top 23 Python Dask Projects

  • Dask

    Parallel computing with task scheduling

    Project mention: Ask HN: What's the right tool for this job? | news.ycombinator.com | 2024-07-20

    From what I've seen, there are roughly two paths. I'll provide a well-known example from each.

    1. Language-specific distributed task library

    For example, in Python, Celery is a pretty popular task system. If you (the dev) are the one writing all the code and running the workflows, it might work well for you. You build the core code and functions, and it handles the processing and resource management with a little config.

    * https://github.com/celery/celery

    Or lower level:

    * https://github.com/dask/dask

    2. DAG Workflow systems

    There are also whole systems for what you're describing. They've gotten especially popular in the MLOps and data engineering world. A common one is Airflow:

    * https://github.com/apache/airflow
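
The DAG idea behind both paths can be sketched with the Python standard library alone. This is a minimal illustration, not how Celery, Dask, or Airflow are implemented: the task names (`extract`, `transform`, `load`) and the `run_dag` helper are hypothetical, and the tasks run sequentially in one process, whereas real schedulers add parallelism, retries, and distribution on top of the same dependency ordering.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical task bodies, chosen only to make the data flow visible.
def extract():
    return [1, 2, 3]


def transform(rows):
    return [r * 10 for r in rows]


def load(rows):
    return sum(rows)


# Each task name maps to (function, names of the tasks it depends on).
tasks = {
    "extract": (extract, []),
    "transform": (transform, ["extract"]),
    "load": (load, ["transform"]),
}


def run_dag(tasks):
    """Run every task after its dependencies, passing their results as arguments."""
    graph = {name: set(deps) for name, (_, deps) in tasks.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = tasks[name]
        results[name] = func(*(results[d] for d in deps))
    return results


results = run_dag(tasks)
print(results["load"])
```

The dependency mapping is the only thing the runner needs to know about the workflow, which is roughly the contract that task libraries and workflow systems expose to the developer.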

  • ibis

    the portable Python dataframe library

    Project mention: DuckDB 1.1.0 Released | news.ycombinator.com | 2024-09-09

    There is still much to do, especially on large table formats (Iceberg/Delta) and memory management when running on bigger boxes in the cloud. E.g., the elusive "Failed to allocate ..." bug[1] is an inhibitor to the claim that big data is dead[2]. As it is, we tried and abandoned DuckDB as a cheaper replacement for some Databricks batch jobs.

    [0] https://github.com/ibis-project/ibis

  • stumpy

    STUMPY is a powerful and scalable Python library for modern time series analysis

    Project mention: Powerful and scalable Python library for modern time series analysis | news.ycombinator.com | 2024-08-01
  • xarray

    N-D labeled arrays and datasets in Python

  • mars

    Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.

  • swifter

    A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner (by jmcarpenter2)

  • fugue

    A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

    Project mention: FLaNK Stack Weekly 22 January 2024 | dev.to | 2024-01-22
  • distributed

    A distributed task scheduler for Dask

  • Optimus

    Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark (by ironmussa)

  • Eliot

    Eliot: the logging system that tells you *why* it happened

  • mlforecast

    Scalable machine 🤖 learning for time series forecasting.

  • pystore

    Fast data store for Pandas time-series data

  • datacompy

    Pandas, Polars, and Spark DataFrame comparison for humans and more!

  • dask-sql

    Distributed SQL Engine in Python using Dask

  • narwhals

    Lightweight and extensible compatibility layer between dataframe libraries!

    Project mention: Narwhals: Lightweight and extensible compatibility layer between dataframe libs | news.ycombinator.com | 2024-08-29
  • nebari

    🪴 Nebari - your open source data science platform (by nebari-dev)

  • amazon-sagemaker-local-mode

    Amazon SageMaker Local Mode Examples

    Project mention: Debugging Python Code in Amazon SageMaker Locally Using Visual Studio Code and PyCharm: A Step-by-Step Guide | dev.to | 2023-11-15

    git clone https://github.com/aws-samples/amazon-sagemaker-local-mode/
    cd amazon-sagemaker-local-mode/general_pipeline_local_debug
    python3 -m venv .venv
    source .venv/bin/activate
    pip install jupyter
    jupyter lab

  • stackstac

    Turn a STAC catalog into a dask-based xarray

  • aicsimageio

    Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python

  • xgboost_ray

    Distributed XGBoost on Ray

  • dask-awkward

    Native Dask collection for awkward arrays, and the library to use it.

  • bytehub

    ByteHub: making feature stores simple

  • dask-memusage

    A low-impact profiler to figure out how much memory each task in Dask is using

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Dask projects in Python? This list will help you:

# Project Stars
1 Dask 12,376
2 ibis 5,021
3 stumpy 3,599
4 xarray 3,554
5 mars 2,692
6 swifter 2,514
7 fugue 1,954
8 distributed 1,565
9 Optimus 1,472
10 Eliot 1,101
11 mlforecast 839
12 pystore 554
13 datacompy 459
14 dask-sql 383
15 narwhals 344
16 nebari 274
17 amazon-sagemaker-local-mode 242
18 stackstac 238
19 aicsimageio 201
20 xgboost_ray 137
21 dask-awkward 60
22 bytehub 58
23 dask-memusage 24

Did you know that Python is
the most popular programming language
based on number of mentions?