Top 23 Python Dask Projects
Parallel computing with task scheduling
Project mention: The Distributed Tensor Algebra Compiler (2022) | news.ycombinator.com | 2023-06-15
N-D labeled arrays and datasets in Python
Project mention: Request for Startups: Climate Tech | news.ycombinator.com | 2022-12-15
PyTorch and JAX are used heavily in climate science on the ML side. For more general analytics, not so much. Many of our users like to use Xarray as a high-level API. There has been some work to integrate Xarray with PyTorch (https://github.com/pydata/xarray/issues/3232) but we're not there yet.
The Python Array API standard should help align these different back-ends: https://data-apis.org/array-api/latest/
The flexibility of Python with the scale and performance of modern SQL.
STUMPY is a powerful and scalable Python library for modern time series analysis
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner (by jmcarpenter2)
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
A distributed task scheduler for Dask
Project mention: Shuffling large data at constant memory in Dask | /r/Python | 2023-04-17
Thanks, if you give it a try, you can share your experience in this GitHub issue, where developers are collecting info for further improvements. https://github.com/dask/distributed/discussions/7509
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark (by ironmussa)
Eliot: the logging system that tells you *why* it happened
Project mention: Logging code mess | /r/Python | 2023-04-14
Maybe something like eliot could work for you
Scalable machine 🤖 learning for time series forecasting.
Project mention: Sales forecast for next two years | /r/datascience | 2023-06-25
Fast data store for Pandas time-series data
Distributed SQL Engine in Python using Dask
Project mention: FLaNK Stack Weekly for 20 June 2023 | dev.to | 2023-06-20
🪴 Nebari - your open source data science platform (by nebari-dev)
Project mention: I re-implemented JupyterHub the Kubernetes way | /r/Python | 2023-04-05
Have you seen Nebari?
Turn a STAC catalog into a dask-based xarray
Project mention: Can you replace Geoserver with COG and MVT from a bucket? | /r/geospatial | 2023-03-12
Like they're doing here to access sentinel 2 images https://github.com/gjoseph92/stackstac
Image Reading, Metadata Conversion, and Image Writing for Microscopy Images in Python
Distributed XGBoost on Ray
ByteHub: making feature stores simple
Native Dask collection for awkward arrays, and the library to use it.
A low-impact profiler to figure out how much memory each task in Dask is using
Pangeo + Binder (dev repo for a binder/pangeo fusion concept)
A data engineering project with Airflow, dbt, Terraform, GCP and much more!
Project mention: Feedback for my project about Steam games data, featuring Terraform, Airflow, dbt, spark, dataproc, Bigquery, S3, etc | /r/dataengineering | 2022-09-30
Here is the GH repo: https://github.com/VicenteYago/steam-data-engineering with more detailed info.
Examples of the Python programming language (by wigging)
Python Dask related posts
Shuffling large data at constant memory in Dask
1 project | /r/Python | 17 Apr 2023
Fugue: A unified interface for distributed computing
1 project | news.ycombinator.com | 26 Mar 2023
[Discussion] Open Source beats Google's AutoML for Time series
1 project | /r/MachineLearning | 28 Feb 2023
File format for large data with many columns
2 projects | /r/Python | 15 May 2022
Time Series Analysis for air pollution data not aligned [R] [P]
1 project | /r/MachineLearning | 23 Apr 2022
What is the best way to save a csv.file in number only ? PC hangs when my file is more than 2GB
2 projects | /r/learnpython | 4 Apr 2022
[D] STUMPY v1.11.0 Released for Modern Time Series Analysis
2 projects | /r/MachineLearning | 22 Mar 2022
What are some of the best open-source Dask projects in Python? This list will help you: