legate.pandas
pathml
Metric | legate.pandas | pathml
---|---|---
Mentions | 1 | 2
Stars | 72 | 364
Growth | - | 3.0%
Activity | 0.0 | 8.0
Last commit | over 2 years ago | about 1 month ago
Language | C++ | Python
License | Apache License 2.0 | GNU General Public License v3.0 only
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
legate.pandas
- Dask – a flexible library for parallel computing in Python
I see they also have a pandas replacement: https://github.com/nv-legate/legate.pandas. How is it different from cuDF?
pathml
- Weekly IT Questions Thread - Technical Advice, Professional Development, and Learning
- Dask – a flexible library for parallel computing in Python
We have been using dask to support our computational pathology workflows [1], where the images are so big that they cannot be loaded in memory, let alone analyzed (standard pathology whole slide images are ~1GB; some microscopy techniques generate images >1TB). We divide each image into a bunch of smaller tiles and process each tile independently. The dask.distributed scheduler lets us scale up by distributing the tile processing across a cluster.
Benefits of dask.distributed: it is easy to get up and running, and it supports spinning up clusters on many different computing platforms (local machines, HPC clusters, k8s, etc.).
One difficulty is optimizing performance: there are so many configuration details (job size, number of workers, worker resources, etc.) that it has been hard to know which settings work best.
[1] https://github.com/Dana-Farber-AIOS/pathml
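The tile-and-distribute pattern described in the comment above can be sketched roughly as follows. This is a minimal illustration and not pathml's actual implementation: `tile_boxes`, `process_tile`, and `process_image` are hypothetical names, the "image" is a tiny in-memory grid, and a standard-library `ThreadPoolExecutor` stands in for a Dask cluster so the sketch runs without extra dependencies. With dask.distributed, an executor obtained from a `Client` (for example via `Client.get_executor()`) could presumably be passed in the same way.

```python
from concurrent.futures import ThreadPoolExecutor


def tile_boxes(width, height, tile_size):
    """Yield (x, y, w, h) boxes covering the image; edge tiles may be smaller."""
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield (x, y, min(tile_size, width - x), min(tile_size, height - y))


def process_tile(image, box):
    """Hypothetical per-tile analysis: here, just the mean pixel value."""
    x, y, w, h = box
    pixels = [image[row][col] for row in range(y, y + h) for col in range(x, x + w)]
    return box, sum(pixels) / len(pixels)


def process_image(image, tile_size, executor):
    """Submit one independent task per tile and collect results keyed by box."""
    height, width = len(image), len(image[0])
    futures = [
        executor.submit(process_tile, image, box)
        for box in tile_boxes(width, height, tile_size)
    ]
    return dict(future.result() for future in futures)


# Tiny synthetic 4x6 grid standing in for a whole slide image (assumption:
# a real workflow would have each task read only its own region from disk).
image = [[row * 6 + col for col in range(6)] for row in range(4)]

with ThreadPoolExecutor(max_workers=2) as pool:
    results = process_image(image, tile_size=4, executor=pool)
```

Because each tile is an independent task, the number of workers and the tile size can be varied without changing the per-tile logic, which is where the tuning difficulty mentioned above comes in.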
What are some alternatives?
cunumeric - An Aspiring Drop-In Replacement for NumPy at Scale
mpire - A Python package for easy multiprocessing, but faster than multiprocessing
slideflow - Deep learning library for digital pathology, with both Tensorflow and PyTorch support.
cudf - cuDF - GPU DataFrame Library
Keras - Deep Learning for humans
distributed - A distributed task scheduler for Dask
pytorch-ssim - pytorch structural similarity (SSIM) loss
DataProfiler - What's in your data? Extract schema, statistics and entities from datasets
Dask - Parallel computing with task scheduling