Benchmarking Pandas, CuDF, Modin, Apache Arrow and Spark on a Billion Taxi Rides dataset

udsb

Mentions: 0 | Stars: 8 | Activity: 4.4 | Language: Jupyter Notebook

Unlimited Data-Science Benchmarks for Numeric, Tabular and Graph Workloads

If you are familiar with the API, feel free to contribute sources to the repo. We can then rerun the benchmarks on the same hardware for the sake of completeness. But I have a feeling that such libraries will keep popping up.

db-benchmark

Mentions: 5 | Stars: 319 | Activity: 0.0 | Language: R

reproducible benchmark of database-like ops

And more benchmarks: https://h2oai.github.io/db-benchmark/. If you are looking for performant dataframes, idiomatic polars typically tops the benchmarks.

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Why we dropped Docker for Python environments
1 project | /r/dataengineering | 12 Apr 2023
[D] Can we use Ray for distributed training on vertex ai ? Can someone provide me examples for the same ? Also which dataframe libraries you guys used for training machine learning models on huge datasets (100 gb+) (because pandas can't handle huge data).
1 project | /r/MachineLearning | 9 Feb 2023
Story of my life
1 project | /r/ProgrammerHumor | 28 Nov 2022
Artificial Intelligence in Python
1 project | /r/learnpython | 30 Oct 2022
Open | RAPIDS GPU Data Science
1 project | /r/opencv | 21 Feb 2022

This page summarizes the projects mentioned and recommended in the original post on /r/Python
Tags: Numpy, apache-arrow, Arrow, cublas, Cudf
Post date: 21 Sep 2022
