Benchmarking Pandas, CuDF, Modin, Apache Arrow and Spark on a Billion Taxi Rides dataset

This page summarizes the projects mentioned and recommended in the original post on /r/Python

  • udsb

    Unlimited Data-Science Benchmarks for Numeric, Tabular and Graph Workloads

    If you are familiar with the API, feel free to contribute sources to the repo. We can then rerun the benchmarks on the same hardware for the sake of completeness. That said, I have a feeling that such libraries will keep popping up.

  • db-benchmark

    reproducible benchmark of database-like ops

    And more benchmarks: https://h2oai.github.io/db-benchmark/. If you are looking for performant dataframes, idiomatic polars typically tops the benchmarks.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
