PySpark now provides a native Pandas API

This page summarizes the projects mentioned and recommended in the original post on /r/Python.

  • fugue

    A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

    There's dask-sql, but I think it is being abandoned for fugue-project. I'm actually excited for this project as it is trying to provide a backend agnostic solution, which would seem like a difficult, lofty goal. I wish them luck.

  • quinn

    pyspark methods to enhance developer productivity 📣 👯 🎉 (by mrpowers-io)

    Pandas syntax is far inferior to regular PySpark in my opinion. Goes to show how much data analysts value a syntax that they're already familiar with. Pandas syntax makes it harder to reason about queries, abstract DataFrame transformations, etc. I've authored some popular PySpark libraries like quinn and chispa and am not excited to add Pandas syntax support, haha.

  • chispa

    PySpark test helper methods with beautiful error messages

    Pandas syntax is far inferior to regular PySpark in my opinion. Goes to show how much data analysts value a syntax that they're already familiar with. Pandas syntax makes it harder to reason about queries, abstract DataFrame transformations, etc. I've authored some popular PySpark libraries like quinn and chispa and am not excited to add Pandas syntax support, haha.
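
To make the syntax contrast in the comment above concrete, here is a minimal sketch of one aggregation written in both styles. The column names, sample data, and function names are hypothetical and chosen only for illustration. The PySpark function is defined but not executed here because it needs a running Spark session; the Pandas-style function is exercised on a plain Pandas DataFrame, on the assumption that the Pandas API on Spark (`pyspark.pandas`, available since Spark 3.2) accepts the same call.

```python
import pandas as pd


def avg_temp_pandas_style(df):
    """Pandas-style aggregation: select a column, then reduce it per group."""
    return df.groupby("city")["temp"].mean()


def avg_temp_pyspark_style(sdf):
    """PySpark-style aggregation: build column expressions, then apply them."""
    # Imported lazily so this file loads without a Spark installation.
    from pyspark.sql import functions as F

    return sdf.groupBy("city").agg(F.avg("temp").alias("avg_temp"))


# Hypothetical sample data, used only for illustration.
readings = pd.DataFrame(
    {"city": ["Oslo", "Oslo", "Lima"], "temp": [4.0, 6.0, 20.0]}
)

# Runs on plain Pandas here; a pandas-on-Spark DataFrame is assumed to
# accept the same call.
pandas_result = avg_temp_pandas_style(readings)
print(pandas_result.to_dict())
```

With a Spark session available, `avg_temp_pyspark_style(spark.createDataFrame(readings))` would be the native PySpark counterpart, and `pyspark.pandas.from_pandas(readings)` should produce an input for the Pandas-style function. Which of the two reads better is the point the commenter is debating; the sketch only shows the difference in shape.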
