Spark open source community is awesome

This page summarizes the projects mentioned and recommended in the original post on reddit.com/r/apachespark.

  • mack

    Delta Lake helper methods in PySpark

    A couple of devs just added a `find_composite_key_candidates` function so users can easily identify columns that could serve as a unique identifier in their Delta table.

  • jodie

    Delta Lake and filesystem helper methods (by MrPowers)

    Another dev is working on adding an elegant interface for Hadoop filesystem operations, similar to what os-lib offers for regular filesystem operations.

  • os-lib

    OS-Lib is a simple, flexible, high-performance Scala interface to common OS filesystem and subprocess APIs

  • chispa

    PySpark test helper methods with beautiful error messages

    Here's a little README fix a user pushed to chispa.

  • delta-rs

    A native Rust library for Delta Lake, with bindings into Python

    Yeah, there are tons of employees from companies that have made massive contributions to the Spark ecosystem. Apple built Delta Lake with Databricks (see this video for more detail). Lots of Spark PMC members are from various companies. delta-rs was initially built by Scribd and is now actively maintained by engineers at Voltron & other companies. It's awesome that the community has so many contributors from various sources.
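
The composite-key idea mentioned under mack can be illustrated without Spark. What follows is only a minimal sketch of the general technique, not mack's implementation or API: the function name `find_key_candidates`, its `max_size` parameter, and the sample rows are all hypothetical, and it works on a plain list of dicts rather than a Delta table. It tries column combinations from smallest to largest and returns the first size at which some combination has a distinct value tuple for every row.

```python
from itertools import combinations


def find_key_candidates(rows, columns, max_size=2):
    """Return the smallest column combinations that uniquely identify each row.

    Hypothetical helper for illustration only; not part of mack.
    """
    for size in range(1, max_size + 1):
        candidates = [
            combo
            for combo in combinations(columns, size)
            # A combination is a key candidate when no two rows share its values.
            if len({tuple(row[c] for c in combo) for row in rows}) == len(rows)
        ]
        if candidates:
            return candidates
    return []


# Made-up sample data: no single column is unique on its own.
rows = [
    {"first": "ana", "last": "lee", "country": "US"},
    {"first": "ana", "last": "kim", "country": "US"},
    {"first": "bo", "last": "lee", "country": "CA"},
]

print(find_key_candidates(rows, ["first", "last", "country"]))
# [('first', 'last'), ('last', 'country')]
```

On a real table the distinct-count check would be pushed down to the engine instead of being done in Python memory, and the number of combinations grows quickly with `max_size`, so keeping it small is a reasonable default.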
