pyspark-example-project

Implementing best practices for PySpark ETL jobs and applications. (by AlexIoannides)

Pyspark-example-project Alternatives

Similar projects and alternatives to pyspark-example-project

  1. soda-spark

    Soda Spark is a PySpark library that helps you test your data in Spark DataFrames.

  3. TypedPyspark

    Type-annotate your Spark DataFrames and validate them

  4. patterns-devkit

    Data pipelines from re-usable components

  5. workshop-realtime-data-pipelines

    You will inspect and run a sample architecture making use of Apache Pulsar™ and Pulsar Functions for real-time, event-streaming-based data ingestion, cleaning and processing.

  6. dados-censup

    Discontinued. Automated ingestion of data published by INEP on the Brazilian higher education census.

  8. hamilton

    25 mentions: pyspark-example-project vs. hamilton

    Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

  9. etl-markup-toolkit

    Discontinued. ETL Markup Toolkit is a Spark-native tool for expressing ETL transformations as configuration.

  10. Mage

    🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai

NOTE: The number of mentions shown on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a better pyspark-example-project alternative or higher similarity.

pyspark-example-project reviews and mentions

Posts with mentions or reviews of pyspark-example-project. We have used some of these posts to build our list of alternatives and similar projects.

Stats

Basic pyspark-example-project repo stats
1
1,944
0.0
over 2 years ago
