Experience setting up Spark and Hudi on Kubernetes

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering.

  • charts

    Bitnami Helm Charts (by bitnami)

    We're using the Bitnami Spark chart (https://github.com/bitnami/charts/tree/main/bitnami/spark), but I have heard good things about the spark-on-k8s-operator (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) as well. According to the docs (https://hudi.apache.org/docs/0.5.1/deployment/#deploying), Hudi should not need any long-running deployments.

  • spark-on-k8s-operator

    Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.


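The point that Hudi needs no long-running deployment can be illustrated with a rough sketch: Hudi is pulled in as a dependency of the Spark job at submit time, so the only thing that has to be running is Spark itself. The helper below only assembles a `spark-submit` argument list and does not run it. The function name `build_spark_submit`, the master URL, the application path, the container image, and the Hudi bundle coordinates are all assumptions for illustration and are not taken from the original post; check the bundle coordinates against your own Spark and Scala versions.

```python
import shlex
from typing import List, Optional


def build_spark_submit(
    master_url: str,
    app_path: str,
    hudi_bundle: str,
    container_image: Optional[str] = None,
) -> List[str]:
    """Assemble a spark-submit argument list that loads Hudi as a job dependency.

    Hypothetical helper: it only builds the command, it does not execute it.
    """
    args = [
        "spark-submit",
        "--master", master_url,
        # Hudi ships as a bundle jar resolved at submit time; no Hudi service is deployed.
        "--packages", hudi_bundle,
        # The Hudi quick start recommends Kryo serialization for Spark jobs.
        "--conf", "spark.serializer=org.apache.spark.serializer.KryoSerializer",
    ]
    if master_url.startswith("k8s://"):
        # Native Kubernetes scheduling needs a container image for driver and executors.
        if container_image is None:
            raise ValueError("container_image is required for a k8s:// master")
        args += [
            "--deploy-mode", "cluster",
            "--conf", f"spark.kubernetes.container.image={container_image}",
        ]
    args.append(app_path)
    return args


if __name__ == "__main__":
    # Assumed values: a standalone master service name and an example bundle version.
    cmd = build_spark_submit(
        master_url="spark://spark-master-svc:7077",
        app_path="job.py",
        hudi_bundle="org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0",
    )
    print(shlex.join(cmd))
```

With a standalone cluster such as the one a Spark Helm chart typically provides, the master URL would use the `spark://` scheme; with native Kubernetes scheduling it would use `k8s://` and a container image. In both cases Hudi appears only in `--packages`.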

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives, so a higher number indicates a more popular project.
