Python apache-spark

Open-source Python projects categorized as apache-spark

Top 9 Python apache-spark Projects

  • MLflow

    Open source platform for the machine learning lifecycle

    Project mention: ML experiment tracking with DagsHub, MLFlow, and DVC | 2023-01-12

    Here, we’ll implement the experimentation workflow using DagsHub, Google Colab, MLflow, and data version control (DVC). We’ll focus on how to do this without diving deep into the technicalities of building or designing a workbench from scratch. Going that route might increase the complexity involved, especially if you are in the early stages of understanding ML workflows, just working on a small project, or trying to implement a proof of concept.

  • flintrock

    A command-line tool for launching Apache Spark clusters.

  • quinn

    pyspark methods to enhance developer productivity 📣 👯 🎉 (by MrPowers)

    Project mention: Invitation to collaborate on open source PySpark projects | 2022-10-15

    quinn is a library with PySpark helper functions. I need to work through all the open issues / PRs and bump all versions. I should do another release. This library gets around 600,000 monthly downloads.

  • PySpark-Boilerplate

    A boilerplate for writing PySpark Jobs

  • sparktorch

    Train and run PyTorch models on Apache Spark.

  • Apache-Spark-Guide

    Apache Spark Guide

    Project mention: Useful Tools and Programs list for Apache Spark | 2022-03-20

  • Traffic-Data-Analysis-with-Apache-Spark-Based-on-Mobile-Robot-Data

    Mobile robot data were analyzed with Apache Spark to produce five statistical results: travel time, waiting time, average speed, occupancy, and density.

    Project mention: Traffic Data Analysis with Apache Spark Based on Autonomous Transport Vehicle Data | 2022-04-05

    You can access the whole project on my GitHub repo.

  • livyc

    Apache Livy Client

    Project mention: Wittline/livyc: Apache Livy Client | 2022-06-30

  • Patek

    A collection of reusable pyspark utility functions that help make development easier!

    Project mention: Implement dynamic merge in PySpark. | 2022-12-13

    I developed a function to do this! Check it out:
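
The Patek post's own function is not reproduced on this page. As a hypothetical sketch of one common reading of "dynamic merge" (an upsert whose join condition and update columns are generated from a list of key columns instead of being hard-coded), the helpers below only build the merge condition and the column mapping as strings. `build_merge_condition`, `build_update_set`, and the `target`/`source` aliases are illustrative names, not Patek's API.

```python
from typing import Dict, Sequence


def build_merge_condition(keys: Sequence[str], target: str = "target", source: str = "source") -> str:
    """Join key columns into an equality condition like 'target.id = source.id AND ...'.

    Hypothetical helper; assumes plain column names that need no backtick quoting.
    """
    if not keys:
        raise ValueError("at least one merge key is required")
    return " AND ".join(f"{target}.{k} = {source}.{k}" for k in keys)


def build_update_set(columns: Sequence[str], keys: Sequence[str], source: str = "source") -> Dict[str, str]:
    """Map every non-key column to the matching source-column expression."""
    key_set = set(keys)
    return {c: f"{source}.{c}" for c in columns if c not in key_set}
```

With the delta-spark package, these strings could be passed to `DeltaTable.forName(spark, "db.events").alias("target").merge(updates_df.alias("source"), condition).whenMatchedUpdate(set=update_set).whenNotMatchedInsertAll().execute()`, where the table name and `updates_df` are placeholders. Generating the condition from `df.columns` and a key list is what lets the same merge code serve tables with different schemas.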
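
For the livyc entry: Apache Livy exposes Spark over a REST API, with `POST /sessions` to start an interactive session and `POST /sessions/{id}/statements` to run code inside it. The sketch below builds those two requests with the Python standard library only. It is not livyc's interface, and the server address (localhost on Livy's default port 8998) is an assumption.

```python
import json
import urllib.request

# Assumption: a Livy server running locally on its default port.
LIVY_URL = "http://localhost:8998"


def _json_post(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session_request(base_url: str, kind: str = "pyspark") -> urllib.request.Request:
    """POST /sessions: start an interactive session of the given kind."""
    return _json_post(f"{base_url}/sessions", {"kind": kind})


def submit_statement_request(base_url: str, session_id: int, code: str) -> urllib.request.Request:
    """POST /sessions/{id}/statements: run `code` in an existing session."""
    return _json_post(f"{base_url}/sessions/{session_id}/statements", {"code": code})


def send(req: urllib.request.Request) -> dict:
    """Send a request and decode Livy's JSON response (needs a running server)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running server, `send(create_session_request(LIVY_URL))` returns the new session's JSON, including its `id`. Once `GET /sessions/{id}` reports the state `idle`, statements can be submitted the same way and their results polled from `GET /sessions/{id}/statements/{statementId}`.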
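
For the traffic-analysis entry, the project's exact formulas are not given here. The sketch below uses textbook traffic-flow definitions in plain Python rather than Spark: mean travel time and mean waiting time per vehicle, space-mean speed (total distance travelled divided by total time on the segment), and Edie's generalized density (total time spent on the segment divided by segment length times observation window). The `Traversal` record layout and `segment_stats` are hypothetical, and occupancy is left out because it depends on how detection is modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Traversal:
    """One vehicle crossing a road segment (hypothetical record layout)."""
    entry_s: float    # time the vehicle entered the segment, in seconds
    exit_s: float     # time the vehicle left the segment, in seconds
    stopped_s: float  # seconds spent stationary while on the segment


def segment_stats(trips: List[Traversal], length_m: float, window_s: float) -> dict:
    """Summary statistics for vehicles that fully crossed a segment of `length_m`
    metres inside an observation window of `window_s` seconds.

    Assumes every traversal lies entirely within the window, so no clipping is needed.
    """
    if not trips:
        raise ValueError("no traversals")
    travel = [t.exit_s - t.entry_s for t in trips]
    total_time = sum(travel)
    return {
        "mean_travel_time_s": total_time / len(trips),
        "mean_waiting_time_s": sum(t.stopped_s for t in trips) / len(trips),
        # Space-mean speed: total distance travelled / total time spent on the segment.
        "avg_speed_mps": len(trips) * length_m / total_time,
        # Edie's generalized density: total time spent / (segment length * window length).
        "density_veh_per_m": total_time / (length_m * window_s),
    }
```

For example, two vehicles that take 10 s and 20 s to cross a 100 m segment (stopped for 2 s and 8 s) during a 60 s window give a mean travel time of 15 s, a mean waiting time of 5 s, a space-mean speed of about 6.67 m/s, and a density of 0.005 vehicles per metre. In a Spark job, the same sums would typically be computed with a `groupBy` over segment and time window.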

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-01-12.

What are some of the best open-source apache-spark projects in Python? This list will help you:

Rank  Stars   Project
1     13,535  MLflow
2     616     flintrock
3     392     quinn
4     382     PySpark-Boilerplate
5     286     sparktorch
6     14      Apache-Spark-Guide
7     9       Traffic-Data-Analysis-with-Apache-Spark-Based-on-Mobile-Robot-Data
8     2       livyc
9     0       Patek