Python apache-spark

Open-source Python projects categorized as apache-spark

Top 5 Python apache-spark Projects

  • MLflow

    Open source platform for the machine learning lifecycle

    Project mention: [D] Tips for ML workflow on raw data | reddit.com/r/MachineLearning | 2022-01-21

  • flintrock

    A command-line tool for launching Apache Spark clusters.

    Project mention: Why Databricks Is Winning | news.ycombinator.com | 2021-02-14

    > * AWS has a managed Spark offering called EMR

    There is also my rinky-dink open source project, Flintrock [0], that will launch open source Spark clusters on AWS for you.

    It's probably not the right tool for production use (and you would be right to wonder why Flintrock exists when we have EMR [1]), but I know of several companies that have used Flintrock at one point or other in production at large scale (like, 400+ node clusters).

    [0]: https://github.com/nchammas/flintrock

    [1]: https://github.com/nchammas/flintrock#why-build-flintrock-wh...

  • PySpark-Boilerplate

    A boilerplate for writing PySpark Jobs

    Project mention: Packaging Pyspark Applications | reddit.com/r/apachespark | 2021-07-15

  • quinn

    pyspark methods to enhance developer productivity 📣 👯 🎉 (by MrPowers)

    Project mention: Pyspark now provides a native Pandas API | reddit.com/r/Python | 2022-01-02

    Pandas syntax is far inferior to regular PySpark in my opinion. Goes to show how much data analysts value a syntax that they're already familiar with. Pandas syntax makes it harder to reason about queries, abstract DataFrame transformations, etc. I've authored some popular PySpark libraries like quinn and chispa and am not excited to add Pandas syntax support, haha.

  • sparktorch

    Train and run Pytorch models on Apache Spark.

    Project mention: Spark2 + pytorch on GPU | reddit.com/r/pytorch | 2021-09-17

    Was reading the documentation of sparktorch (https://github.com/dmmiller612/sparktorch) which says you need spark >= 2.4.4. But to the best of my knowledge spark2 doesn't have gpu compute capabilities. Does that mean it can only use cpu compute? Am I missing something?

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-01-21.

Index

What are some of the best open-source apache-spark projects in Python? This list will help you:

Rank  Project              GitHub stars
1     MLflow               11,127
2     flintrock            589
3     PySpark-Boilerplate  355
4     quinn                305
5     sparktorch           234