Python apache-spark

Open-source Python projects categorized as apache-spark

Top 12 Python apache-spark Projects

  • MLflow

    Open source platform for the machine learning lifecycle

  • Project mention: My Favorite DevTools to Build AI/ML Applications! | dev.to | 2024-04-23

    MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It includes features for experiment tracking, model versioning, and deployment, enabling developers to track and compare experiments, package models into reproducible runs, and manage model deployment across multiple environments.

  • flintrock

    A command-line tool for launching Apache Spark clusters.

  • quinn

    pyspark methods to enhance developer productivity 📣 👯 🎉 (by MrPowers)

  • PySpark-Boilerplate

    A boilerplate for writing PySpark Jobs

  • sparktorch

    Train and run PyTorch models on Apache Spark.

  • dataproc-templates

    Dataproc templates and pipelines for solving simple in-cloud data tasks

  • Apache-Spark-Guide

    Apache Spark Guide

  • covid-19-data-engineering-pipeline

    A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.

  • Traffic-Data-Analysis-with-Apache-Spark-Based-on-Mobile-Robot-Data

    Mobile robot data were analyzed with Apache Spark to produce five statistics: travel time, waiting time, average speed, occupancy, and density.

  • xonai-dashboard

    A Grafana-based application to assist Big Data infrastructure optimization initiatives where Spark applications are a dominant cost driver

  • Project mention: Show HN: Open sourcing a Big Data monitoring tool | news.ycombinator.com | 2024-03-29

  • livyc

    Apache Spark as a Service with Apache Livy Client

  • transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue

    Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK and MSK Connect (Debezium)

  • Project mention: Writing simple Python scripts faster with Amazon Q | dev.to | 2024-01-11

    transactional-datalake-using-amazon-msk-serverless-and-apache-iceberg-on-aws-glue 2024-01-10T01:26:56Z https://github.com/aws-samples/transactional-datalake-using-amazon-msk-serverless-and-apache-iceberg-on-aws-glue
    aws-msk-serverless-cdc-data-pipeline-with-debezium 2024-01-09T01:03:38Z https://github.com/aws-samples/aws-msk-serverless-cdc-data-pipeline-with-debezium
    aws-healthlake-smart-on-fhir 2024-01-08T23:05:17Z https://github.com/aws-samples/aws-healthlake-smart-on-fhir
    aws-greengrass-custom-components 2024-01-08T11:34:12Z https://github.com/aws-samples/aws-greengrass-custom-components
    graviton-developer-workshop 2024-01-08T03:30:31Z https://github.com/aws-samples/graviton-developer-workshop
    msk-flink-streaming-cdk 2024-01-08T02:25:39Z https://github.com/aws-samples/msk-flink-streaming-cdk
    rag-with-amazon-postgresql-using-pgvector 2024-01-06T04:47:41Z https://github.com/aws-samples/rag-with-amazon-postgresql-using-pgvector
    queueTransfer_ContactTraceRecordSupport-for-Service-Cloud-Voice 2024-01-05T20:34:14Z https://github.com/aws-samples/queueTransfer_ContactTraceRecordSupport-for-Service-Cloud-Voice
    amazon-chime-sdk-voice-voice-translator 2024-01-05T17:25:54Z https://github.com/aws-samples/amazon-chime-sdk-voice-voice-translator
    private-s3-vpce 2024-01-05T06:38:52Z https://github.com/aws-samples/private-s3-vpce
    bedrock-contact-center-tasks-eval 2024-01-04T21:46:51Z https://github.com/aws-samples/bedrock-contact-center-tasks-eval
    clickstream-sdk-samples 2024-01-04T07:21:52Z https://github.com/aws-samples/clickstream-sdk-samples
    aws-msk-cdc-data-pipeline-with-debezium 2024-01-04T04:09:22Z https://github.com/aws-samples/aws-msk-cdc-data-pipeline-with-debezium
    transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue 2024-01-04T03:39:04Z https://github.com/aws-samples/transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue
    ..

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source apache-spark projects in Python? This list will help you:

Rank Project Stars
1 MLflow 17,234
2 flintrock 630
3 quinn 576
4 PySpark-Boilerplate 390
5 sparktorch 334
6 dataproc-templates 110
7 Apache-Spark-Guide 26
8 covid-19-data-engineering-pipeline 22
9 Traffic-Data-Analysis-with-Apache-Spark-Based-on-Mobile-Robot-Data 10
10 xonai-dashboard 10
11 livyc 3
12 transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue 1
