Top 12 Python apache-spark Projects
-
covid-19-data-engineering-pipeline
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.
-
Traffic-Data-Analysis-with-Apache-Spark-Based-on-Mobile-Robot-Data
Mobile robot data were analyzed with Apache Spark to produce five statistics: travel time, waiting time, average speed, occupancy, and density.
-
xonai-dashboard
A Grafana-based application to assist Big Data infrastructure optimization initiatives where Spark applications are a dominant cost driver
-
transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK and MSK Connect (Debezium)
MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It includes features for experiment tracking, model versioning, and deployment, enabling developers to track and compare experiments, package models into reproducible runs, and manage model deployment across multiple environments.
Project mention: Show HN: Open sourcing a Big Data monitoring tool | news.ycombinator.com | 2024-03-29
- transactional-datalake-using-amazon-msk-serverless-and-apache-iceberg-on-aws-glue | 2024-01-10T01:26:56Z | https://github.com/aws-samples/transactional-datalake-using-amazon-msk-serverless-and-apache-iceberg-on-aws-glue
- aws-msk-serverless-cdc-data-pipeline-with-debezium | 2024-01-09T01:03:38Z | https://github.com/aws-samples/aws-msk-serverless-cdc-data-pipeline-with-debezium
- aws-healthlake-smart-on-fhir | 2024-01-08T23:05:17Z | https://github.com/aws-samples/aws-healthlake-smart-on-fhir
- aws-greengrass-custom-components | 2024-01-08T11:34:12Z | https://github.com/aws-samples/aws-greengrass-custom-components
- graviton-developer-workshop | 2024-01-08T03:30:31Z | https://github.com/aws-samples/graviton-developer-workshop
- msk-flink-streaming-cdk | 2024-01-08T02:25:39Z | https://github.com/aws-samples/msk-flink-streaming-cdk
- rag-with-amazon-postgresql-using-pgvector | 2024-01-06T04:47:41Z | https://github.com/aws-samples/rag-with-amazon-postgresql-using-pgvector
- queueTransfer_ContactTraceRecordSupport-for-Service-Cloud-Voice | 2024-01-05T20:34:14Z | https://github.com/aws-samples/queueTransfer_ContactTraceRecordSupport-for-Service-Cloud-Voice
- amazon-chime-sdk-voice-voice-translator | 2024-01-05T17:25:54Z | https://github.com/aws-samples/amazon-chime-sdk-voice-voice-translator
- private-s3-vpce | 2024-01-05T06:38:52Z | https://github.com/aws-samples/private-s3-vpce
- bedrock-contact-center-tasks-eval | 2024-01-04T21:46:51Z | https://github.com/aws-samples/bedrock-contact-center-tasks-eval
- clickstream-sdk-samples | 2024-01-04T07:21:52Z | https://github.com/aws-samples/clickstream-sdk-samples
- aws-msk-cdc-data-pipeline-with-debezium | 2024-01-04T04:09:22Z | https://github.com/aws-samples/aws-msk-cdc-data-pipeline-with-debezium
- transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue | 2024-01-04T03:39:04Z | https://github.com/aws-samples/transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue
..
Python apache-spark related posts
- Explain me how websites like Dall-E, chatgpt, thispersondoesntexit process the user data so quickly
- [D] What licensed software do you use for machine learning experimentation tracking?
- [Q] Is there a tool to keep track of my ML experiments?
- Remote file access vulnerability in `mlflow server` and `mlflow ui` CLIs
- Critical CVE in `mlflow` 2.2.0 and under: Remote file access vulnerability in `mlflow server` and `mlflow ui` CLIs; possible lateral movement into aws creds
- Critical remote unauthenticated system/cloud takeover in major AI tool
- Brainstorming functions to make PySpark easier
Index
What are some of the best open-source apache-spark projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | MLflow | 17,234 |
2 | flintrock | 630 |
3 | quinn | 576 |
4 | PySpark-Boilerplate | 390 |
5 | sparktorch | 334 |
6 | dataproc-templates | 110 |
7 | Apache-Spark-Guide | 26 |
8 | covid-19-data-engineering-pipeline | 22 |
9 | Traffic-Data-Analysis-with-Apache-Spark-Based-on-Mobile-Robot-Data | 10 |
10 | xonai-dashboard | 10 |
11 | livyc | 3 |
12 | transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue | 1 |