Top 15 Python apache-spark Projects
-
Project mention: Future AI Deployment: Automating Full Lifecycle Management with Rollback Strategies and Cloud Migration | dev.to | 2025-03-15
AI Model Lifecycle Management: MLflow Documentation
-
Project mention: Show HN: Pyper – Concurrent Python Made Simple | news.ycombinator.com | 2025-01-12
-
covid-19-data-engineering-pipeline
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.
-
e2e-structured-streaming
End-to-end data pipeline that ingests, processes, and stores data. Apache Airflow schedules scripts that fetch data from an API and send it to Kafka; Spark then processes the data and writes it to Cassandra. The pipeline, built with Python and Apache ZooKeeper, is containerized with Docker for easy deployment and scalability.
$ git clone https://github.com/akarce/e2e-structured-streaming.git
-
xonai-dashboard
A Grafana-based application that supports Big Data infrastructure optimization initiatives where Spark applications are a dominant cost driver.
-
Traffic-Data-Analysis-with-Apache-Spark-Based-on-Mobile-Robot-Data
Mobile robot data were analyzed with Apache Spark to produce five statistics: travel time, waiting time, average speed, occupancy, and density.
-
transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK and MSK Connect (Debezium)
Python apache-spark related posts
-
How to Use KitOps with MLflow
-
Mlflow: Open-source platform for the machine learning lifecycle
-
Observations on MLOps–A Fragmented Mosaic of Mismatched Expectations
-
Explain me how websites like Dall-E, chatgpt, thispersondoesntexit process the user data so quickly
-
[D] What licensed software do you use for machine learning experimentation tracking?
-
[Q] Is there a tool to keep track of my ML experiments?
-
Remote file access vulnerability in `mlflow server` and `mlflow ui` CLIs
-
A note from our sponsor - CodeRabbit
coderabbit.ai | 24 Apr 2025
Index
What are some of the best open-source apache-spark projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | MLflow | 20,230 |
| 2 | quinn | 670 |
| 3 | flintrock | 642 |
| 4 | PySpark-Boilerplate | 396 |
| 5 | sparktorch | 340 |
| 6 | pysparkling | 268 |
| 7 | dataproc-templates | 127 |
| 8 | pyjaws | 43 |
| 9 | Apache-Spark-Guide | 30 |
| 10 | covid-19-data-engineering-pipeline | 23 |
| 11 | e2e-structured-streaming | 18 |
| 12 | xonai-dashboard | 14 |
| 13 | Traffic-Data-Analysis-with-Apache-Spark-Based-on-Mobile-Robot-Data | 12 |
| 14 | transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue | 5 |
| 15 | livyc | 3 |