spark_app_twitter vs astro
Metric | spark_app_twitter | astro
---|---|---
Mentions | 3 | 2
Stars | 60 | 183
Growth | - | -
Activity | 0.0 | 10.0
Last commit | almost 2 years ago | over 1 year ago
Language | Python | Python
License | - | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
spark_app_twitter
- Trying to dockerize an all python data engineering project
  You can see the structure of everything in my repository: https://github.com/jmcmt87/spark_app_twitter
  - GitHub - jmcmt87/spark_app_twitter: A data engineering project (Twitter monitor app)
- Portfolio Review: I'd like to start my career as a data engineer
  I made this project on my own as a portfolio and I'd really appreciate any feedback or advice: https://github.com/jmcmt87/spark_app_twitter
astro
- After Airflow. Where next for DE?
  What I would suggest is if you want an "Airflow 3.0" feel you check out the Astro SDK. My team and I basically spent a year and a half rewriting the Airflow DAG writing experience from the ground up. Completely different feel, highly scalable SQL/python/spark (soon) workflows that basically feel like native python. Way easier to test as well. You can pass dataframes into SQL queries, load data from any supported source to any supported warehouse, and things like lineage are natively supported :)
What are some alternatives?
Traffic-Data-Analysis-with-Apache-Spark-Based-on-Mobile-Robot-Data - Mobile robot data analyzed with Apache Spark to produce five statistics: travel time, waiting time, average speed, occupancy, and density.
astro-sdk - Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
ibis - the portable Python dataframe library
airflow-maintenance-dags - A series of DAGs/Workflows to help maintain the operation of Airflow
DataEngineeringProject - Example end to end data engineering project.
Mage - 🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
portfolio_computerVision - Some of my projects on computer vision
getting-started - This repository is a getting started guide to Singer.
DataScience_portfolio - This is my data science portfolio
sqlelf - Explore ELF objects through the power of SQL
typhoon-orchestrator - Create elegant data pipelines and deploy to AWS Lambda or Airflow