spark_app_twitter vs DataEngineeringProject
Metric | spark_app_twitter | DataEngineeringProject
---|---|---
Mentions | 3 | 5
Stars | 60 | 985
Growth | - | -
Activity | 0.0 | 0.0
Last commit | almost 2 years ago | over 1 year ago
Language | Python | Python
License | - | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
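The exact formula behind the activity number is not given here. As a purely illustrative sketch, one way to get a score with the two stated properties (recent commits weigh more, and 9.0 means top 10%) is to combine an exponential recency weight with a rank among all tracked projects. The function names, the 30-day half-life, and the 0 to 10 scaling are assumptions made for this example, not the tracker's actual method.

```python
def recency_weighted_activity(commit_ages_days, half_life_days=30.0):
    """Raw activity: each commit counts 1.0 when new and halves every half_life_days.

    half_life_days is a hypothetical tuning parameter, not a documented value.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def relative_score(raw, all_raw):
    """Scale a raw activity value to 0-10 by the share of projects strictly below it."""
    if not all_raw:
        return 0.0
    below = sum(1 for other in all_raw if other < raw)
    return 10.0 * below / len(all_raw)


# A commit made today outweighs one made 30 days ago.
fresh = recency_weighted_activity([0])
stale = recency_weighted_activity([30])

# With ten tracked projects, the most active one has nine projects below it.
tracked = list(range(10))
top_score = relative_score(9, tracked)
```

Under these assumptions, a score of 9.0 means 90% of tracked projects have a lower raw activity, which matches the "top 10%" reading above.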
spark_app_twitter
- Trying to dockerize an all python data engineering project
  You can see the structure of everything in my repository: https://github.com/jmcmt87/spark_app_twitter
- GitHub - jmcmt87/spark_app_twitter: A data engineering project (Twitter monitor app)
- Portfolio Review: I'd like to start my career as a data engineer
  I made this project on my own as a portfolio and I'd really appreciate any feedback or advice: https://github.com/jmcmt87/spark_app_twitter
DataEngineeringProject
- What are your favourite GitHub repos that show how data engineering should be done?
- Is it just me, or are beginner-friendly ETL pipeline guides that explain from the ground up how to incorporate various technologies notoriously difficult to find?
- Starting A Data Engineering Project Series
  News RSS Feeds
- 5 Data Sources for Data Engineering Projects
  Lastly, the most readily available data source would be data scraped from the internet. To be slightly less vague, I have outlined a project that web-scrapes new online articles every ten minutes to provide all the latest news curated into one place. This project utilizes a wide variety of relevant data engineering tools, which makes it a great project example. The author of this project is Damian Kliś, and he outlines his model architecture below:
- Can You Recommend Good Data Engineering Projects
  Here is my project that got me a few interviews so far: https://github.com/damklis/DataEngineeringProject
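The mentions above describe DataEngineeringProject as collecting new articles from news RSS feeds every ten minutes. As a minimal sketch of that idea only (not the project's actual code), the polling step could parse each feed and keep just the items whose links have not been seen before. The function names, the in-memory `seen_links` set, and the sample feed are all hypothetical; fetching is injected as a callable so the example runs without network access.

```python
import time
import xml.etree.ElementTree as ET

POLL_INTERVAL_SECONDS = 600  # ten minutes, as stated in the project description

# Hypothetical sample feed, used only so the sketch runs offline.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>First</title><link>https://example.com/a</link></item>
<item><title>Second</title><link>https://example.com/b</link></item>
</channel></rss>"""


def parse_rss_items(rss_xml):
    """Return a list of {'title', 'link'} dicts for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        {
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        }
        for item in root.iter("item")
    ]


def select_new_items(items, seen_links):
    """Keep items whose link is not in seen_links, and record those links as seen."""
    fresh = []
    for item in items:
        link = item["link"]
        if link and link not in seen_links:
            seen_links.add(link)
            fresh.append(item)
    return fresh


def poll(fetch_feed, handle_item, cycles, sleep=time.sleep):
    """Fetch the feed `cycles` times, passing each not-yet-seen item to handle_item."""
    seen_links = set()
    for cycle in range(cycles):
        for item in select_new_items(parse_rss_items(fetch_feed()), seen_links):
            handle_item(item)
        if cycle < cycles - 1:
            sleep(POLL_INTERVAL_SECONDS)


# Two polling cycles over the same feed: each article is handled only once.
collected = []
poll(lambda: SAMPLE_FEED, collected.append, cycles=2, sleep=lambda seconds: None)
```

In a real pipeline the seen-links set would need to live in persistent storage, and the scheduling would more likely be handled by an orchestrator than by a sleep loop.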
What are some alternatives?
Traffic-Data-Analysis-with-Apache-Spark-Based-on-Mobile-Robot-Data - Mobile robot data were analyzed with Apache Spark to produce five statistical results: travel time, waiting time, average speed, occupancy, and density.
blinkist-scraper - 📚 Python tool to download book summaries and audio from Blinkist.com, and generate some pretty output
astro - Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow. [Moved to: https://github.com/astronomer/astro-sdk]
synapse-s3-storage-provider - Synapse storage provider to fetch and store media in Amazon S3
ibis - the portable Python dataframe library
yaetos - Write data & AI pipelines in (SQL, Spark, Pandas) and deploy to the cloud, simplified
portfolio_computerVision - Some of my projects on computer vision
amazon-s3-find-and-forget - Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
DataScience_portfolio - This is my data science portfolio
Zillow-Data-Engineering
astro-sdk - Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
openwisp-monitoring - Network monitoring system written in Python and Django, designed to be extensible, programmable, scalable and easy to use by end users: once the system is configured, monitoring checks, alerts and metric collection happen automatically.