Reddit-API-Pipeline vs streamify

| | Reddit-API-Pipeline | streamify |
|---|---|---|
| Mentions | 7 | 4 |
| Stars | 271 | 474 |
| Growth | - | - |
| Activity | 0.0 | 0.0 |
| Last Commit | over 1 year ago | about 2 years ago |
| Language | Python | Python |
| License | - | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Reddit-API-Pipeline
- Reddit ELT Pipeline
P.S. I followed steps from this repository but made some adjustments: ABZ-Aaron/Reddit-API-Pipeline (github.com)
- Where can I find online projects end-to-end?
- I created a pipeline extracting Reddit data using Airflow, Docker, Terraform, S3, dbt, Redshift, and Google Data Studio
- Do you think this AWS based personal project would be suitable and complex enough for a resume?
I recently created one using Airflow, Docker, dbt, S3, Redshift, and PowerBI. It's not perfect, and totally overkill with regards to the tools I used, but you can find it here: https://github.com/ABZ-Aaron/Reddit-API-Pipeline
- Is it worth me applying to Data Engineering roles with this Resume/CV?
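The posts above describe a batch ELT pipeline that pulls Reddit data with Airflow, lands it in S3, loads it into Redshift, and transforms it with dbt. As a rough illustration of that flow (not the repository's actual code), a minimal Airflow DAG might look like the sketch below; the subreddit, bucket name, file paths, and dbt project location are all hypothetical placeholders.

```python
# Hypothetical sketch of an extract -> load -> transform DAG for a
# Reddit pipeline. All names, paths, and credentials are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def extract_reddit_posts():
    """Pull recent posts with PRAW and write them to a local CSV."""
    import csv
    import praw  # assumed dependency; credentials below are placeholders

    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",
        client_secret="YOUR_CLIENT_SECRET",
        user_agent="reddit-pipeline-sketch",
    )
    with open("/tmp/reddit_extract.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "title", "score"])
        for post in reddit.subreddit("dataengineering").hot(limit=100):
            writer.writerow([post.id, post.title, post.score])


with DAG(
    dag_id="reddit_pipeline_sketch",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(
        task_id="extract_reddit",
        python_callable=extract_reddit_posts,
    )

    # Push the extract to S3; bucket name is a placeholder.
    upload = BashOperator(
        task_id="upload_to_s3",
        bash_command="aws s3 cp /tmp/reddit_extract.csv s3://my-reddit-bucket/raw/",
    )

    # Run dbt models against the warehouse; project path is a placeholder.
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt_project && dbt run",
    )

    extract >> upload >> transform
```

The point of the sketch is only the task ordering (extract, then load, then transform); retries, connections, and credential handling are omitted.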
streamify
- Where can I find online projects end-to-end?
- Completed my first Data Engineering project with Kafka, Spark, GCP, Airflow, dbt, Terraform, Docker and more!
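streamify's stack centers on streaming: simulated listen events (for example from eventsim) are produced to Kafka and processed with Spark before landing in cloud storage. A minimal sketch of the Kafka-to-Spark Structured Streaming leg is shown below; the broker address, topic name, event schema, and output paths are illustrative assumptions rather than the project's actual configuration.

```python
# Hypothetical sketch of reading simulated listen events from Kafka with
# Spark Structured Streaming. Requires the spark-sql-kafka connector
# (e.g. spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<version>).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import LongType, StringType, StructField, StructType

spark = SparkSession.builder.appName("streamify-sketch").getOrCreate()

# Assumed schema for the simulated events; the real project's fields may differ.
event_schema = StructType([
    StructField("userId", LongType()),
    StructField("artist", StringType()),
    StructField("song", StringType()),
    StructField("ts", LongType()),
])

# Read the raw event stream from Kafka; broker and topic are placeholders.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "listen_events")
    .load()
)

# Kafka values arrive as bytes; decode and parse them into columns.
events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Write parsed events out as Parquet; in a cloud deployment this path would
# point at object storage (e.g. a GCS bucket) instead of the local filesystem.
query = (
    events.writeStream.format("parquet")
    .option("path", "/tmp/streamify/events")
    .option("checkpointLocation", "/tmp/streamify/checkpoints")
    .start()
)
query.awaitTermination()
```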
What are some alternatives?
data-engineering-zoomcamp - Free Data Engineering course!
eventsim - Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic.
terraform-cdk - Define infrastructure resources using programming constructs and provision them using HashiCorp Terraform
terraform - Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
versatile-data-kit - One framework to develop, deploy and operate data workflows with Python and SQL.
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
finnhub-streaming-data-pipeline - Stream processing pipeline from Finnhub websocket using Spark, Kafka, Kubernetes and more
audiophile-e2e-pipeline - Pipeline that extracts data from Crinacle's Headphone and InEarMonitor databases and finalizes data for a Metabase Dashboard.
udacity-capstone
tfl-bikes-data-pipeline - Processing TFL data for bike usage with Google Cloud Platform.