| | data-engineering-zoomcamp | audiophile-e2e-pipeline |
|---|---|---|
| Mentions | 119 | 3 |
| Stars | 22,811 | 170 |
| Growth | 3.4% | - |
| Activity | 9.4 | 0.0 |
| Last commit | 29 days ago | over 1 year ago |
| Language | Jupyter Notebook | Python |
| License | - | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
data-engineering-zoomcamp
- Data Engineering Zoomcamp Week 6 - using redpanda 1
  References: Data engineering zoomcamp week 6 course and homework notes: https://github.com/DataTalksClub/data-engineering-zoomcamp/tree/main/cohorts/2024/06-streaming
- Final project part 5
  dbt is the main part of my data engineering project for DataTalksClub's data engineering zoomcamp. After a few frustrating errors on my part, I finally figured out how to make models, where to put the staging models and where to put the core models, how to compile a seed file, and how to join it to the main file in order to produce data for visualization. I also used the git interface to continually update my repository. This was extremely convenient and helpful.
- Building a project in DBT
  For Week 4 of DataTalksClub's data engineering zoomcamp, we had to install dbt and create a project. This was a formidable task. dbt is a data transformation tool that enables data analysts and engineers to transform data in a cloud analytics warehouse, BigQuery in our case. It took me a very long time to do this, and in this case I needed the homework extension.
- Testing and documenting DBT models
  In this video we learned how to test and document dbt models. We also learned about the codegen library. This is part of Week 4 of the data engineering zoomcamp by DataTalksClub.
- Extracting data with dlt
  If you want to run these commands yourself, either in a Jupyter notebook or in Google Colab, you can get the file from HERE. You can get an overview of the workshop HERE. When I ran it in a Jupyter notebook, I had to delete the first line (%%capture) and put quotes around dlt[duckdb] in the second line.
- Data engineering at home?
  Take a look: DE zoomcamp
- Rockstar Data Engineers making big bucks: what are you doing exactly?
  If you need guidance, you can attend the data engineering zoomcamp; it's free and quite solid.
- Self study material
  Welcome. Start with Data Engineering Zoomcamp, try to build a project, see if you like it, then move on to deeper resources.
- What is the best way to learn Python if I want to become a data engineer
  You can take a look at this: https://github.com/DataTalksClub/data-engineering-zoomcamp
- Course Recommendations for a New Grad
  I think you can start with something free, like this pretty practical course on Data Engineering from DataTalksClub: https://github.com/DataTalksClub/data-engineering-zoomcamp
audiophile-e2e-pipeline
- Where can I find online projects end-to-end?
- Celebrating my first Data Engineering Project -- Fitbit data with PySpark, GCP, prefect, and terraform!
  ris-tlp/audiophile-e2e-pipeline
- Built and automated a complete end-to-end ELT pipeline using AWS, Airflow, dbt, Terraform, Metabase and more as a beginner project!
What are some alternatives?
mlops-zoomcamp - Free MLOps course from DataTalks.Club
ghcn-d - Data Pipeline from the Global Historical Climatology Network DataSet
Cookbook - The Data Engineering Cookbook
Reddit-API-Pipeline
AdventureWorks - Projects using the AdventureWorks database
data_engineering_project_1 - My first attempt at a rough ETL pipeline; technologies include spark, GCS, prefect orchestration, and terraform
versatile-data-kit - One framework to develop, deploy and operate data workflows with Python and SQL.
stream-iot - An end-to-end workflow for processing streaming data on Azure.
StravaDataPipline - EtLT of my own Strava data using the Strava API, MySQL, Python, S3, Redshift, and Airflow
udacity-capstone
streamify - A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!