DataEngineeringProject
analytics
Metric | DataEngineeringProject | analytics
---|---|---
Mentions | 5 | 15
Stars | 985 | -
Growth | - | -
Activity | 0.0 | -
Last commit | over 1 year ago | -
Language | Python | -
License | MIT License | -
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
DataEngineeringProject
- What are your favourite GitHub repos that shows how data engineering should be done?
- Is it me or are beginner-friendly ETL pipeline guides that explain from the ground-up how to incorporate the use of various technologies notoriously difficult to find.
- Starting A Data Engineering Project Series
  News RSS Feeds
- 5 Data Sources for Data Engineering Projects
  Lastly, the most readily available data source would be data scraped from the internet. To be slightly less vague, I have outlined a project that web-scrapes new online articles every ten minutes to provide all the latest news curated into one place. This project utilizes a wide variety of relevant data engineering tools, which makes it a great project example. The author of this project is Damian Kliś, and he outlines his model architecture below:
- Can You Recommend Good Data Engineering Projects
  Here is my project that got me a few interviews so far: https://github.com/damklis/DataEngineeringProject
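The mentions above describe a project that collects new articles from news RSS feeds on a schedule. As a rough illustration of that idea only, here is a minimal sketch that parses an RSS 2.0 document and keeps the items it has not seen before. It is not the project's actual code: `SAMPLE_FEED`, `parse_items`, and `new_items` are hypothetical names, the feed content is made up, and the ten-minute scheduling and any storage layer are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed (an assumption for illustration, not real project data).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First story</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return a list of {title, link} dicts, one per <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    return items


def new_items(feed_xml, seen_links):
    """Return items whose link is not yet in seen_links, then record them as seen."""
    fresh = [it for it in parse_items(feed_xml) if it["link"] not in seen_links]
    seen_links.update(it["link"] for it in fresh)
    return fresh


if __name__ == "__main__":
    seen = set()
    print(new_items(SAMPLE_FEED, seen))  # first pass: both items are new
    print(new_items(SAMPLE_FEED, seen))  # second pass: nothing new
```

Deduplicating on the link keeps repeated polls of the same feed from producing duplicate records. A real pipeline would likely persist the seen set and run the poll from a scheduler rather than in-process.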
analytics
- I'm not getting it...what's the point of DBT?
  Take a look at gitlab's dbt project: https://gitlab.com/gitlab-data/analytics/-/blob/master/transform/snowflake-dbt/models/common/schema.yml
- How would you structure a repo with 10+ ETL pipelines and shared code?
  A good reference is the Gitlab data team repo. https://gitlab.com/gitlab-data/analytics
- What are your favourite GitHub repos that shows how data engineering should be done?
- Are there any open corporate Data Team repositories / projects besides GitLab?
  For example, their Data Team have a public repository, with a bunch of information on how they organize DAGs, machine learning projects, system configuration, etc.
- Kimball Dim Modelling Code Examples
- Can someone help me, an absolute newbie, understand the usage and benefit of dbt with practical example ?
- Is jinja templating right for DBT?
  So I've run through the DBT tutorial stuff and looked over some fairly complex uses of it i.e. GitLab Data and I was wondering if anyone has any opinions or insights into the use of jinja templating in the sql?
- Where can I find free data engineering ( big data) projects online?
  Gitlab has their DBT repo open source and is very useful for seeing how to structure a project at scale. https://gitlab.com/gitlab-data/analytics/-/tree/master/transform/snowflake-dbt
- Gitlab's Data Team Platform (in depth look at their stack)
  Currently the team is working hard on this: https://gitlab.com/gitlab-data/analytics/-/issues/9508
- Can someone explain the big deal with dbt?
  GitLab's dbt project is an excellent example of a mature project at scale. They also have a comprehensive guide to their methodology.
What are some alternatives?
blinkist-scraper - 📚 Python tool to download book summaries and audio from Blinkist.com, and generate some pretty output
dbt-synapse - dbt adapter for Azure Synapse Dedicated SQL Pools
synapse-s3-storage-provider - Synapse storage provider to fetch and store media in Amazon S3
dagster - An orchestration platform for the development, production, and observation of data assets.
yaetos - Write data & AI pipelines in (SQL, Spark, Pandas) and deploy to the cloud, simplified
castled - Castled is an open source reverse ETL solution that helps you to periodically sync the data in your db/warehouse into sales, marketing, support or custom apps without any help from engineering teams
amazon-s3-find-and-forget - Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
datahub - The Metadata Platform for your Data Stack
Zillow-Data-Engineering
AdvancedSQLPuzzles - Welcome to my GitHub repository. I hope you enjoy solving these puzzles as much as I have enjoyed creating them.
openwisp-monitoring - Network monitoring system written in Python and Django, designed to be extensible, programmable, scalable and easy to use by end users: once the system is configured, monitoring checks, alerts and metric collection happens automatically.
lightdash - Self-serve BI to 10x your data team ⚡️