| | data-engineering-wiki | glossary |
|---|---|---|
| Mentions | 15 | 5 |
| Stars | 1,042 | 90 |
| Growth | 3.9% | - |
| Activity | 7.5 | 4.3 |
| Last commit | about 2 months ago | 10 months ago |
| Language | CSS | SCSS |
| License | Creative Commons Zero v1.0 Universal | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
data-engineering-wiki
- Data Engineering Glossary
- ETL practice
  My suggestions:
  1. Browse https://dataengineering.wiki/ and read through r/dataengineering.
  2. In mid-sized companies, the trend is to outsource Extract and Load to providers like Fivetran or Airbyte (open-source), then Transform the data with dbt in a data warehouse using SQL.
  3. In big companies, you won't touch much ETL design. You just need to be proficient in Python / Spark / SQL...
  4. Make sure you know what a star schema, fact tables, and dimension tables are.
- Anything else to read
- Looking for blogs for backend development
  Hi everyone! As mentioned in the title, I recently came across great blogs for data engineering: startdataengineering.com and dataengineering.wiki
- DE- How to get my foot in the door?
  The data engineering subreddit maintains a wiki of advice, resources, and recommendations at https://dataengineering.wiki/. Your question is answered in their FAQ.
- Getting into Data Engineering and more!
- Are there avenues into sports science as a software engineer or web dev?
  Data engineering
- Switching to something more technical
  r/dataengineering has a wiki at https://dataengineering.wiki and also a Discord server, which is pretty active.
- Data Engineering Concepts: Definitions, Backlinks, and Graph View
  Almost the same as the wiki https://dataengineering.wiki/
- dataengineering.wiki Bug
  Hi, would you mind opening an issue on GitHub? We can help you debug the issue there.
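The "ETL practice" advice above recommends knowing what a star schema, fact tables, and dimension tables are. As a rough illustration only, the sketch below builds a tiny star schema in an in-memory SQLite database using Python's standard `sqlite3` module. The table names (`fact_sales`, `dim_customer`, `dim_product`), the sample rows, and the helper `revenue_by_category` are all hypothetical and do not come from any of the projects or posts listed here.

```python
import sqlite3

# Hypothetical star schema: one central fact table whose rows point at
# two surrounding dimension tables through foreign keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    amount      REAL
);
""")

# Made-up sample data: dimensions hold descriptive attributes,
# the fact table holds the measurable events (sales amounts).
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "Ada"), (2, "Grace")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(10, "books"), (20, "games")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 1, 10, 12.0), (2, 1, 20, 30.0), (3, 2, 10, 8.0)])


def revenue_by_category(connection):
    """Aggregate the fact table, grouped by a dimension attribute."""
    rows = connection.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
    return dict(rows)


print(revenue_by_category(conn))
```

The typical query shape is the point here: measures are summed from the fact table and sliced by attributes joined in from a dimension table. A real warehouse would usually add a date dimension and surrogate keys, which this sketch leaves out.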
glossary
- Data Engineering Glossary
- A Single Place for All Data Knowledge
- Data Engineering Concepts: Definitions, Backlinks, and Graph View
  The difference is that the data glossary does not need the paid, closed-source Obsidian Publish for publishing. The data glossary is fully open and uses GoHugo and Quartz.
- Want to transition to data engineering but are overwhelmed with all the terms?
  All of it is open on GitHub. Feel free to add missing terms or ask questions.
What are some alternatives?
Dataplane - Dataplane is a data platform that makes it easy to construct a data mesh with automated data pipelines and workflows.
quartz - 🌱 a fast, batteries-included static-site generator that transforms Markdown content into fully functional websites
versatile-data-kit - One framework to develop, deploy and operate data workflows with Python and SQL.
Hugo - The world’s fastest framework for building websites.
applied-ml - 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
sayn - Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
superset - Apache Superset is a Data Visualization and Data Exploration Platform
Mage - 🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
Made-With-ML - Learn how to design, develop, deploy and iterate on production-grade ML applications.
Apache Superset - Apache Superset is a Data Visualization and Data Exploration Platform [Moved to: https://github.com/apache/superset]