Metric | glossary | data-engineering-wiki
---|---|---
Mentions | 5 | 15
Stars | 90 | 1,042
Growth | - | 3.9%
Activity | 4.3 | 7.5
Last commit | 10 months ago | about 2 months ago
Language | SCSS | CSS
License | MIT License | Creative Commons Zero v1.0 Universal
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
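The page does not give the formula behind the activity number, so the following is only a minimal sketch of one way such a score could work. The exponential decay, the 90-day half-life, and the names `weighted_commit_score` and `activity` are all assumptions made for illustration. It shows the two ideas described above: recent commits count more than older ones, and the final number reflects a project's rank among all tracked projects.

```python
from datetime import date


def weighted_commit_score(commit_dates, today, half_life_days=90.0):
    """Sum per-commit weights; a commit's weight halves every `half_life_days`.

    The exponential decay and the 90-day half-life are assumptions made
    for illustration, not the tracker's real parameters.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity(raw_scores, project):
    """Scale a project's rank among all tracked projects to the range 0-10.

    The result is 10 times the share of tracked projects with a strictly
    lower raw score, so 9.0 means 90% of projects score lower.
    """
    mine = raw_scores[project]
    lower = sum(1 for score in raw_scores.values() if score < mine)
    return round(10.0 * lower / len(raw_scores), 1)


# Ten hypothetical projects with distinct raw scores 1.0 .. 10.0.
tracked = {f"project-{i}": float(i) for i in range(1, 11)}
print(activity(tracked, "project-10"))  # most active of ten: 9.0
```

Under these assumptions, a commit made today contributes 1.0 to the raw score and a commit made 90 days ago contributes 0.5, which is what "recent commits have higher weight" means in this sketch.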
glossary
- Data Engineering Glossary
- A Single Place for All Data Knowledge
- Data Engineering Concepts: Definitions, Backlinks, and Graph View
  The difference is that the data glossary does not need the paid, closed-source Obsidian Publish for publishing. The data glossary is fully open and uses GoHugo and Quartz.
- Want to transition to data engineering but are overwhelmed with all the terms?
  All of it is open on GitHub. Feel free to add missing terms or ask questions.
data-engineering-wiki
- Data Engineering Glossary
- ETL practice
  My suggestions:
  1. Browse https://dataengineering.wiki/ and go over r/dataengineering in general.
  2. In mid-sized companies, the trend is to outsource Extract and Load to providers like Fivetran or Airbyte (open-source), then Transform the data with dbt in a data warehouse using SQL.
  3. In big companies, you won't touch much ETL design. You just need to be proficient in Python / Spark / SQL...
  4. Make sure you know what a star schema, fact tables, and dimension tables are.
- Anything else to read
- Looking for blogs for backend development
  Hi everyone! As mentioned in the title, I recently came across great blogs for data engineering: startdataengineering.com and dataengineering.wiki
- DE- How to get my foot in the door?
  The data engineering subreddit maintains a wiki of advice, resources, and recommendations at https://dataengineering.wiki/. Your question is answered in their FAQ here
- Getting into Data Engineering and more!
- Are there avenues into sports science as a software engineer or web dev?
  Data engineering
- Switching to something more technical
  r/dataengineering has a wiki at https://dataengineering.wiki and also a Discord server which is pretty active.
- Data Engineering Concepts: Definitions, Backlinks, and Graph View
  Almost the same as the wiki https://dataengineering.wiki/
- dataengineering.wiki Bug
  Hi, would you mind opening an issue on GitHub? We can help you debug the issue there.
What are some alternatives?
quartz - 🌱 a fast, batteries-included static-site generator that transforms Markdown content into fully functional websites
Dataplane - Dataplane is a data platform that makes it easy to construct a data mesh with automated data pipelines and workflows.
Hugo - The world’s fastest framework for building websites.
versatile-data-kit - One framework to develop, deploy and operate data workflows with Python and SQL.
applied-ml - 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
superset - Apache Superset is a Data Visualization and Data Exploration Platform
sayn - Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Made-With-ML - Learn how to design, develop, deploy and iterate on production-grade ML applications.
Mage - 🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
Apache Superset - Apache Superset is a Data Visualization and Data Exploration Platform [Moved to: https://github.com/apache/superset]