What aspects of Python should I learn that are most important for Data Engineering?

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering.

  • mypy

    Optional static typing for Python

  • Python is one of the most accessible programming languages […] code within Python. My favorite is dagster, which forces you to write functional blocks of code with superior features, coming from a more SQL, T-SQL, and PL/SQL background. As a data engineer, I'd say you're not expected to write perfect code; it's better to know Big-O notation so you can avoid long-running data pipelines, even if your code doesn't look the prettiest. A static type checker such as mypy might be another good one to know, as it detects errors before runtime; errors that only surface at runtime are the biggest problem with Python.

  • dagster

    An orchestration platform for the development, production, and observation of data assets.

  • Mage

    🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai

  • Airflow: you can live without it. Check out this newer data pipeline tool: https://github.com/mage-ai/mage-ai

  • ticker_selection_BI_dashboard

    Data engineering project: extracts stock data for 4 shares and uploads it to MySQL for use in a BI project

  • GitHub: Tickers Selection Dashboard
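
The comment quoted under mypy makes two points that are easy to show in a few lines: Big-O matters more than pretty code in a pipeline, and type hints let a checker such as mypy catch mistakes before the job runs. The sketch below is only an illustration; the function names and the sample IDs are hypothetical and do not come from the original post.

```python
from __future__ import annotations

from typing import Iterable


def filter_known_slow(event_ids: list[int], known_ids: list[int]) -> list[int]:
    """O(n * m): each `in` check scans the whole known_ids list."""
    return [e for e in event_ids if e in known_ids]


def filter_known_fast(event_ids: Iterable[int], known_ids: Iterable[int]) -> list[int]:
    """O(n + m): build a set once, then each lookup is O(1) on average."""
    known = set(known_ids)
    return [e for e in event_ids if e in known]


# Hypothetical sample data: both versions return the same rows,
# but only the set-based one scales to millions of IDs.
events = [1, 2, 3, 4]
known = [2, 4, 6]
assert filter_known_slow(events, known) == filter_known_fast(events, known)

# With the annotations above, a static checker such as mypy should reject a
# call like filter_known_fast(42, known) without running the pipeline, because
# an int is not an Iterable[int]. Without a checker, the same mistake only
# shows up as a TypeError when that line executes.
```

Running `mypy` on the file is a separate step from running the script, so it fits naturally into CI ahead of deployment.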

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
