Use case for ETL over ELT?

This page summarizes the projects mentioned and recommended in the original post on reddit.com/r/dataengineering.

  • pyodbc

    Python ODBC bridge

    I use lxml for the XML parsing and pyodbc as the ODBC library. We have a small team, so I keep it as simple as possible:

    1. A cursor yields the XML documents from a SQL query as a stream.
    2. A generator function parses each XML document and yields the rows (you could parallelize this step).
    3. Each resulting row is streamed to a single CSV file.
    4. The resulting CSV file is loaded into the target database, usually with the DB engine's loader, since bulk insert isn't so fast over ODBC.

    It ends up being a straightforward, low-overhead approach.

  • lxml

    The lxml XML toolkit for Python

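The first three steps of the pipeline described in the comment above could look roughly like the sketch below. It is not the commenter's actual code: the function names, the `record` element name, the `id` and `name` fields, and the CSV header row are all hypothetical, and the XML layout is assumed to be a flat list of record elements. The functions accept any DB-API cursor (a pyodbc cursor is one), and the sketch falls back to the standard library's ElementTree only so that it runs where lxml is not installed.

```python
import csv

try:
    from lxml import etree  # the parser named in the comment
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree


def xml_documents(cursor, query):
    """Step 1: stream XML documents from a SQL query, one row at a time."""
    cursor.execute(query)
    for row in cursor:  # DB-API cursors (pyodbc included) are iterable
        yield row[0]    # assumes the XML is the first selected column


def rows_from_xml(xml_text, record_tag="record", fields=("id", "name")):
    """Step 2: parse one document and yield a flat row per record element."""
    # Parse from bytes: lxml rejects str input that carries an encoding declaration.
    data = xml_text.encode("utf-8") if isinstance(xml_text, str) else xml_text
    root = etree.fromstring(data)
    for record in root.iter(record_tag):
        yield [record.findtext(field, default="") for field in fields]


def export_to_csv(cursor, query, csv_path, record_tag="record", fields=("id", "name")):
    """Step 3: stream every parsed row into a single CSV file."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(fields)  # header row (an assumption, not in the comment)
        for xml_text in xml_documents(cursor, query):
            for row in rows_from_xml(xml_text, record_tag, fields):
                writer.writerow(row)
                count += 1
    return count  # number of data rows written
```

Because every stage is a generator, only one XML document and one row are held in memory at a time. Step 4 is left out on purpose: as the comment notes, the CSV is usually loaded with the target engine's own bulk loader (for example `bcp` for SQL Server or `COPY` for PostgreSQL), and that command is engine-specific.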



