What steps do you apply in the "L" when doing ELT?

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • Scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

  • The sha256 is there establish the uniqueness of the file. It isn’t great for capturing whether or not you have already seen the file before, tho, because it is rather expensive to calculate (imagine your csv file were gigabytes on size — you would have to stream in whole file down in order to see if it had changed!). In the past I have used a sha256 of information that the server hosting the file gives me about the file — it’s last modification time, for example, and it’s file size. This is a surprisingly subtle question, though, so I would encourage you to ponder it deeply, as it is the hardest part of the ‘E’ step! You might consider seeing how a framework like scrapy (https://scrapy.org/) solves this particular thorny issue :)

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts