5 Data Sources for Data Engineering Projects

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • SF-EvictionTracker

    Tracking and measuring neighborhood and district-level eviction rates in the city of San Francisco.

  • The first data source is San Francisco Open Data's API. The local San Francisco government has done a tremendous job of tracking data from a wide variety of publishing departments, including the Treasurer-Tax Collector, the Airport (SFO), and the Municipal Transportation Agency, to name a few. Ilya Galperin outlined an apt data engineering application of this data source, tracking eviction trends by district, filing reason, neighborhood, and demographic.

  • Zillow-Data-Engineering

  • An example of Zillow's APIs being used in a data engineering pipeline can be found on GitHub. The developer of this repository built a model pipeline that uses both historical and current market data to estimate the potential return a local region would yield from a real estate investment. The original post includes a diagram of the general architecture of the author's model.

  • DataEngineeringProject

    Example end to end data engineering project.

  • Lastly, the most readily available data source is data scraped from the internet. To be more specific, I have outlined a project that scrapes online news articles every ten minutes to gather all the latest news in one place. This project uses a wide variety of relevant data engineering tools, which makes it a great example. Its author, Damian Kliś, outlines his model architecture in the original post.
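For the San Francisco Open Data source in the SF-EvictionTracker entry, a first ingestion step might page through a dataset and count records per neighborhood. This sketch assumes the portal's Socrata (SODA) JSON endpoint at `data.sfgov.org/resource/<dataset-id>.json` with its `$limit`/`$offset` paging parameters; the dataset ID and the `neighborhood` field name are hypothetical placeholders, so check the real dataset's schema first. It is not Ilya Galperin's pipeline.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://data.sfgov.org/resource"
DATASET_ID = "xxxx-xxxx"  # hypothetical placeholder; replace with the real dataset ID


def build_query_url(dataset_id: str, limit: int = 1000, offset: int = 0) -> str:
    """Build a SODA query URL that fetches one page of rows via $limit/$offset."""
    params = urlencode({"$limit": limit, "$offset": offset}, safe="$")
    return f"{BASE_URL}/{dataset_id}.json?{params}"


def fetch_rows(url: str) -> list[dict]:
    """Download one page; SODA .json endpoints return a JSON array of objects."""
    with urlopen(url) as resp:
        return json.load(resp)


def count_by_field(rows: list[dict], field: str) -> Counter:
    """Count rows per value of `field`, skipping rows where it is missing or empty."""
    return Counter(row[field] for row in rows if row.get(field))


def top_neighborhoods(dataset_id: str, n: int = 5, field: str = "neighborhood") -> list[tuple[str, int]]:
    """Fetch the first page of a dataset and return the `n` most frequent values.

    `field` defaults to a hypothetical column name; adjust it to the dataset.
    """
    rows = fetch_rows(build_query_url(dataset_id))
    return count_by_field(rows, field).most_common(n)
```

Calling `top_neighborhoods(DATASET_ID)` after filling in a real dataset ID would return the five most frequent values of the chosen column in the first 1,000 rows; a full pipeline would loop over `offset` until a page comes back empty and then load the rows into a warehouse.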
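The Zillow-Data-Engineering entry describes combining historical and current market data to estimate a region's potential return. One simple way to express that, assuming one historical home value per year and a current monthly rent for each region, is annualized appreciation plus gross rental yield. The metric, the `RegionSnapshot` type, and its field names are hypothetical illustrations, not the repository author's model.

```python
from dataclasses import dataclass


@dataclass
class RegionSnapshot:
    region: str
    value_history: list[float]  # hypothetical: one home value per year, oldest first
    monthly_rent: float  # hypothetical: current typical monthly rent


def annualized_growth(values: list[float]) -> float:
    """Compound annual growth rate between the first and last yearly values."""
    years = len(values) - 1
    if years < 1 or values[0] <= 0:
        raise ValueError("need at least two yearly values and a positive start value")
    return (values[-1] / values[0]) ** (1 / years) - 1


def gross_rental_yield(current_value: float, monthly_rent: float) -> float:
    """Annual rent as a fraction of the current home value."""
    if current_value <= 0:
        raise ValueError("current value must be positive")
    return 12 * monthly_rent / current_value


def estimated_return(snapshot: RegionSnapshot) -> float:
    """Historical appreciation (past data) plus rental yield (current data)."""
    current_value = snapshot.value_history[-1]
    return annualized_growth(snapshot.value_history) + gross_rental_yield(
        current_value, snapshot.monthly_rent
    )


def rank_regions(snapshots: list[RegionSnapshot]) -> list[RegionSnapshot]:
    """Order regions from highest to lowest estimated return."""
    return sorted(snapshots, key=estimated_return, reverse=True)
```

Splitting the score into a historical term and a current-market term mirrors the two kinds of input the entry mentions; a real model would also account for costs, vacancy, and financing, which this sketch deliberately ignores.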

