-
SF-EvictionTracker
Tracking and measuring neighborhood and district-level eviction rates in the city of San Francisco.
The first data source comes from San Francisco Open Data's API. The San Francisco city government does a tremendous job of publishing data from a wide range of departments, including the Treasurer-Tax Collector, the Airport (SFO), and the Municipal Transportation Agency, to name a few. An apt data engineering application of this data source was outlined by Ilya Galperin, who tracked eviction trends by district, filing reason, neighborhood, and demographic.
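To make this first data source more concrete, here is a minimal sketch of querying an open-data endpoint and counting eviction records by a chosen field. It assumes the portal exposes a Socrata-style JSON API with SoQL parameters (`$where`, `$limit`, `$order`). The dataset identifier `abcd-1234` and the field names `file_date`, `neighborhood`, and `supervisor_district` are hypothetical placeholders, so check them against the portal before use. This is an illustration only, not the pipeline from the SF-EvictionTracker project.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the portal serves datasets at /resource/<dataset_id>.json
PORTAL = "https://data.sfgov.org/resource"


def build_query_url(dataset_id, since, limit=1000):
    """Build a SoQL query URL for records filed on or after `since` (YYYY-MM-DD)."""
    params = {
        "$where": f"file_date >= '{since}'",  # hypothetical date field name
        "$order": "file_date DESC",
        "$limit": limit,
    }
    return f"{PORTAL}/{dataset_id}.json?{urlencode(params)}"


def fetch_records(url):
    """Download and decode the JSON array of records (requires network access)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def count_by(records, field):
    """Count records per value of `field`; missing values are grouped as 'unknown'."""
    return Counter(r.get(field, "unknown") for r in records)


if __name__ == "__main__":
    # "abcd-1234" is a placeholder, not a real dataset identifier.
    url = build_query_url("abcd-1234", "2020-01-01")
    records = fetch_records(url)
    for name, n in count_by(records, "neighborhood").most_common(10):
        print(f"{name}: {n}")
```

Keeping the download (`fetch_records`) separate from the aggregation (`count_by`) means the counting logic can be checked against a small in-memory list before any network call is made.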
An example of these APIs being built into a data engineering pipeline can be found on GitHub. The developer of this repository created a model pipeline that uses both historical and current market data to estimate the potential return of a real estate investment in a local region. Listed below is the general architecture of the author's model:
Lastly, the most readily available data source is data scraped from the internet. To be less vague, I have outlined a project that scrapes new online articles every ten minutes to curate the latest news in one place. The project uses a wide variety of relevant data engineering tools, which makes it a great example. The author of this project is Damian Kliś, and he outlines his model architecture below:
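Independently of the author's architecture, the core idea of polling for new articles every ten minutes can be sketched with the Python standard library alone. The sketch below makes a simplifying assumption: it reads RSS 2.0 feeds rather than scraping HTML pages, and it keeps the set of seen links in memory. The function names and the feed URL are hypothetical, and a production pipeline would more likely rely on a scheduler and persistent storage.

```python
import time
import xml.etree.ElementTree as ET
from urllib.request import urlopen

POLL_SECONDS = 600  # ten minutes, matching the interval described above


def parse_rss(xml_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:  # skip items without a link, since the link is the dedupe key
            items.append((title, link))
    return items


def new_articles(items, seen_links):
    """Return only items whose link has not been seen; update `seen_links` in place."""
    fresh = []
    for title, link in items:
        if link not in seen_links:
            seen_links.add(link)
            fresh.append((title, link))
    return fresh


def run(feed_urls, poll_seconds=POLL_SECONDS):
    """Poll every feed forever, printing articles that were not seen before."""
    seen = set()
    while True:
        for url in feed_urls:
            with urlopen(url, timeout=30) as resp:
                xml_text = resp.read().decode("utf-8", errors="replace")
            for title, link in new_articles(parse_rss(xml_text), seen):
                print(f"{title} -> {link}")
        time.sleep(poll_seconds)


if __name__ == "__main__":
    # Hypothetical feed URL; replace with real feeds before running.
    run(["https://example.com/feed.xml"])
```

Deduplicating on the article link is what turns repeated polls into a stream of only new items: the second poll of an unchanged feed yields nothing.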