[Need feedback] I wrote a guide about the fundamentals of BigQuery for software developers & traditional database users

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering

  • dataflow

  • I find scripts like this one more useful than clicking around the UI: https://github.com/HexcloudCo/dataflow/blob/main/sinks/gcp/gcp-bigquery-table.sh

  • debezium

    Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.

  • You don't want to couple your analytics database to your app. That only makes sense for small projects; under very high traffic, the coupling will break. Stick to CDC (change data capture) instead. Look into tools like Debezium if your team is concerned about sending raw data to the cloud.

  • dbt-external-tables

    dbt macros to stage external sources

  • You can set up your CDC process so that it creates and evolves the tables for you, e.g. by using a JDBC connector. The moment your OLTP database schema changes, the CDC process picks up the change and propagates it to your OLAP database. If you want more control over schema evolution, you can also manage it in dbt with the dbt-external-tables package.
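
To make the schema-evolution idea in the last comment concrete, here is a minimal Python sketch of the additive case: compare the columns seen on the OLTP side with those already in the analytics table and plan the missing ones. It is not how any particular connector does it. The function name `plan_schema_evolution`, the `TYPE_MAP` type mapping, and the `analytics.users` table are hypothetical and chosen for illustration; the generated statements use BigQuery's `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` DDL.

```python
# Illustrative sketch only: TYPE_MAP, plan_schema_evolution and the table
# name below are assumptions, not part of any connector's real API.

# Assumed mapping from a few PostgreSQL-style column types to BigQuery types.
TYPE_MAP = {
    "integer": "INT64",
    "bigint": "INT64",
    "text": "STRING",
    "varchar": "STRING",
    "boolean": "BOOL",
    "timestamptz": "TIMESTAMP",
    "numeric": "NUMERIC",
}


def plan_schema_evolution(table, source_columns, target_columns):
    """Return DDL statements for columns present in the source but not the target.

    source_columns: dict of column name -> OLTP type
    target_columns: dict of column name -> BigQuery type
    Only additive changes are planned; dropped or retyped columns are ignored.
    """
    statements = []
    for name, source_type in source_columns.items():
        if name in target_columns:
            continue  # column already exists in the analytics table
        # Unmapped source types fall back to STRING in this sketch.
        bq_type = TYPE_MAP.get(source_type.lower(), "STRING")
        statements.append(
            f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {name} {bq_type}"
        )
    return statements


# Hypothetical example: the app added a signup_at column to its users table.
source = {"id": "bigint", "email": "text", "signup_at": "timestamptz"}
target = {"id": "INT64", "email": "STRING"}
statements = plan_schema_evolution("analytics.users", source, target)
for statement in statements:
    print(statement)
```

Handling only additive changes keeps the sketch safe to run repeatedly; renames, drops, and type changes usually need a human decision, which is where the extra control mentioned above becomes useful.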

