But is there a way to solve the schema problem? Yes: KCBQ, the Kafka Connect BigQuery sink connector.
Therefore, I still recommend using a streaming framework such as Apache Flink or Apache Kafka Streams.
We use Debezium to capture changes from each database and publish the change streams to Kafka; KCBQ then subscribes to those Kafka topics and archives the records to BigQuery.
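The KCBQ side of this pipeline is driven by a connector configuration submitted to the Kafka Connect REST API. Below is a minimal sketch: the connector name, topics, project, and dataset are placeholder values, and the option names follow the commonly documented settings of the WePay/Confluent BigQuery sink connector.

```python
import json


def kcbq_sink_config(topics, project, dataset):
    """Build a hypothetical KCBQ sink connector config for the
    Kafka Connect REST API. All values here are placeholders."""
    return {
        "name": "kcbq-sink",
        "config": {
            "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
            "topics": ",".join(topics),
            "project": project,
            "defaultDataset": dataset,
            # Let the connector create tables and evolve BigQuery schemas
            # as the upstream (Debezium-captured) schemas change.
            "autoCreateTables": "true",
            "allowNewBigQueryFields": "true",
            "allowBigQueryRequiredFieldRelaxation": "true",
        },
    }


if __name__ == "__main__":
    cfg = kcbq_sink_config(["orders", "users"], "my-gcp-project", "warehouse")
    print(json.dumps(cfg, indent=2))
    # One would POST this JSON to the Kafka Connect REST endpoint, e.g.:
    #   curl -X POST -H "Content-Type: application/json" \
    #        -d @config.json http://connect:8083/connectors
```

Because the connector is allowed to create tables and add fields, the warehouse schema tracks upstream changes without manual intervention, which is exactly what makes KCBQ an answer to the schema problem.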
In addition, batch tasks require knowledge of each service's data schema in order to read the data correctly and save it to the corresponding warehouse table. Assuming our data warehouse is GCP BigQuery, the warehouse table schemas also need to be created and modified manually.
For batch processing, I recommend Apache Airflow: it is easy to operate, and DAGs are plain Python scripts, so you can cover a wide range of batch-processing scenarios with little effort.