dataproc-templates
bigquery-utils
| | dataproc-templates | bigquery-utils |
|---|---|---|
| Mentions | 1 | 19 |
| Stars | 153 | 1,298 |
| Growth | - | 0.6% |
| Activity | 6.9 | 6.6 |
| Last commit | about 1 month ago | 8 days ago |
| Language | Python | Jupyter Notebook |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
- Ruby on Rails Performance: 7 Lessons from Scaling FirstPromoter
We migrated the analytics layer to Google BigQuery. Same queries that timed out in PostgreSQL now run in under 2 seconds. But not everything belongs in BigQuery — we initially moved too aggressively and actually reverted some queries back when the added complexity wasn't justified. Our rule of thumb: if a query scans hundreds of thousands of rows or involves complex time-series aggregations, BigQuery. Everything else stays in PostgreSQL.
- How to Analyze 47 Million Hacker News Posts: A Data Scientist's Dream Dataset Just Got Better
Google BigQuery - For large-scale data processing and SQL-based analysis
- What if ML pipelines had a lock file?
Data Pipelines usually read from tables that change over time. Most of these tables are stored in a data warehouse like Amazon Redshift or Google BigQuery. Rows are added or removed. Backfills happen. A column gets renamed or its meaning changes. Even when teams snapshot data, those snapshots are often implicit, not recorded as part of the pipeline run itself.
- Best SQL Courses with Certificates for 2026
SQL endures because it's the non-negotiable interface for relational data. Enterprise data storage still relies heavily on relational databases despite new alternatives. What makes SQL valuable for learners is transferability—while dialects differ across PostgreSQL, SQL Server, and BigQuery, the fundamentals stay consistent.
- Why Your Snowflake Bill is High and How to Fix It with a Hybrid Approach
Within classic cloud data warehouses, Google BigQuery presents a different pricing model. Its on-demand, per-terabyte-scanned pricing can be cost-effective for sporadic forensic queries. But it carries the risk of a runaway query where a single mistake leads to a massive bill.
- PostgreSQL Maximalism
Alternatives to: DuckDB, Apache Cassandra, Amazon RedShift, Google BigQuery, Snowflake, InfluxDB, Prometheus, Amazon Timestream
- Every Database Will Support Iceberg — Here's Why
This isn’t hypothetical. It’s already happening. Snowflake supports reading and writing Iceberg. Databricks added Iceberg interoperability via Unity Catalog. Redshift and BigQuery are working toward it.
- RisingWave Turns Four: Our Journey Beyond Democratizing Stream Processing
Many of these companies first tried achieving real-time results with batch systems like Snowflake or BigQuery. But they quickly found that even five-minute batch intervals weren't fast enough for today's event-driven needs. They turn to RisingWave for its simplicity, low operational burden, and easy integration with their existing PostgreSQL-based infrastructure.
- How to Pitch Your Boss to Adopt Apache Iceberg?
If your team is managing large volumes of historical data using platforms like Snowflake, Amazon Redshift, or Google BigQuery, you’ve probably noticed a shift happening in the data engineering world. A new generation of data infrastructure is forming — one that prioritizes openness, interoperability, and cost-efficiency. At the center of that shift is Apache Iceberg.
- Study Notes 2.2.7: Managing Schedules and Backfills with BigQuery in Kestra
BigQuery Documentation: Google Cloud BigQuery
What are some alternatives?
weather_data_pipeline - This is a PySpark-based data pipeline that fetches weather data for a few cities, performs some basic processing and transformation on the data, and then writes the processed data to a Google Cloud Storage bucket and a BigQuery table. The data is then viewed in a Looker dashboard.
trino-pubsub-event-listener - Trino Google Pub/Sub event listener
pubsub2inbox - Pubsub2Inbox is a versatile, multi-purpose tool to handle Pub/Sub messages and turn them into email, API calls, GCS objects, files or almost anything.
packt-book-bot - Bot that tweets and logs the Packt free eBook of the day in BigQuery daily.
gcp-flowlogs-reader - Command line tool and Python library for working with Google Cloud VPC Flow Logs
solr - Apache Solr open-source search software