Top 23 Python BigQuery Projects

Redash

38 24,917 9.5 Python

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

Project mention: Redash: Connect to data source, easily visualize, dashboard and share your data | news.ycombinator.com | 2024-03-20

airbyte

139 13,923 10.0 Python

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Project mention: Launch HN: Bracket (YC W22) – Two-Way Sync Between Salesforce and Postgres | news.ycombinator.com | 2023-12-12

I'll also give a shout-out to Airbyte (https://airbyte.com/), with which I've had some limited success integrating Salesforce to a local database. The particular pull for Airbyte is that we can self-host the open source version, rather than pay Fivetran a significant sum to do this for us.
It's an immature tool, so I don't yet know that I can claim we've spent _less_ than Fivetran on the additional engineering and ops time, but it feels like it has potential to do so once stabilized.

sqlglot

55 5,441 9.9 Python

Python SQL Parser and Transpiler

Project mention: Transpile Any SQL to PostgreSQL Dialect | news.ycombinator.com | 2024-03-18

Recommend checking out https://github.com/tobymao/sqlglot if you are interested in this capability for other SQL dialects
Tools like this are helpful for:
- Rendering SQL in a consistent way, eg for snapshot testing

ibis

22 4,074 10.0 Python

the portable Python dataframe library

Project mention: This Week In Python | dev.to | 2024-03-17

ibis – portable Python dataframe library

ethereum-etl

3 2,819 5.8 Python

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ

Project mention: Blockchain transactions decoding: making wallet activity understandable | dev.to | 2023-10-27

An event is a log entity which EVM smart contracts can emit during transaction execution. Events are very good at signalling that some action has taken place on-chain. Applications can subscribe and listen to events to trigger some off-chain logic, or they can index, transform and store events in some off-chain storage (look at The Graph protocol or Ethereum ETL).

professional-services

8 2,723 9.1 Python

Common solutions and tools developed by Google Cloud's Professional Services team. This repository and its contents are not an officially supported Google product.

ingestr

4 2,308 8.9 Python

ingestr is a CLI tool to copy data between any databases with a single command seamlessly.

Project mention: FLaNK 04 March 2024 | dev.to | 2024-03-04

swirl-search

32 1,509 9.9 Python

Swirl is an open-source search platform that uses AI to search multiple content and data sources simultaneously and return AI-ranked results. It also provides summaries of your answers from searches using LLMs. It's a one-click, easy-to-use Retrieval Augmented Generation (RAG) solution.

Project mention: GitHub - swirlai/swirl-search: Swirl is an open-source search platform that uses AI to search multiple content and data sources simultaneously, finds the best results using a reader LLM, then prompts Generative AI, enabling you to get answers based on your data. | /r/programming | 2023-12-05

jupysql

8 598 9.3 Python

Better SQL in Jupyter. 📊

Project mention: Show HN: JupySQL – a SQL client for Jupyter (ipython-SQL successor) | news.ycombinator.com | 2023-12-06

Hey, HN community!
We're stoked to launch JupySQL today! JupySQL is an open-source library that brings a modern SQL experience to Jupyter. JupySQL is compatible with all major databases, such as Snowflake, Redshift, PostgreSQL, MySQL, MariaDB, DuckDB, SQL Server, Clickhouse, Trino, and more!
To get started, check out our tutorial: https://jupysql.ploomber.io/en/latest/quick-start.html
SQL is the de facto language for data analysis; however, analysis often requires a mix of SQL and Python. JupySQL bridges this gap, allowing users to execute SQL queries seamlessly in Jupyter and continue their analysis in Python. Add %%sql to the top of your cell and start writing SQL.
Here are some of JupySQL's main features:
- Syntax highlighting

BigQuery-Python

1 449 1.8 Python

Simple Python client for interacting with Google BigQuery.

python-bigquery-pandas

1 419 8.1 Python

Google BigQuery connector for pandas

pypinfo

1 394 5.4 Python

Easily view PyPI download statistics via Google's BigQuery.

astro-sdk

7 317 8.6 Python

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

Project mention: Orchestration: Thoughts on Dagster, Airflow and Prefect? | /r/dataengineering | 2023-06-01

Have you tried the Astro SDK? https://github.com/astronomer/astro-sdk

bigquery-schema-generator

1 231 6.3 Python

Generates the BigQuery schema from newline-delimited JSON or CSV data records.

CueObserve

6 205 0.0 Python

Timeseries Anomaly detection and Root Cause Analysis on data in SQL data warehouses and databases

dbt-coves

1 208 9.2 Python

CLI tool for dbt users to simplify creation of staging models (yml and sql) files

Project mention: Is there something wrong with me, I hate dbt, what am I missing ? | /r/dataengineering | 2023-05-15

This just feels like you aren’t using the plentiful tools to make those “mind-numbingly slow” dev steps faster. For example, using dbt-coves to generate the staging models with casting to types in a couple of clicks. And pulling directly from Fivetran tables is just poor practice, with the additional steps needed to do it “right” being inconsequential at best.

dbt-ml-preprocessing

2 175 3.7 Python

A SQL port of Python's scikit-learn preprocessing module, provided as cross-database dbt macros.

starthinker

1 166 2.8 Python

Reference framework for building data workflows provided by Google. Accelerates authentication, logging, scheduling, and deployment of solutions using GCP. To borrow a tagline: "The framework for professionals with deadlines."

premier-league

8 142 9.5 Python

A Data Engineering project. Repository for backend infrastructure and Streamlit app files for a Premier League Dashboard.

Project mention: Google Cloud Portfolio Projects? | /r/googlecloud | 2023-12-09

I have a data engineering project that uses BigQuery, Cloud Run, Compute Engine, Cloud SQL, Artifact Registry, Firestore, and Datastream.

dataproc-templates

1 110 9.0 Python

Dataproc templates and pipelines for solving simple in-cloud data tasks

bigquery_fdw

2 89 3.9 Python

BigQuery Foreign Data Wrapper for PostgreSQL

prism

7 79 8.9 Python

Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python. (by runprism)

Project mention: Prism: the easiest way to create robust data workflows. Accessible via CLI | /r/coolgithubprojects | 2023-09-21

iris3

2 66 8.7 Python

An upgraded and improved version of the Iris automatic GCP-labeling project

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python BigQuery related posts

This Week In Python
5 projects | dev.to | 17 Mar 2024
Show HN: I built an open-source data copy tool called ingestr
3 projects | news.ycombinator.com | 27 Feb 2024
Ingestr: CLI tool to copy data between any databases with a single command
1 project | news.ycombinator.com | 27 Feb 2024
JupySQL: Connecting to a SQL database from Jupyter
1 project | /r/SQL | 9 Sep 2023
GitHub - ploomber/jupysql: Better SQL in Jupyter. 📊
1 project | /r/coolgithubprojects | 6 Sep 2023
SQL CTEs in Jupyter notebooks, DuckDB integration and more
1 project | /r/Jupyter | 2 Aug 2023
TL;DR incorporate SQL functionality within Jupyter, access to modern data processing DBs (like DuckDB), polars and data exploration through plotting easier with JupySQL.
1 project | /r/coolgithubprojects | 2 Aug 2023

Index

What are some of the best open-source BigQuery projects in Python? This list will help you:

#	Project	Stars
1	Redash	24,917
2	airbyte	13,923
3	sqlglot	5,441
4	ibis	4,074
5	ethereum-etl	2,819
6	professional-services	2,723
7	ingestr	2,308
8	swirl-search	1,509
9	jupysql	598
10	BigQuery-Python	449
11	python-bigquery-pandas	419
12	pypinfo	394
13	astro-sdk	317
14	bigquery-schema-generator	231
15	CueObserve	205
16	dbt-coves	208
17	dbt-ml-preprocessing	175
18	starthinker	166
19	premier-league	142
20	dataproc-templates	110
21	bigquery_fdw	89
22	prism	79
23	iris3	66