Python Snowflake

Open-source Python projects categorized as Snowflake

Top 23 Python Snowflake Projects

  • airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

  • Project mention: Launch HN: Bracket (YC W22) – Two-Way Sync Between Salesforce and Postgres | news.ycombinator.com | 2023-12-12

    I'll also give a shout-out to Airbyte (https://airbyte.com/), with which I've had some limited success integrating Salesforce with a local database. The particular pull for Airbyte is that we can self-host the open source version, rather than pay Fivetran a significant sum to do this for us.

    It's an immature tool, so I don't yet know that I can claim we've spent _less_ than Fivetran on the additional engineering and ops time, but it feels like it has potential to do so once stabilized.

  • sqlglot

    Python SQL Parser and Transpiler

  • Project mention: The Future of MySQL is PostgreSQL: an extension for the MySQL wire protocol | news.ycombinator.com | 2024-04-26

    This is probably referring to "zero changes to your driver code" and not "zero changes to the SQL you send over this driver".

    Translating between SQL dialects is notoriously hard, and attempts to translate [1] work in about 95% of cases. But the last 5% would require 5x the amount of work. That's because "SQL dialect" also includes weird edge cases of type inference for things like COALESCE(5, FALSE) and emulation of system catalogs (pg_catalog, information_schema).

    [1] https://github.com/tobymao/sqlglot

  • ibis

    the portable Python dataframe library

  • Project mention: Show HN: Hashquery, a Python library for defining reusable analysis | news.ycombinator.com | 2024-04-23

    I really don't understand the appeal of dbt vs a proper programming language. The templating approach leads to massive spaghetti. I look forward to trying out something like Ibis [0]

    0: https://ibis-project.org/

  • data-diff

    Compare tables within or across databases

  • Project mention: How to Check 2 SQL Tables Are the Same | news.ycombinator.com | 2023-07-26

    If the issue happens a lot, there is also: https://github.com/datafold/data-diff

    That is a nice tool for doing it across databases as well.

    I think it's based on a checksum method.
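
The checksum idea mentioned in the comment above can be sketched as follows. This is only an illustration of segment-level checksum comparison, not data-diff's actual implementation: the `items` table, its `id` and `value` columns, the helper names, and the use of in-memory SQLite with MD5 are all assumptions made for the example.

```python
import hashlib
import sqlite3


def make_db(rows):
    """Create an in-memory SQLite database with a hypothetical `items` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    return conn


def segment_checksums(conn, table, segment_size):
    """Return {segment_index: hex digest}, grouping rows by id // segment_size.

    The table name is interpolated directly, so only pass trusted names.
    """
    digests = {}
    for row in conn.execute(f"SELECT id, value FROM {table} ORDER BY id"):
        segment = row[0] // segment_size
        digests.setdefault(segment, hashlib.md5()).update(repr(row).encode())
    return {segment: h.hexdigest() for segment, h in digests.items()}


def differing_segments(a, b):
    """List the segments whose checksums differ or exist on one side only."""
    return sorted(seg for seg in a.keys() | b.keys() if a.get(seg) != b.get(seg))


# Two copies of the same table; the row with id 7 differs in the target.
rows = [(i, f"v{i}") for i in range(10)]
source = make_db(rows)
target = make_db([(i, "changed" if i == 7 else v) for i, v in rows])

mismatched = differing_segments(
    segment_checksums(source, "items", 5),
    segment_checksums(target, "items", 5),
)
print(mismatched)  # [1]: only the segment holding ids 5-9 needs a row-level check
```

Comparing one digest per segment means only the mismatched segments have to be fetched and compared row by row, which is what makes this kind of approach attractive when the two tables live in different databases.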

  • ingestr

    ingestr is a CLI tool to copy data between any databases with a single command seamlessly.

  • Project mention: FLaNK 04 March 2024 | dev.to | 2024-03-04

  • soda-core

    Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

  • jupysql

    Better SQL in Jupyter. 📊

  • Project mention: Show HN: JupySQL – a SQL client for Jupyter (ipython-SQL successor) | news.ycombinator.com | 2023-12-06

    Hey, HN community!

    We're stoked to launch JupySQL today! JupySQL is an open-source library that brings a modern SQL experience to Jupyter. JupySQL is compatible with all major databases, such as Snowflake, Redshift, PostgreSQL, MySQL, MariaDB, DuckDB, SQL Server, ClickHouse, Trino, and more!

    To get started, check out our tutorial: https://jupysql.ploomber.io/en/latest/quick-start.html

    SQL is the de facto language for data analysis; however, analysis often requires a mix of SQL and Python. JupySQL bridges this gap, allowing users to execute SQL queries seamlessly in Jupyter and continue their analysis in Python. Add %%sql to the top of your cell and start writing SQL.

    Here are some of JupySQL's main features:

    - Syntax highlighting

  • versatile-data-kit

    One framework to develop, deploy and operate data workflows with Python and SQL.

  • Project mention: Looking for a data blogger | /r/opensource | 2023-05-19

    Here's the project: https://github.com/vmware/versatile-data-kit

  • snowChat

    Chat snowflake database - Text to SQL

  • Project mention: 🥳 Announcing the winners of the Summit Hackathon! | /r/StreamlitOfficial | 2023-05-16

    🏆 1st place goes to snowChat, created by Kaarthik Andavar

  • astro-sdk

    Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

  • Project mention: Orchestration: Thoughts on Dagster, Airflow and Prefect? | /r/dataengineering | 2023-06-01

    Have you tried the Astro SDK? https://github.com/astronomer/astro-sdk

  • grai-core

  • Project mention: Launch HN: Grai (YC S22) – Open-Source Data Observability Platform | news.ycombinator.com | 2023-07-17

    Elastic v2 if one is interested in such things: https://github.com/grai-io/grai-core/blob/v0.1.33/LICENSE

  • snowpark-python

    Snowflake Snowpark Python API

  • dbt-coves

    CLI tool for dbt users to simplify creation of staging models (yml and sql) files

  • Project mention: Is there something wrong with me, I hate dbt, what am I missing ? | /r/dataengineering | 2023-05-15

    This just feels like you aren’t using the plentiful tools to make those “mind-numbingly slow” dev steps faster. For ex., using dbt-coves to generate the staging models with casting to types in a couple clicks. And pulling directly from Fivetran tables is just poor practice, with the additional steps needed to do it “right” being inconsequential at best.

  • CueObserve

    Timeseries Anomaly detection and Root Cause Analysis on data in SQL data warehouses and databases

  • dbt-ml-preprocessing

    A SQL port of python's scikit-learn preprocessing module, provided as cross-database dbt macros.

  • diepvries

    The Picnic Data Vault framework.

  • SnowDDL

    Declarative-style object management tool for Snowflake.

  • prism

    Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python. (by runprism)

  • Project mention: Prism: the easiest way to create robust data workflows. Accessible via CLI | /r/coolgithubprojects | 2023-09-21

  • pgwarehouse

    Easily sync your Postgres database to a Snowflake, ClickHouse, or DuckDB warehouse.

  • dbd

    dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

  • snowpark-python-template

    Python project template for Snowpark development

  • snowflake-provisioning

    Snowflake Database, Schema, and Warehouse provisioning with Access Roles & Generating and Provisioning of Functional Roles & Snowflake Source Export & Snowflake cloning tool

  • Project mention: How are your Roles setup and administered on your Snowflake instance? | /r/snowflake | 2023-06-04

    I’ve built a tool to manage provisioning of access roles for databases, schemas, and warehouses and functional roles to use the access roles: https://github.com/thomaseibner/snowflake-provisioning

  • snowflake-cli

    A simple Python script for hosting a Snowflake Proxy in your Python program or with its standalone CLI (by gp2112)

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Snowflake related posts

  • Vanna.ai: Chat with your SQL database

    13 projects | news.ycombinator.com | 14 Jan 2024
  • 🥳 Announcing the winners of the Summit Hackathon!

    1 project | /r/StreamlitOfficial | 16 May 2023
  • Data-diff v0.3: DuckDB, efficient in-database diffing and more

    1 project | news.ycombinator.com | 15 Dec 2022
  • What are you using to manage roles/grants in Snowflake? Question for any Permifrost users

    1 project | /r/snowflake | 14 Dec 2022
  • data-diff VS cuallee - a user suggested alternative

    2 projects | 30 Nov 2022
  • Compare identical tables across databases to identify data differences (Oracle 19c)

    1 project | /r/SQL | 26 Oct 2022
  • How to test Data Ingestion Pipeline

    1 project | /r/dataengineering | 26 Sep 2022

Index

What are some of the best open-source Snowflake projects in Python? This list will help you:

Rank  Project                   Stars
1     airbyte                   14,112
2     sqlglot                   5,573
3     ibis                      4,241
4     data-diff                 2,862
5     ingestr                   2,336
6     soda-core                 1,768
7     jupysql                   610
8     versatile-data-kit        411
9     snowChat                  392
10    astro-sdk                 319
11    grai-core                 270
12    snowpark-python           231
13    dbt-coves                 209
14    CueObserve                205
15    dbt-ml-preprocessing      175
16    diepvries                 124
17    SnowDDL                   83
18    prism                     79
19    pgwarehouse               60
20    dbd                       55
21    snowpark-python-template  50
22    snowflake-provisioning    35
23    snowflake-cli             6
