Top 23 Snowflake Open-Source Projects
-
airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
-
bytebase
The GitLab/GitHub for database DevOps. World's most advanced database DevOps and CI/CD for Developer, DBA and Platform Engineering teams.
-
Ockam
Orchestrate end-to-end encryption, cryptographic identities, mutual authentication, and authorization policies between distributed applications – at massive scale.
-
jitsu
Jitsu is an open-source Segment alternative. Fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
-
snowflake
A simple to use Go (golang) package to generate or parse Twitter snowflake IDs (by bwmarrin)
-
soda-core
Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
-
elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
-
peerdb
A fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues, and storage
-
dozer
Dozer is a real-time data movement tool that leverages CDC from various sources and moves data into various sinks. (by getdozer)
Project mention: Launch HN: Bracket (YC W22) – Two-Way Sync Between Salesforce and Postgres | news.ycombinator.com | 2023-12-12
I'll also give a shout-out to Airbyte (https://airbyte.com/), with which I've had some limited success integrating Salesforce with a local database. The particular pull for Airbyte is that we can self-host the open source version, rather than pay Fivetran a significant sum to do this for us.
It's an immature tool, so I don't yet know that I can claim we've spent _less_ than Fivetran on the additional engineering and ops time, but it feels like it has potential to do so once stabilized.
Project mention: Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis | dev.to | 2024-03-27
As an open-source real-time data warehouse, Apache Doris provides semi-structured data processing capabilities, and the newly released version 2.1.0 makes a stride in this direction. Before V2.1, Apache Doris stored semi-structured data as JSON files. However, during query execution, the real-time parsing of JSON data leads to high CPU and I/O consumption in addition to high query latency, especially when the dataset is huge and complicated. Moreover, the lack of a pre-defined schema means there is no handle for query optimization.
Project mention: Ask HN: What tool(s) do you use to code review and deploy SQL scripts? | news.ycombinator.com | 2024-04-14
We have been building https://github.com/bytebase/bytebase for 3+ years. You can think of it as GitHub/GitLab for SQL changes, with integrated GitOps, code review and deployment.
You can further check out this tutorial to get a feel of our GitOps solution
https://www.bytebase.com/docs/tutorials/database-change-mana...
Project mention: The Future of MySQL is PostgreSQL: an extension for the MySQL wire protocol | news.ycombinator.com | 2024-04-26
This is probably referring to "zero changes to your driver code" and not "zero changes to the SQL you send over this driver".
Translating between SQL dialects is notoriously hard, and attempts to translate [1] work in 95% of cases. But the last 5% would require 5x the amount of work. That's because "SQL dialect" also includes weird edge cases of type inference for things like COALESCE(5, FALSE) and emulation of system catalogs (pg_catalog, information_schema).
[1] https://github.com/tobymao/sqlglot
Project mention: GrowthBook: Open-source feature flagging and A/B testing platform | /r/opensource | 2023-10-20
disclosure: I work at Ockam.
The Portals for Mac app is an example of the type of thing you could build using the open source stack of protocols. The README (linked by parent) links out to all of the relevant parts of the protocol documentation to explain how these work together. The NAT Traversal (https://github.com/build-trust/ockam/blob/develop/examples/a...) part of the README is probably the best explanation of why the free relay you get via Ockam Orchestrator is a useful part of this demo.
As for why would anyone trust this: The protocols are designed so you absolutely don't have to trust the relay. Trust is pushed out to the edges that you control and so you're not susceptible to a MITM attack if something like a relay is compromised. The protocol design for all of this is open and documented, and was independently audited by (IMO) some of the best in the business, Trail of Bits: https://docs.ockam.io/reference/protocols.
Project mention: Show HN: Hashquery, a Python library for defining reusable analysis | news.ycombinator.com | 2024-04-23
I really don't understand the appeal of dbt vs a proper programming language. The templating approach leads to massive spaghetti. I look forward to trying out something like Ibis [0].
0: https://ibis-project.org/
Fluent Migrator
If the issue happens a lot, there is also: https://github.com/datafold/data-diff
It is a nice tool for doing this across databases as well.
I think it's based on a checksum method.
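The commenter only guesses that a checksum method is involved, so the following is a minimal sketch of the general idea and not data-diff's actual implementation. It assumes two tables modelled as plain Python dicts keyed by an integer primary key. The helper names `segment_checksum` and `diff_keys` are hypothetical. A key range is hashed on both sides; matching checksums skip the whole range, and mismatching ranges are split in half until they are small enough to compare row by row.

```python
import hashlib

_MISSING = object()  # sentinel for "row not present on this side"


def segment_checksum(rows):
    """Hash a sorted list of (key, value) rows into one hex digest."""
    h = hashlib.sha256()
    for key, value in rows:
        h.update(repr((key, value)).encode("utf-8"))
    return h.hexdigest()


def diff_keys(source, target, lo, hi, min_size=2):
    """Return the keys in [lo, hi) whose rows differ between two dict 'tables'."""
    src = sorted((k, v) for k, v in source.items() if lo <= k < hi)
    dst = sorted((k, v) for k, v in target.items() if lo <= k < hi)
    if segment_checksum(src) == segment_checksum(dst):
        return []  # the whole segment matches, no need to look inside
    if hi - lo <= min_size:
        # Small segment: fall back to a row-by-row comparison.
        keys = {k for k, _ in src} | {k for k, _ in dst}
        return sorted(
            k for k in keys
            if source.get(k, _MISSING) != target.get(k, _MISSING)
        )
    mid = (lo + hi) // 2
    # Mismatch in a large segment: bisect and check each half separately.
    return (diff_keys(source, target, lo, mid, min_size)
            + diff_keys(source, target, mid, hi, min_size))


if __name__ == "__main__":
    source = {i: f"row-{i}" for i in range(100)}
    target = dict(source)
    target[42] = "changed"   # modified row
    del target[7]            # missing row
    print(diff_keys(source, target, 0, 100))
```

In a real cross-database setting the checksum of each range would presumably be computed inside each database so that only digests cross the network; here everything is in memory purely for illustration.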
Project mention: Ask HN: What tool(s) do you use to code review and deploy SQL scripts? | news.ycombinator.com | 2024-04-14
We use https://sqitch.org/ and we're fairly happy with it. Sqitch manages the files to deploy, which are applied first to a local database.
We use GitHub Actions for deployment, and database migrations are just one step of the pipeline. The step invokes sqitch deploy, which runs all the pending migration files.
Then, the approval process is standard for the environment. We require approvals in pull requests before merging to the main branch.
Project mention: Pgwire: a Rust library for PostgreSQL compatible application | news.ycombinator.com | 2024-03-20
We at PeerDB (https://github.com/PeerDB-io/peerdb) were early adopters of Pgwire to implement our Postgres-compatible SQL layer to do ETL. It is very easy to work with and saved us the multiple months of effort it would have taken to build this from scratch.
Project mention: Show HN: Find simple open source bounties to solve and get paid | news.ycombinator.com | 2023-08-19
https://github.com/getdozer/dozer/issues/1631#issuecomment-1...
and then something has gone off the rails about the accounting process since
Trigger.dev
The Go team does acknowledge it as a bug [1], so there is some point here.
However, that said, I wonder if OP (duckdb) could have written their solution [2] differently. Shouldn't they be able to select from a Pipe as well as an Error channel simultaneously (similar to how they are doing it inside here [3])? If not, I would have created a goroutine that does a blocking read on the Pipe and then passes the data on to another channel to select on.
[1] https://github.com/golang/go/issues/66239
[2] https://github.com/scratchdata/scratchdata/blob/7c1a0fcd0e20...
[3] https://github.com/scratchdata/scratchdata/blob/7c1a0fcd0e20...
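The comment above is about Go channels and `select`. As a rough Python analogue of the suggested pattern, and not the scratchdata code, the sketch below uses a worker thread that does the blocking read on a pipe and forwards what it reads to a queue, so the main loop has a single place to wait for data, end-of-stream, or errors. The function names `forward_pipe` and `run` and the tagged-tuple event format are assumptions made for illustration.

```python
import os
import queue
import threading


def forward_pipe(read_fd, events):
    """Worker: blocking-read the pipe and forward each chunk to the queue."""
    try:
        while True:
            chunk = os.read(read_fd, 4096)  # blocks until data or EOF
            if not chunk:
                break  # writer closed its end
            events.put(("data", chunk))
        events.put(("eof", None))
    except OSError as exc:
        events.put(("error", exc))
    finally:
        os.close(read_fd)


def run(read_fd, events):
    """Main loop: wait on one queue that carries data, EOF and error events."""
    worker = threading.Thread(
        target=forward_pipe, args=(read_fd, events), daemon=True
    )
    worker.start()
    received = bytearray()
    while True:
        kind, payload = events.get()  # single wait point, like a select
        if kind == "data":
            received.extend(payload)
        elif kind == "error":
            raise payload
        else:  # "eof"
            return bytes(received)


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"hello ")
    os.write(w, b"world")
    os.close(w)
    print(run(r, queue.Queue()))
```

Because other producers can put `("error", exc)` events on the same queue, the main loop can react to a failure even while the worker is still blocked inside the read, which is the property the commenter is after.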
Snowflake related posts
-
Vanna.ai: Chat with your SQL database
-
Show HN: SQL Polyglot
-
Migrate mongodb Datawarehouse to snowflake
-
The Chan Zuckerberg Initiative Originally Built the Snowflake Terraform Provider
-
Preventing replication slot overflow on Postgres DB (AWS RDS)
-
Preventing WAL Growth on Postgres DB Running on AWS RDS
-
Launch HN: PeerDB (YC S23) – Fast, Native ETL/ELT for Postgres
Index
What are some of the best open-source Snowflake projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | airbyte | 14,112 |
2 | doris | 11,389 |
3 | bytebase | 10,107 |
4 | sqlglot | 5,573 |
5 | growthbook | 5,549 |
6 | Ockam | 4,352 |
7 | ibis | 4,241 |
8 | Rudderstack | 3,940 |
9 | jitsu | 3,861 |
10 | sqlchat | 3,741 |
11 | FluentMigrator | 3,130 |
12 | tbls | 3,074 |
13 | data-diff | 2,847 |
14 | snowflake | 2,847 |
15 | sqitch | 2,708 |
16 | ingestr | 2,331 |
17 | soda-core | 1,765 |
18 | elementary | 1,740 |
19 | peerdb | 1,640 |
20 | dozer | 1,450 |
21 | IdGen | 1,127 |
22 | scratchdata | 1,034 |
23 | yauaa | 730 |