Show HN: ScratchDB – Open-Source Snowflake on ClickHouse

Hello! For the past year I’ve been working on a fully managed data warehouse built on ClickHouse. I built it because I was frustrated by how much work it took to run an OLAP database in prod: rewriting my app to do batch inserts, managing clusters, and looking up special CREATE TABLE syntax every time I made a change. I also found pricing for other warehouses confusing (what exactly is a “credit”?) and worried about getting capacity planning wrong.
I was previously building accounting software for firms with millions of transactions. I desperately needed to move from Postgres to an OLAP database but didn’t know where to start. I eventually built abstractions around ClickHouse: my application code called an insert() function, but in the background I had to stand up Kafka for streaming, handle bulk loading, DB drivers, and ClickHouse configs, and manage schema changes.
This was all a big distraction when all I wanted was to save data and get it back. So I decided to build a better developer experience around it.
https://github.com/scratchdata/ScratchDB
I call it “ScratchDB.” It makes it easy to get started from scratch. It’s a massively simpler abstraction on top of ClickHouse.
Scratch provides two endpoints [1]: one to insert data and another to query it. When you send any JSON, it automatically creates tables and columns based on the structure [2]. Because table creation is automated, you can just start sending data and the system will work [3]. It also means you can use Scratch as the destination for any webhook without prior setup [4,5]. To query, pass SQL as a query param and it returns JSON.
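The automatic table creation described above can be pictured roughly as follows. This is a minimal Go sketch, not ScratchDB’s actual code: it assumes nested objects are flattened into underscore-joined column names, JSON numbers map to Float64, booleans to Bool, and everything else (strings, arrays, nulls) to String, and it emits a MergeTree CREATE TABLE statement. The function names inferColumns and createTableSQL are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// inferColumns flattens a JSON object into column names and guesses a
// ClickHouse type for each. Nested objects become underscore-joined names.
// Hypothetical sketch: ScratchDB's real mapping rules may differ.
func inferColumns(prefix string, obj map[string]any, cols map[string]string) {
	for key, val := range obj {
		name := key
		if prefix != "" {
			name = prefix + "_" + key
		}
		switch v := val.(type) {
		case map[string]any:
			inferColumns(name, v, cols) // recurse into nested objects
		case bool:
			cols[name] = "Bool"
		case float64:
			// encoding/json decodes every JSON number into float64 by default.
			cols[name] = "Float64"
		default:
			// Strings, arrays, and nulls fall back to String in this sketch.
			cols[name] = "String"
		}
	}
}

// createTableSQL renders a CREATE TABLE statement with columns in sorted order.
func createTableSQL(table string, cols map[string]string) string {
	names := make([]string, 0, len(cols))
	for n := range cols {
		names = append(names, n)
	}
	sort.Strings(names)
	defs := make([]string, len(names))
	for i, n := range names {
		defs[i] = fmt.Sprintf("`%s` %s", n, cols[n])
	}
	return fmt.Sprintf("CREATE TABLE IF NOT EXISTS `%s` (%s) ENGINE = MergeTree ORDER BY tuple()",
		table, strings.Join(defs, ", "))
}

func main() {
	payload := []byte(`{"user":"alice","amount":12.5,"paid":true,"meta":{"source":"webhook"}}`)
	var obj map[string]any
	if err := json.Unmarshal(payload, &obj); err != nil {
		panic(err)
	}
	cols := map[string]string{}
	inferColumns("", obj, cols)
	fmt.Println(createTableSQL("events", cols))
}
```

A real system would also have to reconcile each new payload against the existing table, for example by issuing ALTER TABLE ADD COLUMN for keys it has not seen before, and would need a policy for keys whose type changes between payloads.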
Scratch handles both streaming and bulk loading of data.
When data is inserted, I append it to a file on disk, which is then bulk loaded into ClickHouse. The overall goal is for the platform to handle managing shards and replicas automatically.
The whole thing runs on regular servers. Hetzner has become our cloud of choice, along with Backblaze B2 and SQS. It is written in Go. From an architecture perspective, I try to keep things simple; I want folks to make economical use of their servers.
So far ScratchDB has ingested about 2 TB of data and handles about 4,000 requests/second on about $100 a month in server costs.
Feel free to download it and play around. If you’re interested in this stuff, I’d love to chat! I’m especially looking for feedback on what is hard about analytical databases and what would make the developer experience easier.
[1] https://scratchdb.com/docs

Posted to news.ycombinator.com on 27 Oct 2023.
