simpledb
A simple database built from scratch that has some of the basic RDBMS features (SQL query parser, transactions, query optimizer)
-
prql
PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
There is also BusTub from CMU, which I stumbled upon earlier today:
https://github.com/cmu-db/bustub
The official GitHub repository for this course (https://github.com/MIT-DB-Class/simple-db-hw-2021) specifically asks developers not to make their implementations public.
OP, I am curious about the project license. Since the project is built on top of the assignment code, can you claim your own copyright? https://github.com/awelm/simpledb/blob/2e78bb2/LICENSE
Great questions! I'm not a database expert either but I can try answering these:
1) I think databases prefer to manage pages directly because the DB has more context than the OS and can therefore make better decisions. For example, when a transaction aborts, the DB knows its dirty pages should be evicted (I'm not sure whether mmap offers custom eviction). Also, I believe a DB that uses mmap loses control over when pages are flushed to disk, and that control is necessary for guaranteeing transaction durability.
2) What you're describing sounds similar to an LSM-tree database (e.g. RocksDB). They are often used for write-heavy workloads because writes are just appends, but they might not be great for read-heavy workloads.
3) This reminds me of PRQL[1] (which was trending on Hacker News last week) and Spark SQL. I'm not too familiar with this area though, so I can't really say why SQL was designed this way.
[1] https://github.com/max-sixty/prql?utm_source=hackernewslette...
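To make point 1 a bit more concrete, here is a minimal sketch of a page cache that the database controls itself. It is not simpledb's actual code: `BufferPool`, `PAGE_SIZE`, and every method name are hypothetical, and the policy (write a transaction's pages only at commit, throw them away on abort) is a simplification I'm assuming for illustration. Real systems usually pair this with a write-ahead log.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size, chosen only for this sketch


class BufferPool:
    """Toy page cache that decides for itself when a page reaches disk."""

    def __init__(self, path):
        self.path = path
        self.pages = {}  # page number -> bytearray with the cached copy
        self.dirty = {}  # page number -> id of the transaction that modified it

    def get_page(self, page_no):
        # Read the page from disk on a cache miss; pad short reads with zeros.
        if page_no not in self.pages:
            with open(self.path, "rb") as f:
                f.seek(page_no * PAGE_SIZE)
                data = f.read(PAGE_SIZE)
            self.pages[page_no] = bytearray(data.ljust(PAGE_SIZE, b"\0"))
        return self.pages[page_no]

    def write(self, txn_id, page_no, offset, payload):
        # Modify only the cached copy and remember which transaction did it.
        page = self.get_page(page_no)
        page[offset:offset + len(payload)] = payload
        self.dirty[page_no] = txn_id

    def commit(self, txn_id):
        # The DB picks the flush point: write this transaction's pages, then fsync.
        with open(self.path, "r+b") as f:
            for page_no in [p for p, t in self.dirty.items() if t == txn_id]:
                f.seek(page_no * PAGE_SIZE)
                f.write(self.pages[page_no])
                del self.dirty[page_no]
            f.flush()
            os.fsync(f.fileno())

    def abort(self, txn_id):
        # Drop the transaction's dirty pages without ever writing them back.
        for page_no in [p for p, t in self.dirty.items() if t == txn_id]:
            del self.pages[page_no]
            del self.dirty[page_no]
```

With mmap, the kernel may write a modified page back whenever it likes, so neither the "never write on abort" nor the "write exactly at commit" behavior above is something the DB could guarantee on its own.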
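For point 2, a rough sketch of the LSM idea, kept entirely in memory. This is not how RocksDB is implemented: `TinyLSM`, `memtable_limit`, and the list of sorted runs are hypothetical stand-ins for a memtable and on-disk sorted files, and there is no compaction. It only shows why writes are cheap (update a small table, occasionally append a sorted run) while reads may have to check several runs.

```python
import bisect


class TinyLSM:
    """Toy LSM-style store: writes go to a memtable, flushes append sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}  # most recent writes
        self.runs = []      # immutable sorted runs of (key, value), oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # A write never touches older runs; it only updates the memtable.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Appending a new sorted run models a sequential write to disk.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # A read checks the memtable, then every run from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

The read path is where the cost shows up: a lookup for a key that lives only in the oldest run has to search every newer run first, which is one reason these stores can be less attractive for read-heavy workloads.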