PostgreSQL: No More Vacuum, No More Bloat

Our great sponsors

InfluxDB - Power Real-Time Data Analytics at Scale

WorkOS - The modern identity platform for B2B SaaS

SaaSHub - Software Alternatives and Reviews

Our great sponsors

orioledb

25 2,631 9.3 C

OrioleDB – building a modern cloud-native storage engine (... and solving some PostgreSQL wicked problems) 🇺🇦

https://github.com/orioledb/orioledb/blob/main/doc/arch.md
> - PostgreSQL is very conservative (maybe extremely) conservative about data safety (mostly achieved via fsync-ing at the right times), and that propagates through the IO stack, including SSD firmware, to cause slowdowns
This is why our first goal is to become pure extension. Becoming part of PostgreSQL would require test of time.
> - MVCC is very nice for concurrent access - the Oriole doc doesn't say with what concurrency are the graphs achieved
Good catch. I've added information about VM type and concurrency to the blog post.
> - The title of the Oriole doc and its intro text center about solving VACUUM, which is of course a good goal, but I don't think they show that the "square wave" graphs they achieve for PostgreSQL are really in majority caused by VACUUM. Other benchmarks, like Percona's (https://www.percona.com/blog/evaluating-checkpointing-in-pos...) don't yield this very distinctive square wave pattern.
Yes, it's true. The square patters is because of checkpointing. The reason of improvements here is actually not VACUUM, but modification of relevant indexes only (and row-level WAL, which decreases overall IO).

Tile38

9 8,902 7.0 Go

Real-time Geospatial and Geofencing

Experimental format to help readability of a long rant:
1.
According to the OP, there's a "terrifying tale of VACUUM in PostgreSQL," dating back to "a historical artifact that traces its roots back to the Berkeley Postgres project." (1986?)
2.
Maybe the whole idea of "use X, it has been battle-tested for [TIME], is robust, all the bugs have been and keep being fixed," etc., should not really be that attractive or realistic for at least a large subset of projects.
3.
In the case of Postgres, on top of piles of "historic code" and cruft, there's the fact that each user of Postgres installs and runs a huge software artifact with hundreds or even thousands of features and dependencies, of which every particular user may only use a tiny subset.
4.
In Kleppmann's DDOA [1], after explaining why the declarative SQL language is "better," he writes: "in databases, declarative query languages like SQL turned out to be much better than imperative query APIs." I find this footnote to the paragraph a bit ironic: "IMS and CODASYL both used imperative query APIs. Applications typically used COBOL code to iterate over records in the database, one record at a time." So, SQL was better than CODASYL and COBOL in a number of ways... big surprise?
Postgres' own PL/pgSQL [2] is a language that (I imagine) most people would rather NOT use: hence a bunch of alternatives, including PL/v8, on its own a huge mass of additional complexity. SQL is definitely "COBOLESQUE" itself.
5.
Could we come up with something more minimal than SQL and looking less like COBOL? (Hopefully also getting rid of ORMs in the process). Also, I have found inspiring to see some people creating databases for themselves. Perhaps not a bad idea for small applications? For instance, I found BuntDB [3], which the developer seems to be using to run his own business [4]. Also, HYTRADBOI? :-) [5].
6.
A usual objection to use anything other than a stablished relational DB is "creating a database is too difficult for the average programmer." How about debugging PostgreSQL issues, developing new storage engines for it, or even building expertise on how to set up the instances properly and keep it alive and performant? Is that easier?
I personally feel more capable of implementing a small, well-tested, problem-specific, small implementation of a B-Tree than learning how to develop Postgres extensions, become an expert in its configuration and internals, or debug its many issues.
Another common opinion is "SQL is easy to use for non-programmers." But every person that knows SQL had to learn it somehow. I'm 100% confident that anyone able to learn SQL should be able to learn a simple, domain-specific, programming language designed for querying DBs. And how many of these people that are not able to program imperatively would be able to read a SQL EXPLAIN output and fix deficient queries? If they can, that supports even more the idea that they should be able to learn something different than SQL.
----
1: https://dataintensive.net/
2: https://www.postgresql.org/docs/7.3/plpgsql-examples.html
3: https://github.com/tidwall/buntdb
4: https://tile38.com/
5: https://www.hytradboi.com/

InfluxDB

www.influxdata.com sponsored

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
buntdb

7 4,381 0.0 Go

BuntDB is an embeddable, in-memory key/value database for Go with custom indexing and geospatial support

Experimental format to help readability of a long rant:
1.
According to the OP, there's a "terrifying tale of VACUUM in PostgreSQL," dating back to "a historical artifact that traces its roots back to the Berkeley Postgres project." (1986?)
2.
Maybe the whole idea of "use X, it has been battle-tested for [TIME], is robust, all the bugs have been and keep being fixed," etc., should not really be that attractive or realistic for at least a large subset of projects.
3.
In the case of Postgres, on top of piles of "historic code" and cruft, there's the fact that each user of Postgres installs and runs a huge software artifact with hundreds or even thousands of features and dependencies, of which every particular user may only use a tiny subset.
4.
In Kleppmann's DDOA [1], after explaining why the declarative SQL language is "better," he writes: "in databases, declarative query languages like SQL turned out to be much better than imperative query APIs." I find this footnote to the paragraph a bit ironic: "IMS and CODASYL both used imperative query APIs. Applications typically used COBOL code to iterate over records in the database, one record at a time." So, SQL was better than CODASYL and COBOL in a number of ways... big surprise?
Postgres' own PL/pgSQL [2] is a language that (I imagine) most people would rather NOT use: hence a bunch of alternatives, including PL/v8, on its own a huge mass of additional complexity. SQL is definitely "COBOLESQUE" itself.
5.
Could we come up with something more minimal than SQL and looking less like COBOL? (Hopefully also getting rid of ORMs in the process). Also, I have found inspiring to see some people creating databases for themselves. Perhaps not a bad idea for small applications? For instance, I found BuntDB [3], which the developer seems to be using to run his own business [4]. Also, HYTRADBOI? :-) [5].
6.
A usual objection to use anything other than a stablished relational DB is "creating a database is too difficult for the average programmer." How about debugging PostgreSQL issues, developing new storage engines for it, or even building expertise on how to set up the instances properly and keep it alive and performant? Is that easier?
I personally feel more capable of implementing a small, well-tested, problem-specific, small implementation of a B-Tree than learning how to develop Postgres extensions, become an expert in its configuration and internals, or debug its many issues.
Another common opinion is "SQL is easy to use for non-programmers." But every person that knows SQL had to learn it somehow. I'm 100% confident that anyone able to learn SQL should be able to learn a simple, domain-specific, programming language designed for querying DBs. And how many of these people that are not able to program imperatively would be able to read a SQL EXPLAIN output and fix deficient queries? If they can, that supports even more the idea that they should be able to learn something different than SQL.
----
1: https://dataintensive.net/
2: https://www.postgresql.org/docs/7.3/plpgsql-examples.html
3: https://github.com/tidwall/buntdb
4: https://tile38.com/
5: https://www.hytradboi.com/

postgres

4 23 8.1 C

PostgreSQL with extensibility and performance patches (by orioledb)

> .. and the delta code to be committed upstream is less than 2K LOC.
Then mainline it?
The last commit that /postgres/postgres and /orioledb/postgres share (1) is 6 months old for 15.2?
I mean, this is literally what I'm pointing out in my comment; you're chasing a moving target. Every change postgres makes, you have to merge in check it doesn't conflict, roll out a new patch set...
...and, you're already falling behind on it.
So, going forward, how do you plan to keep up to date...? ...because it looks to me, like a < 2K LOC isn't a fix for this problem; it hasn't solved it for you, as time goes forwards, it will continue not to be a solution to the problem.
The solution is mainlining those changes --> https://github.com/orioledb/postgres/issues/2
> Currently the changes needed to Postgres core are less than a 1000 lines of code. Due to the separate development schedules for Postgres and OrioleDB, these changes cannot be unstreamed in time for v.15.
^ They were < 1000 lines a year ago, apparently.
I think you can draw a clear pattern in how this going to go, going forward.
[1] - https://github.com/postgres/postgres/commit/78ec02d612a9b690... vs https://github.com/orioledb/postgres/commit/78ec02d612a9b690...

PostgreSQL

405 14,673 10.0 C

Mirror of the official PostgreSQL GIT repository. Note that this is just a *mirror* - we don't work with pull requests on github. To contribute, please see https://wiki.postgresql.org/wiki/Submitting_a_Patch

> .. and the delta code to be committed upstream is less than 2K LOC.
Then mainline it?
The last commit that /postgres/postgres and /orioledb/postgres share (1) is 6 months old for 15.2?
I mean, this is literally what I'm pointing out in my comment; you're chasing a moving target. Every change postgres makes, you have to merge in check it doesn't conflict, roll out a new patch set...
...and, you're already falling behind on it.
So, going forward, how do you plan to keep up to date...? ...because it looks to me, like a < 2K LOC isn't a fix for this problem; it hasn't solved it for you, as time goes forwards, it will continue not to be a solution to the problem.
The solution is mainlining those changes --> https://github.com/orioledb/postgres/issues/2
> Currently the changes needed to Postgres core are less than a 1000 lines of code. Due to the separate development schedules for Postgres and OrioleDB, these changes cannot be unstreamed in time for v.15.
^ They were < 1000 lines a year ago, apparently.
I think you can draw a clear pattern in how this going to go, going forward.
[1] - https://github.com/postgres/postgres/commit/78ec02d612a9b690... vs https://github.com/orioledb/postgres/commit/78ec02d612a9b690...

WorkOS

workos.com sponsored

The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Is PostgreSQL a worthy alternative to Microsoft SQL Server?
3 projects | dev.to | 7 Feb 2023
Data-diff v0.3: DuckDB, efficient in-database diffing and more
1 project | news.ycombinator.com | 15 Dec 2022
data-diff VS cuallee - a user suggested alternative
2 projects | 30 Nov 2022
pgcat: a PostgreSQL pooler
6 projects | dev.to | 14 Nov 2022
Compare identical tables across databases to identify data differences (Oracle 19c)
1 project | /r/SQL | 26 Oct 2022

PostgreSQL: No More Vacuum, No More Bloat

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Database Geospatial RDBMS Postgresql in-memory
Post date: 15 Jul 2023

orioledb

Tile38

InfluxDB

buntdb

postgres

PostgreSQL

WorkOS

Related posts

PostgreSQL: No More Vacuum, No More Bloat

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com Database Geospatial RDBMS Postgresql in-memory Post date: 15 Jul 2023

orioledb

Tile38

InfluxDB

buntdb

postgres

PostgreSQL

WorkOS

Related posts

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Database Geospatial RDBMS Postgresql in-memory
Post date: 15 Jul 2023