PostgreSQL: No More Vacuum, No More Bloat

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • orioledb

    OrioleDB – building a modern cloud-native storage engine (... and solving some PostgreSQL wicked problems) Β πŸ‡ΊπŸ‡¦

  • https://github.com/orioledb/orioledb/blob/main/doc/arch.md

    > - PostgreSQL is very conservative (maybe extremely) conservative about data safety (mostly achieved via fsync-ing at the right times), and that propagates through the IO stack, including SSD firmware, to cause slowdowns

    This is why our first goal is to become pure extension. Becoming part of PostgreSQL would require test of time.

    > - MVCC is very nice for concurrent access - the Oriole doc doesn't say with what concurrency are the graphs achieved

    Good catch. I've added information about VM type and concurrency to the blog post.

    > - The title of the Oriole doc and its intro text center about solving VACUUM, which is of course a good goal, but I don't think they show that the "square wave" graphs they achieve for PostgreSQL are really in majority caused by VACUUM. Other benchmarks, like Percona's (https://www.percona.com/blog/evaluating-checkpointing-in-pos...) don't yield this very distinctive square wave pattern.

    Yes, it's true. The square patters is because of checkpointing. The reason of improvements here is actually not VACUUM, but modification of relevant indexes only (and row-level WAL, which decreases overall IO).

  • Tile38

    Real-time Geospatial and Geofencing

  • Experimental format to help readability of a long rant:

    1.

    According to the OP, there's a "terrifying tale of VACUUM in PostgreSQL," dating back to "a historical artifact that traces its roots back to the Berkeley Postgres project." (1986?)

    2.

    Maybe the whole idea of "use X, it has been battle-tested for [TIME], is robust, all the bugs have been and keep being fixed," etc., should not really be that attractive or realistic for at least a large subset of projects.

    3.

    In the case of Postgres, on top of piles of "historic code" and cruft, there's the fact that each user of Postgres installs and runs a huge software artifact with hundreds or even thousands of features and dependencies, of which every particular user may only use a tiny subset.

    4.

    In Kleppmann's DDOA [1], after explaining why the declarative SQL language is "better," he writes: "in databases, declarative query languages like SQL turned out to be much better than imperative query APIs." I find this footnote to the paragraph a bit ironic: "IMS and CODASYL both used imperative query APIs. Applications typically used COBOL code to iterate over records in the database, one record at a time." So, SQL was better than CODASYL and COBOL in a number of ways... big surprise?

    Postgres' own PL/pgSQL [2] is a language that (I imagine) most people would rather NOT use: hence a bunch of alternatives, including PL/v8, on its own a huge mass of additional complexity. SQL is definitely "COBOLESQUE" itself.

    5.

    Could we come up with something more minimal than SQL and looking less like COBOL? (Hopefully also getting rid of ORMs in the process). Also, I have found inspiring to see some people creating databases for themselves. Perhaps not a bad idea for small applications? For instance, I found BuntDB [3], which the developer seems to be using to run his own business [4]. Also, HYTRADBOI? :-) [5].

    6.

    A usual objection to use anything other than a stablished relational DB is "creating a database is too difficult for the average programmer." How about debugging PostgreSQL issues, developing new storage engines for it, or even building expertise on how to set up the instances properly and keep it alive and performant? Is that easier?

    I personally feel more capable of implementing a small, well-tested, problem-specific, small implementation of a B-Tree than learning how to develop Postgres extensions, become an expert in its configuration and internals, or debug its many issues.

    Another common opinion is "SQL is easy to use for non-programmers." But every person that knows SQL had to learn it somehow. I'm 100% confident that anyone able to learn SQL should be able to learn a simple, domain-specific, programming language designed for querying DBs. And how many of these people that are not able to program imperatively would be able to read a SQL EXPLAIN output and fix deficient queries? If they can, that supports even more the idea that they should be able to learn something different than SQL.

    ----

    1: https://dataintensive.net/

    2: https://www.postgresql.org/docs/7.3/plpgsql-examples.html

    3: https://github.com/tidwall/buntdb

    4: https://tile38.com/

    5: https://www.hytradboi.com/

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • buntdb

    BuntDB is an embeddable, in-memory key/value database for Go with custom indexing and geospatial support

  • Experimental format to help readability of a long rant:

    1.

    According to the OP, there's a "terrifying tale of VACUUM in PostgreSQL," dating back to "a historical artifact that traces its roots back to the Berkeley Postgres project." (1986?)

    2.

    Maybe the whole idea of "use X, it has been battle-tested for [TIME], is robust, all the bugs have been and keep being fixed," etc., should not really be that attractive or realistic for at least a large subset of projects.

    3.

    In the case of Postgres, on top of piles of "historic code" and cruft, there's the fact that each user of Postgres installs and runs a huge software artifact with hundreds or even thousands of features and dependencies, of which every particular user may only use a tiny subset.

    4.

    In Kleppmann's DDOA [1], after explaining why the declarative SQL language is "better," he writes: "in databases, declarative query languages like SQL turned out to be much better than imperative query APIs." I find this footnote to the paragraph a bit ironic: "IMS and CODASYL both used imperative query APIs. Applications typically used COBOL code to iterate over records in the database, one record at a time." So, SQL was better than CODASYL and COBOL in a number of ways... big surprise?

    Postgres' own PL/pgSQL [2] is a language that (I imagine) most people would rather NOT use: hence a bunch of alternatives, including PL/v8, on its own a huge mass of additional complexity. SQL is definitely "COBOLESQUE" itself.

    5.

    Could we come up with something more minimal than SQL and looking less like COBOL? (Hopefully also getting rid of ORMs in the process). Also, I have found inspiring to see some people creating databases for themselves. Perhaps not a bad idea for small applications? For instance, I found BuntDB [3], which the developer seems to be using to run his own business [4]. Also, HYTRADBOI? :-) [5].

    6.

    A usual objection to use anything other than a stablished relational DB is "creating a database is too difficult for the average programmer." How about debugging PostgreSQL issues, developing new storage engines for it, or even building expertise on how to set up the instances properly and keep it alive and performant? Is that easier?

    I personally feel more capable of implementing a small, well-tested, problem-specific, small implementation of a B-Tree than learning how to develop Postgres extensions, become an expert in its configuration and internals, or debug its many issues.

    Another common opinion is "SQL is easy to use for non-programmers." But every person that knows SQL had to learn it somehow. I'm 100% confident that anyone able to learn SQL should be able to learn a simple, domain-specific, programming language designed for querying DBs. And how many of these people that are not able to program imperatively would be able to read a SQL EXPLAIN output and fix deficient queries? If they can, that supports even more the idea that they should be able to learn something different than SQL.

    ----

    1: https://dataintensive.net/

    2: https://www.postgresql.org/docs/7.3/plpgsql-examples.html

    3: https://github.com/tidwall/buntdb

    4: https://tile38.com/

    5: https://www.hytradboi.com/

  • postgres

    PostgreSQL with extensibility and performance patches (by orioledb)

  • > .. and the delta code to be committed upstream is less than 2K LOC.

    Then mainline it?

    The last commit that /postgres/postgres and /orioledb/postgres share (1) is 6 months old for 15.2?

    I mean, this is literally what I'm pointing out in my comment; you're chasing a moving target. Every change postgres makes, you have to merge in check it doesn't conflict, roll out a new patch set...

    ...and, you're already falling behind on it.

    So, going forward, how do you plan to keep up to date...? ...because it looks to me, like a < 2K LOC isn't a fix for this problem; it hasn't solved it for you, as time goes forwards, it will continue not to be a solution to the problem.

    The solution is mainlining those changes --> https://github.com/orioledb/postgres/issues/2

    > Currently the changes needed to Postgres core are less than a 1000 lines of code. Due to the separate development schedules for Postgres and OrioleDB, these changes cannot be unstreamed in time for v.15.

    ^ They were < 1000 lines a year ago, apparently.

    I think you can draw a clear pattern in how this going to go, going forward.

    [1] - https://github.com/postgres/postgres/commit/78ec02d612a9b690... vs https://github.com/orioledb/postgres/commit/78ec02d612a9b690...

  • PostgreSQL

    Mirror of the official PostgreSQL GIT repository. Note that this is just a *mirror* - we don't work with pull requests on github. To contribute, please see https://wiki.postgresql.org/wiki/Submitting_a_Patch

  • > .. and the delta code to be committed upstream is less than 2K LOC.

    Then mainline it?

    The last commit that /postgres/postgres and /orioledb/postgres share (1) is 6 months old for 15.2?

    I mean, this is literally what I'm pointing out in my comment; you're chasing a moving target. Every change postgres makes, you have to merge in check it doesn't conflict, roll out a new patch set...

    ...and, you're already falling behind on it.

    So, going forward, how do you plan to keep up to date...? ...because it looks to me, like a < 2K LOC isn't a fix for this problem; it hasn't solved it for you, as time goes forwards, it will continue not to be a solution to the problem.

    The solution is mainlining those changes --> https://github.com/orioledb/postgres/issues/2

    > Currently the changes needed to Postgres core are less than a 1000 lines of code. Due to the separate development schedules for Postgres and OrioleDB, these changes cannot be unstreamed in time for v.15.

    ^ They were < 1000 lines a year ago, apparently.

    I think you can draw a clear pattern in how this going to go, going forward.

    [1] - https://github.com/postgres/postgres/commit/78ec02d612a9b690... vs https://github.com/orioledb/postgres/commit/78ec02d612a9b690...

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts