Debugging random slow writes in PostgreSQL

We have been bitten by the same behavior. I gave a talk with a friend about this exact topic (diagnosing GIN pending list updates) at PGCon 2019 in Ottawa[1][2].
What you need to know is that the pending list is merged into the main b-tree during several operations. Only one of them is critical for insert performance: the merge that happens during the insert itself. Both vacuum and autovacuum (including autovacuum analyze, but not a direct ANALYZE) merge the pending list, so frequent autovacuums are the first thing you should tune. Merging on insert happens when the list exceeds gin_pending_list_limit. In all cases it is also worth knowing which memory parameter bounds the merge, as that impacts how long it takes: work_mem (when triggered on insert), autovacuum_work_mem (when triggered during autovacuum) and maintenance_work_mem (when triggered by a call to gin_clean_pending_list()).
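To see how close an index is to an on-insert merge, the pending list can be inspected and flushed by hand. This is only a sketch: the index name docs_tsv_gin is hypothetical, the memory value is illustrative, and pgstatginindex() comes from the pgstattuple contrib extension, which has to be installable on your instance.

```sql
-- Hypothetical GIN index: docs_tsv_gin.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- How many pages and tuples are currently waiting in the pending list?
SELECT pending_pages, pending_tuples
FROM pgstatginindex('docs_tsv_gin');

-- The threshold (in kB) at which an insert triggers the merge itself.
SHOW gin_pending_list_limit;

-- A manual flush is bounded by maintenance_work_mem; the value is illustrative.
SET maintenance_work_mem = '256MB';

-- Merge the pending list now; returns the number of pages removed from it.
SELECT gin_clean_pending_list('docs_tsv_gin');
```

Running the first query periodically shows whether the list is being drained by autovacuum or only when inserts hit the limit.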
What you can do is:
- tune the size of the pending list (like you did)
- make sure vacuum runs frequently
- if you have a bulk-insert-heavy workload (e.g. nightly imports), drop the index and recreate it after inserting the rows (this does not always make sense business-wise, it depends on your app)
- disable fastupdate: you pay a higher cost per insert but remove the fluctuation when the merge needs to happen
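A rough sketch of the index-level options from the list above. The table docs, its tsvector column tsv and the index docs_tsv_gin are hypothetical names, and the limit value is illustrative rather than a recommendation.

```sql
-- Hypothetical names: table docs, tsvector column tsv, index docs_tsv_gin.

-- Tune the pending list size for this one index (value is in kB).
ALTER INDEX docs_tsv_gin SET (gin_pending_list_limit = 1024);

-- Bulk load: drop the index, load the rows, then build the index once.
DROP INDEX docs_tsv_gin;
-- ... run the bulk INSERT / COPY into docs here ...
CREATE INDEX docs_tsv_gin ON docs USING gin (tsv);

-- Disable fastupdate so every insert goes straight into the main index.
ALTER INDEX docs_tsv_gin SET (fastupdate = off);
-- Turning fastupdate off does not flush entries that are already pending,
-- so clean the list once afterwards.
SELECT gin_clean_pending_list('docs_tsv_gin');
```

The bulk-load variant and the fastupdate variant are alternatives; which one fits depends on whether queries need the index while rows are being loaded.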
The first thing was done in the article. However, I believe the author still relies on the list being merged on insert. If vacuums were tuned aggressively along with the limit (vacuums can be tuned per table), the list would be merged out of band of ongoing inserts.
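Per-table tuning could look roughly like this. The table name docs and all thresholds are assumptions for illustration; the right numbers depend on your insert rate and on autovacuum_naptime.

```sql
-- Hypothetical table: docs. All values are illustrative.

-- Autoanalyze is triggered by inserted rows and also cleans the pending
-- list, so a low fixed threshold makes it run after every few thousand inserts.
ALTER TABLE docs SET (
    autovacuum_analyze_scale_factor = 0,
    autovacuum_analyze_threshold    = 5000
);

-- PostgreSQL 13 and later can also trigger a regular autovacuum on inserts.
ALTER TABLE docs SET (
    autovacuum_vacuum_insert_scale_factor = 0,
    autovacuum_vacuum_insert_threshold    = 5000
);
```

The idea is that the autovacuum worker reaches the pending list before any insert exceeds gin_pending_list_limit, so the merge cost lands on the worker instead of on a client's write.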
I also had the pleasure of speaking with one of the main authors of GIN indexes (Oleg Bartunov) during the mentioned PGCon. He gave probably the best solution and told me to "just use RUM indexes". RUM[3] indexes are like GIN indexes, without the pending list and with faster ranking, faster phrase searches and faster timestamp-based ordering. It is, however, not part of the core PostgreSQL release, so it might be hard to get it running if you don't control the extensions that are loaded into your Postgres instance.
[1] - video https://www.youtube.com/watch?v=Brt41xnMZqo&t=1s
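For reference, a minimal sketch of what switching to RUM might look like, following the pattern in the RUM README. It assumes the extension is already installed on the server; the table docs, its id column, the tsvector column tsv and the index name are hypothetical.

```sql
-- Assumes the rum extension binaries are installed on the server.
CREATE EXTENSION rum;

-- Hypothetical table docs with a tsvector column tsv.
CREATE INDEX docs_tsv_rum ON docs USING rum (tsv rum_tsvector_ops);

-- The <=> operator returns a distance between the tsvector and the query,
-- so results can be ordered by relevance straight from the index.
SELECT id
FROM docs
WHERE tsv @@ to_tsquery('english', 'slow & write')
ORDER BY tsv <=> to_tsquery('english', 'slow & write')
LIMIT 10;
```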
[2] - slides https://www.pgcon.org/2019/schedule/attachments/541_Let's%20...
[3] - https://github.com/postgrespro/rum

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Postgresql Index access-method fulltext-search
Post date: 15 May 2021
