Postgres Full-Text Search: A Search Engine in a Database

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • zombodb

    Making Postgres and Elasticsearch work together like it's 2023

    Zombo does at least promise to handle "complex reindexing processes" for you (which IME can be very painful) but yeah, I assume you'd still have to deal with shard rebalancing, hardware issues, network failures or latency between postgres and elastic, etc etc.

    The performance and cost implications of Zombo are more salient tradeoffs in my mind – if you want to index one of the main tables in your app, you'll have to wait for a network roundtrip and a multi-node write consensus on every update (~150ms or more[0]), you can't `CREATE INDEX CONCURRENTLY`, etc.

    All that said, IMO the fact that Zombo exists makes it easier to pitch "hey lets just build search with postgres for now and if we ever need ES's features, we can easily port it to Zombo without rearchitecting our product".

    [0] https://github.com/zombodb/zombodb/issues/640

  • rum

    RUM access method - inverted index with additional information in posting lists (by postgrespro)

    My experience has been that sorting by relevance ranking is quite expensive. I looked into this a bit and found https://github.com/postgrespro/rum (and some earlier slide decks about it) that explains why the GIN index type can't support searching and ranking itself (meaning you need to do heap scans for ranking). This is especially problematic if your users routinely do searches that match a lot of documents and you only want to show the top X results.

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

  • simonwillisonblog

    The source code behind my blog

    Exact phrase searching works in PostgreSQL full-text search - here's an example: https://simonwillison.net/search/?q=%22nosql+database%22

    I'm using search_type=websearch https://github.com/simonw/simonwillisonblog/blob/a5b53a24b00...

    That's using websearch_to_tsquery() which was added in PostgreSQL 11: https://www.postgresql.org/docs/11/textsearch-controls.html#...

  • pgvector

    Open-source vector similarity search for Postgres

    If you are looking to do semantic search (Cosine similarity) + filtering (SQL) on vectors (audio, text, video, basically anything that can be converted into a vector) I suggest, https://github.com/ankane/pgvector

  • pg_cjk_parser

    Postgres CJK Parser pg_cjk_parser is a fts (full text search) parser derived from the default parser in PostgreSQL 11. When a postgres database uses utf-8 encoding, this parser supports all the features of the default parser while splitting CJK (Chinese, Japanese, Korean) characters into 2-gram tokens. If the database's encoding is not utf-8, the parser behaves just like the default parser.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

  • parsemail

    Hanami fork of https://github.com/DusanKasan/parsemail

    I used Postgres full-text search for mail log feature on my email forward app https://hanami.run

    Essentially allow arbitraty query in from/to/subject/body. One thing that make full-text serch work great for me is that I don't need to sort or rank the relevant of query. I just show a list of email that match the query order by their id.

    I also don't do pagination and counting, instead users has to load more paged and the ID of the email is pass to the query as a point to compare( where id < requests.get.before).

    And with those strategy, full text search works great for us since we don't really want to bring in ElasticSearch because only about 20% of users use this features.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts