Postgres Full-Text Search: A Search Engine in a Database

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

InfluxDB - Power Real-Time Data Analytics at Scale
Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
www.influxdata.com
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  • zombodb

    Making Postgres and Elasticsearch work together like it's 2023

  • Zombo does at least promise to handle "complex reindexing processes" for you (which IME can be very painful) but yeah, I assume you'd still have to deal with shard rebalancing, hardware issues, network failures or latency between postgres and elastic, etc etc.

    The performance and cost implications of Zombo are more salient tradeoffs in my mind – if you want to index one of the main tables in your app, you'll have to wait for a network roundtrip and a multi-node write consensus on every update (~150ms or more[0]), you can't `CREATE INDEX CONCURRENTLY`, etc.

    All that said, IMO the fact that Zombo exists makes it easier to pitch "hey lets just build search with postgres for now and if we ever need ES's features, we can easily port it to Zombo without rearchitecting our product".

    [0] https://github.com/zombodb/zombodb/issues/640

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • rum

    RUM access method - inverted index with additional information in posting lists (by postgrespro)

  • My experience has been that sorting by relevance ranking is quite expensive. I looked into this a bit and found https://github.com/postgrespro/rum (and some earlier slide decks about it) that explains why the GIN index type can't support searching and ranking itself (meaning you need to do heap scans for ranking). This is especially problematic if your users routinely do searches that match a lot of documents and you only want to show the top X results.

    If you are using Rails with Postgres you can use pg_search gem to build the named scopes to take advantage of full text search.

    https://github.com/Casecommons/pg_search

  • simonwillisonblog

    The source code behind my blog

  • Exact phrase searching works in PostgreSQL full-text search - here's an example: https://simonwillison.net/search/?q=%22nosql+database%22

    I'm using search_type=websearch https://github.com/simonw/simonwillisonblog/blob/a5b53a24b00...

    That's using websearch_to_tsquery() which was added in PostgreSQL 11: https://www.postgresql.org/docs/11/textsearch-controls.html#...

  • pgvector

    Open-source vector similarity search for Postgres

  • If you are looking to do semantic search (Cosine similarity) + filtering (SQL) on vectors (audio, text, video, basically anything that can be converted into a vector) I suggest, https://github.com/ankane/pgvector

  • pg_cjk_parser

    Postgres CJK Parser pg_cjk_parser is a fts (full text search) parser derived from the default parser in PostgreSQL 11. When a postgres database uses utf-8 encoding, this parser supports all the features of the default parser while splitting CJK (Chinese, Japanese, Korean) characters into 2-gram tokens. If the database's encoding is not utf-8, the parser behaves just like the default parser.

  • If you are just interested in searching HN and don't need full data dumps, the Algolia search for HN is quite good and fast:

    https://hn.algolia.com

    They also have a free API.

    If you need data dumps, maybe look into Google's BigQuery.

  • SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

    SaaSHub logo
  • parsemail

    Hanami fork of https://github.com/DusanKasan/parsemail

  • I used Postgres full-text search for mail log feature on my email forward app https://hanami.run

    Essentially allow arbitraty query in from/to/subject/body. One thing that make full-text serch work great for me is that I don't need to sort or rank the relevant of query. I just show a list of email that match the query order by their id.

    I also don't do pagination and counting, instead users has to load more paged and the ID of the email is pass to the query as a point to compare( where id < requests.get.before).

    And with those strategy, full text search works great for us since we don't really want to bring in ElasticSearch because only about 20% of users use this features.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts

  • Show HN: Zero-downtime PostgreSQL migrations for Ruby on Rails

    10 projects | news.ycombinator.com | 20 Feb 2022
  • PostgreSQL 14 Released

    14 projects | news.ycombinator.com | 30 Sep 2021
  • How to Get the Most Out of Postgres Memory Settings

    2 projects | news.ycombinator.com | 12 Jun 2024
  • Build your own image gallery CMS

    3 projects | dev.to | 12 Jun 2024
  • Plv8: V8 Engine JavaScript Procedural Language Add-On for PostgreSQL

    1 project | news.ycombinator.com | 7 Jun 2024