Postgres Full Text Search vs. the Rest

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • MeiliSearch

    A lightning-fast search API that fits effortlessly into your apps, websites, and workflow

  • Hi Quentin, thank you for your reply. I was just referring to this comment of yours: https://github.com/meilisearch/MeiliSearch/discussions/870#d.... It is good to hear that Meilisearch cloud is encrypted though. Will definitely keep an eye on Meilisearch for future projects.

  • tantivy

    Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust

  • Yup! I'd heard of Quickwit and sonic, but Quickwit seems to have pivoted to being a log-search focused engine. It's built on Tantivy[0] IIRC so I could have used something like Toshi[1].

    Sonic[2] I know much less about but it also seems good. Honestly anything except ES is what I like to hear about (though OpenSearch is interesting).

    Another thing I think the world really needs is a CLI +/- API tool (ideally rust lib + CLI + API) that unifies interacting with these things. I got REALLY close to writing it while working on this article, but I was already running late and I have a penchant for yak shaving.

    This won't be the last thing I write about search engines -- there's been a LOT of movement in the space that has nothing to do with the elastic/opensearch debacle and I don't see enough tires getting kicked.

    [0]: https://github.com/quickwit-oss/tantivy

    [1]: https://github.com/toshi-search/Toshi

    [2]: https://github.com/valeriansaliou/sonic

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • Toshi

    A full-text search engine in rust

  • Yup! I'd heard of Quickwit and sonic, but Quickwit seems to have pivoted to being a log-search focused engine. It's built on Tantivy[0] IIRC so I could have used something like Toshi[1].

    Sonic[2] I know much less about but it also seems good. Honestly anything except ES is what I like to hear about (though OpenSearch is interesting).

    Another thing I think the world really needs is a CLI +/- API tool (ideally rust lib + CLI + API) that unifies interacting with these things. I got REALLY close to writing it while working on this article, but I was already running late and I have a penchant for yak shaving.

    This won't be the last thing I write about search engines -- there's been a LOT of movement in the space that has nothing to do with the elastic/opensearch debacle and I don't see enough tires getting kicked.

    [0]: https://github.com/quickwit-oss/tantivy

    [1]: https://github.com/toshi-search/Toshi

    [2]: https://github.com/valeriansaliou/sonic

  • quickwit

    Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.

  • I wish we had an extension like ZomboDB but using a lighter search engine like https://github.com/quickwit-oss/quickwit, https://github.com/toshi-search/Toshi and https://github.com/mosuka/bayard

    Here I'm listing engines based on https://github.com/quickwit-oss/tantivy - tantivy is comparable to Lucene in its scope - but I'm sure there are other engines that could tackle ElasticSearch.

    Another thing that could happen is maybe directly embed tantivy in Postgres using an extension, perhaps this could be an option too.

  • Typesense

    Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

  • litestream

    Streaming replication for SQLite.

  • This hasn't made the front page of HN (maybe someone should post it), but Litestream actually doesn't do replication that way any more -- I stumbled upon this commit like 2 days ago:

    https://github.com/benbjohnson/litestream/pull/411

  • sonic

    🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.

  • Yup! I'd heard of Quickwit and sonic, but Quickwit seems to have pivoted to being a log-search focused engine. It's built on Tantivy[0] IIRC so I could have used something like Toshi[1].

    Sonic[2] I know much less about but it also seems good. Honestly anything except ES is what I like to hear about (though OpenSearch is interesting).

    Another thing I think the world really needs is a CLI +/- API tool (ideally rust lib + CLI + API) that unifies interacting with these things. I got REALLY close to writing it while working on this article, but I was already running late and I have a penchant for yak shaving.

    This won't be the last thing I write about search engines -- there's been a LOT of movement in the space that has nothing to do with the elastic/opensearch debacle and I don't see enough tires getting kicked.

    [0]: https://github.com/quickwit-oss/tantivy

    [1]: https://github.com/toshi-search/Toshi

    [2]: https://github.com/valeriansaliou/sonic

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • pgx

    Discontinued Build Postgres Extensions with Rust! [Moved to: https://github.com/tcdi/pgrx] (by tcdi)

  • > That thread led me to a project/product idea where you take an existing Postgres instance used for normal products or whatever, replicate it to various read only clusters with a custom search extension loaded and some orchestrator sitting on top (I’ve written most of one in rust that uses 0mq to communicate with it’s nodes) and create drop in search from existing databases with a nice guided web gui for automatic tuning suitable for most business use cases.

    Very interesting idea -- just want to add one thing, write it in rust (with pgx?[0]) :)

    [0]: https://github.com/tcdi/pgx

  • PostgreSQL

    Mirror of the official PostgreSQL GIT repository. Note that this is just a *mirror* - we don't work with pull requests on github. To contribute, please see https://wiki.postgresql.org/wiki/Submitting_a_Patch

  • There are EXTREME limitations in the source code around it that no one really talks about but have important implications.

    Phrase searching doesn't really work the way our analysts would have liked and needed. There are a bunch of technical limitations in the source about how much data is tracked about the tokens. I can't remember exactly but there was something weird about stemming or lemming in the phrase search too.

    The following variables need to be bumped up to get phrase searching more accurate.

    - MAXSTRLEN (2047) https://github.com/postgres/postgres/blob/master/src/include...

    - MAXSTRPOS (1048575) https://github.com/postgres/postgres/blob/master/src/include...

    - MAXENTRYPOS (16363) https://github.com/postgres/postgres/blob/master/src/include...

    - MAXNUMPOS (256) https://github.com/postgres/postgres/blob/master/src/include...

    TsHeadline for highlighting doesn't consider phrase searching so you can weird results. It probably needs to be rewritten to match websearch_to_tsquery.

    The accuracy issue drained the blood from my BA's faces. I eventually just went with on-premise SOLR because it's easier to add new hardware for it than elasticsearch.

    TLDR: postgres search is probably fine for short-form content, but major gotcha's once you go past those max limits. Also phrase searching will probably not work the way people are used too.

  • zombodb

    Making Postgres and Elasticsearch work together like it's 2023

  • charabia

    Library used by Meilisearch to tokenize queries and documents

  • Good to know. If you find some Russian language support improvement we can do, don't hesitate to create an issue on our tokeniser https://github.com/meilisearch/charabia.

    Moreover, it's Hacktoberfest. If you want to help us improve the language support, it would be awesome!

  • bayard

    A full-text search and indexing server written in Rust.

  • I wish we had an extension like ZomboDB but using a lighter search engine like https://github.com/quickwit-oss/quickwit, https://github.com/toshi-search/Toshi and https://github.com/mosuka/bayard

    Here I'm listing engines based on https://github.com/quickwit-oss/tantivy - tantivy is comparable to Lucene in its scope - but I'm sure there are other engines that could tackle ElasticSearch.

    Another thing that could happen is maybe directly embed tantivy in Postgres using an extension, perhaps this could be an option too.

  • rum

    RUM access method - inverted index with additional information in posting lists (by postgrespro)

  • My experience with Postgres FTS (did a comparison with Elastic a couple years back), is that filtering works fine and is speedy enough, but ranking crumbles when the resulting set is large.

    If you have a large-ish data set with lots of similar data (4M addresses and location names was the test case), Postgres FTS just doesn't perform.

    There is no index that helps scoring results. You would have to install an extension like RUM index (https://github.com/postgrespro/rum) to improve this, which may or may not be an option (often not if you use managed databases).

    If you want a best of both worlds, one could investigate this extensions (again, often not an option for managed databases): https://github.com/matthewfranglen/postgres-elasticsearch-fd...

    Either way, writing something that indexes your postgres database into elastic/opensearch is a one time investment that usually pays off in the long run.

  • postgres-elasticsearch-fdw

    Postgres to Elastic Search Foreign Data Wrapper

  • My experience with Postgres FTS (did a comparison with Elastic a couple years back), is that filtering works fine and is speedy enough, but ranking crumbles when the resulting set is large.

    If you have a large-ish data set with lots of similar data (4M addresses and location names was the test case), Postgres FTS just doesn't perform.

    There is no index that helps scoring results. You would have to install an extension like RUM index (https://github.com/postgrespro/rum) to improve this, which may or may not be an option (often not if you use managed databases).

    If you want a best of both worlds, one could investigate this extensions (again, often not an option for managed databases): https://github.com/matthewfranglen/postgres-elasticsearch-fd...

    Either way, writing something that indexes your postgres database into elastic/opensearch is a one time investment that usually pays off in the long run.

  • My experience with Postgres FTS (did a comparison with Elastic a couple years back), is that filtering works fine and is speedy enough, but ranking crumbles when the resulting set is large.

    If you have a large-ish data set with lots of similar data (4M addresses and location names was the test case), Postgres FTS just doesn't perform.

    There is no index that helps scoring results. You would have to install an extension like RUM index (https://github.com/postgrespro/rum) to improve this, which may or may not be an option (often not if you use managed databases).

    If you want a best of both worlds, one could investigate this extensions (again, often not an option for managed databases): https://github.com/matthewfranglen/postgres-elasticsearch-fd...

    Either way, writing something that indexes your postgres database into elastic/opensearch is a one time investment that usually pays off in the long run.

  • Searchkick

    Intelligent search made easy

  • You're right, that's actually what we implemented, application-level hooks, but they needed development and maintenance effort that come for free with the adapter we're using for OpenSearch integration, which also comes with welcome features: synonyms, partial matches, and many others.

    Spoiler, the adapter is Searchkick: https://github.com/ankane/searchkick

  • pg-tsquery

    :mag: Parse user input into a valid PostgreSQL tsquery

  • You could have a bit of logic on top of PG to better handle the search query, for example using https://github.com/caub/pg-tsquery

  • SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

    SaaSHub logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts