-
MeiliSearch
A lightning-fast search API that fits effortlessly into your apps, websites, and workflow
-
quickwit
Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
-
Typesense
Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences
-
sonic
🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.
-
pgx
Build Postgres Extensions with Rust! Discontinued; moved to https://github.com/tcdi/pgrx (by tcdi)
-
PostgreSQL
Mirror of the official PostgreSQL GIT repository. Note that this is just a *mirror* - we don't work with pull requests on github. To contribute, please see https://wiki.postgresql.org/wiki/Submitting_a_Patch
-
rum
RUM access method - inverted index with additional information in posting lists (by postgrespro)
Hi Quentin, thank you for your reply. I was just referring to this comment of yours: https://github.com/meilisearch/MeiliSearch/discussions/870#d.... It is good to hear that Meilisearch cloud is encrypted though. Will definitely keep an eye on Meilisearch for future projects.
Yup! I'd heard of Quickwit and sonic, but Quickwit seems to have pivoted to being a log-search focused engine. It's built on Tantivy[0] IIRC so I could have used something like Toshi[1].
Sonic[2] I know much less about but it also seems good. Honestly anything except ES is what I like to hear about (though OpenSearch is interesting).
Another thing I think the world really needs is a CLI +/- API tool (ideally Rust lib + CLI + API) that unifies interacting with these things. I got REALLY close to writing it while working on this article, but I was already running late and I have a penchant for yak shaving.
This won't be the last thing I write about search engines -- there's been a LOT of movement in the space that has nothing to do with the elastic/opensearch debacle and I don't see enough tires getting kicked.
[0]: https://github.com/quickwit-oss/tantivy
[1]: https://github.com/toshi-search/Toshi
[2]: https://github.com/valeriansaliou/sonic
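To make the "one tool for all of these engines" idea a bit more concrete, here is a minimal sketch of what the shared interface could look like. Everything in it is hypothetical: `SearchBackend`, `InMemoryBackend` and `run_everywhere` are illustration-only names, not part of any existing library, and the in-memory backend is just a stand-in for a real adapter to Meilisearch, Typesense, Sonic and so on.

```python
from typing import Protocol


class SearchBackend(Protocol):
    """Hypothetical minimal contract that each engine adapter would implement."""

    def index(self, doc_id: str, text: str) -> None: ...

    def search(self, query: str, limit: int = 10) -> list[str]: ...


class InMemoryBackend:
    """Toy stand-in for a real engine: naive substring matching, no ranking."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def index(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text.lower()

    def search(self, query: str, limit: int = 10) -> list[str]:
        terms = query.lower().split()
        hits = [
            doc_id
            for doc_id, text in self._docs.items()
            if all(term in text for term in terms)
        ]
        return hits[:limit]


def run_everywhere(backends: dict[str, SearchBackend], query: str) -> dict[str, list[str]]:
    """Send the same query to every configured backend and collect the hits."""
    return {name: backend.search(query) for name, backend in backends.items()}


# Two backends behind the same interface, fed the same documents.
backends = {"engine_a": InMemoryBackend(), "engine_b": InMemoryBackend()}
for backend in backends.values():
    backend.index("1", "Postgres full text search")
    backend.index("2", "Lightweight search backend")

results = run_everywhere(backends, "postgres search")
```

A CLI or HTTP API would then be a thin layer over `run_everywhere`, which is also what makes side-by-side comparisons of engines cheap.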
I wish we had an extension like ZomboDB but using a lighter search engine like https://github.com/quickwit-oss/quickwit, https://github.com/toshi-search/Toshi and https://github.com/mosuka/bayard
Here I'm listing engines based on https://github.com/quickwit-oss/tantivy - tantivy is comparable to Lucene in its scope - but I'm sure there are other engines that could tackle ElasticSearch.
Another option might be to embed tantivy directly in Postgres using an extension.
This hasn't made the front page of HN (maybe someone should post it), but Litestream actually doesn't do replication that way any more -- I stumbled upon this commit like 2 days ago:
https://github.com/benbjohnson/litestream/pull/411
> That thread led me to a project/product idea where you take an existing Postgres instance used for normal products or whatever, replicate it to various read-only clusters with a custom search extension loaded and some orchestrator sitting on top (I’ve written most of one in Rust that uses 0mq to communicate with its nodes) and create drop-in search from existing databases with a nice guided web GUI for automatic tuning suitable for most business use cases.
Very interesting idea -- just want to add one thing, write it in Rust (with pgx?[0]) :)
[0]: https://github.com/tcdi/pgx
There are EXTREME limitations in the Postgres full-text search source code that no one really talks about but that have important implications.
Phrase searching doesn't really work the way our analysts would have liked and needed. There are a bunch of technical limitations in the source about how much data is tracked about the tokens. I can't remember exactly, but there was something weird about stemming or lemmatization in the phrase search too.
The following compile-time constants need to be bumped up (which means rebuilding Postgres) to make phrase searching more accurate.
- MAXSTRLEN (2047) https://github.com/postgres/postgres/blob/master/src/include...
- MAXSTRPOS (1048575) https://github.com/postgres/postgres/blob/master/src/include...
- MAXENTRYPOS (16384) https://github.com/postgres/postgres/blob/master/src/include...
- MAXNUMPOS (256) https://github.com/postgres/postgres/blob/master/src/include...
ts_headline (used for highlighting) doesn't consider phrase searching, so you can get weird results. It probably needs to be rewritten to match websearch_to_tsquery.
The accuracy issue drained the blood from my BAs' faces. I eventually just went with on-premise Solr because it's easier to add new hardware for it than for Elasticsearch.
TLDR: Postgres search is probably fine for short-form content, but there are major gotchas once you go past those max limits. Also, phrase searching will probably not work the way people are used to.
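To illustrate why capped positions hurt phrase search, here is a simplified model in Python. It is not Postgres's actual code: `build_positions` and `phrase_match` are made-up helpers, and the tiny limits passed in below are stand-ins for the real constants listed above so the effect shows up on a 15-word document. The assumption being modelled is that positions past the cap are clamped to the cap and that only a bounded number of positions is kept per lexeme.

```python
from collections import defaultdict


def build_positions(tokens, max_entry_pos, max_num_pos):
    """Map each token to its recorded 1-based positions.

    Positions are clamped to max_entry_pos - 1, and at most max_num_pos
    positions are kept per token (a simplified model of the limits).
    """
    index = defaultdict(list)
    for pos, tok in enumerate(tokens, start=1):
        clamped = min(pos, max_entry_pos - 1)
        slots = index[tok]
        # Skip repeats of the clamped position and respect the per-token cap.
        if len(slots) < max_num_pos and (not slots or slots[-1] != clamped):
            slots.append(clamped)
    return dict(index)


def phrase_match(index, phrase):
    """True if the phrase words occur at consecutive recorded positions."""
    first, *rest = phrase
    for start in index.get(first, []):
        if all(start + i in index.get(word, []) for i, word in enumerate(rest, start=1)):
            return True
    return False


doc = ["quick", "brown", "fox"] + ["filler"] * 10 + ["lazy", "dog"]
idx = build_positions(doc, max_entry_pos=8, max_num_pos=4)

early = phrase_match(idx, ["quick", "brown"])  # phrase sits before the cap
late = phrase_match(idx, ["lazy", "dog"])      # phrase sits past the cap
```

In this model "lazy" and "dog" both end up recorded at the same clamped position, so the "next word" check can never succeed for them, even though the phrase is in the document. That matches the short-form versus long-form split in the TLDR.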
Good to know. If you find any Russian language support improvements we could make, don't hesitate to create an issue on our tokeniser: https://github.com/meilisearch/charabia.
Moreover, it's Hacktoberfest. If you want to help us improve the language support, it would be awesome!
My experience with Postgres FTS (did a comparison with Elastic a couple of years back) is that filtering works fine and is speedy enough, but ranking crumbles when the result set is large.
If you have a large-ish data set with lots of similar data (4M addresses and location names was the test case), Postgres FTS just doesn't perform.
There is no built-in index that helps with scoring results. You would have to install an extension like the RUM index (https://github.com/postgrespro/rum) to improve this, which may or may not be an option (often not if you use managed databases).
If you want the best of both worlds, you could investigate this extension (again, often not an option for managed databases): https://github.com/matthewfranglen/postgres-elasticsearch-fd...
Either way, writing something that indexes your Postgres database into Elastic/OpenSearch is a one-time investment that usually pays off in the long run.
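As a rough idea of what the "index your Postgres rows into Elastic/OpenSearch" step involves, here is a minimal sketch that turns rows into a newline-delimited JSON body for the `_bulk` endpoint. The row shape, the `places` index name and the `to_bulk_ndjson` helper are assumptions made up for the example; fetching the rows from Postgres and POSTing the payload are left out.

```python
import json


def to_bulk_ndjson(rows, index_name, id_field="id"):
    """Build a bulk-API body: one action line plus one document line per row."""
    lines = []
    for row in rows:
        # The primary key becomes the document _id, so re-indexing a row
        # overwrites the previous version instead of duplicating it.
        action = {"index": {"_index": index_name, "_id": str(row[id_field])}}
        doc = {key: value for key, value in row.items() if key != id_field}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical rows, shaped like the result of a SELECT on an addresses table.
rows = [
    {"id": 1, "name": "Main Street 1", "city": "Springfield"},
    {"id": 2, "name": "Main Street 2", "city": "Springfield"},
]
payload = to_bulk_ndjson(rows, index_name="places")
```

The payload would then be sent with `Content-Type: application/x-ndjson`. Keeping the search index in step with later updates and deletes is the part that needs ongoing care, whether through application hooks or a periodic re-sync.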
You're right, that's actually what we implemented, application-level hooks, but they needed development and maintenance effort that comes for free with the adapter we're using for OpenSearch integration, which also comes with welcome features: synonyms, partial matches, and many others.
Spoiler, the adapter is Searchkick: https://github.com/ankane/searchkick
You could have a bit of logic on top of PG to better handle the search query, for example using https://github.com/caub/pg-tsquery
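As a sketch of that kind of "logic on top", here is a small Python function that turns free-form user input into tsquery text. This is not how pg-tsquery works internally; `to_tsquery_text` and its rules (plain words are ANDed, `"quoted phrases"` become `<->` chains, a leading `-` negates) are assumptions chosen for the example.

```python
import re

# Either an optionally negated "quoted phrase" or an optionally negated bare word.
_TOKEN = re.compile(r'(-?)"([^"]*)"|(-?)(\S+)')


def _words(text):
    """Keep only word characters so operators cannot be injected by the user."""
    return re.findall(r"\w+", text.lower())


def to_tsquery_text(user_input):
    """Translate user input into tsquery syntax using &, ! and <->."""
    clauses = []
    for neg_phrase, phrase, neg_word, word in _TOKEN.findall(user_input):
        negated = bool(neg_phrase or neg_word)
        words = _words(phrase if phrase else word)
        if not words:
            continue
        clause = " <-> ".join(words)
        if len(words) > 1:
            clause = f"({clause})"
        clauses.append(f"!{clause}" if negated else clause)
    return " & ".join(clauses)


query = to_tsquery_text('postgres "full text" -elastic')
```

The resulting string should still be passed to `to_tsquery` as a bound parameter rather than interpolated into SQL, and an empty result (input with no usable words) needs to be handled by the caller.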