 | rum | pgx
---|---|---
Mentions | 11 | 19
Stars | 693 | 2,376
Growth | 0.7% | -
Activity | 4.0 | 9.6
Latest commit | 4 months ago | about 1 year ago
Language | C | Rust
License | GNU General Public License v3.0 or later | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
rum
-
Code Search Is Hard
the rum index has worked well for us on roughly 1TB of pdfs. written by postgrespro, same folks who wrote core text search and json indexing. not sure why rum is not in core. we have had no problems.
https://github.com/postgrespro/rum
-
Is it worth using Postgres' builtin full-text search or should I go straight to Elastic?
If you need ranking, and you have the possibility to install PostgreSQL extensions, then you can consider an extension providing RUM indexes: https://github.com/postgrespro/rum. Otherwise, you'll have to use an "external" FTS engine like ElasticSearch.
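As a rough illustration of what using RUM for ranked search can look like, here is a minimal sketch that only builds the SQL text. The `rum_tsvector_ops` operator class and the `<=>` distance operator come from the RUM extension; the helper names, the `docs` table, the `tsv` column, and the psycopg-style `%(q)s` placeholder are assumptions made for this example.

```python
from typing import List


def rum_setup_sql(table: str, column: str) -> List[str]:
    """Build the statements that enable RUM and index a tsvector column.

    The index name is a made-up convention for this sketch.
    """
    index = f"{table}_{column}_rum_idx"
    return [
        "CREATE EXTENSION IF NOT EXISTS rum;",
        f"CREATE INDEX {index} ON {table} USING rum ({column} rum_tsvector_ops);",
    ]


def rum_ranked_query_sql(table: str, column: str, limit: int = 10) -> str:
    """Build a ranked search query.

    `<=>` returns a distance between the tsvector and the tsquery, so
    ordering by it ascending puts the best matches first. The search
    terms are left as a bind parameter named `q`.
    """
    return (
        f"SELECT *, {column} <=> to_tsquery('english', %(q)s) AS distance "
        f"FROM {table} "
        f"WHERE {column} @@ to_tsquery('english', %(q)s) "
        f"ORDER BY {column} <=> to_tsquery('english', %(q)s) "
        f"LIMIT {int(limit)};"
    )


if __name__ == "__main__":
    for statement in rum_setup_sql("docs", "tsv"):
        print(statement)
    print(rum_ranked_query_sql("docs", "tsv", limit=5))
```

The statements would be run with whatever client you already use; identifiers are interpolated directly here, so they should come from trusted code rather than user input.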
-
Features I'd Like in PostgreSQL
>Reduce the memory usage of prepared queries
Yes, query plan reuse like every other db. It still blows me away that PG replans every time unless you explicitly prepare, and that's still per connection.
Better full-text scoring is one for me that's missing in that list: TF/IDF or BM25, please. See: https://github.com/postgrespro/rum
-
Ask HN: Books about full text search
for postgres, i highly recommend the rum index over the core fts. rum is written by postgrespro, who also wrote core fts and json indexing in pg.
https://github.com/postgrespro/rum
-
Postgres Full Text Search vs. the Rest
My experience with Postgres FTS (did a comparison with Elastic a couple years back), is that filtering works fine and is speedy enough, but ranking crumbles when the resulting set is large.
If you have a large-ish data set with lots of similar data (4M addresses and location names was the test case), Postgres FTS just doesn't perform.
There is no index that helps scoring results. You would have to install an extension like RUM index (https://github.com/postgrespro/rum) to improve this, which may or may not be an option (often not if you use managed databases).
If you want the best of both worlds, you could investigate this extension (again, often not an option for managed databases): https://github.com/matthewfranglen/postgres-elasticsearch-fd...
Either way, writing something that indexes your postgres database into elastic/opensearch is a one time investment that usually pays off in the long run.
-
Postgres Full-Text Search: A Search Engine in a Database
Mandatory mention of the RUM extension (https://github.com/postgrespro/rum) if this caught your eye. Lots of tutorials and conference presentations out there showcasing the advantages in terms of ranking, timestamps...
You might be just fine adding an unindexed tsvector column, since you've already filtered down the results.
The GIN indexes for FTS don't really work in conjunction with other indices, which is why https://github.com/postgrespro/rum exists. Luckily, it sounds like you can use your existing indices to filter and let postgres scan for matches on the tsvector.
- Postgrespro/rum: RUM access method – inverted index with additional information
-
Debugging random slow writes in PostgreSQL
We have been bitten by the same behavior. I gave a talk with a friend about this exact topic (diagnosing GIN pending list updates) at PGCon 2019 in Ottawa[1][2].
What you need to know is that the pending list will be merged with the main b-tree during several operations. Only one of them is really critical for your insert performance: the merge that happens during an actual insert. Both vacuum and autovacuum (including autovacuum analyze but not direct analyze) will merge the pending list, so frequent autovacuums are the first thing you should tune. Merging on insert happens when you exceed the gin_pending_list_limit. In all cases it is also useful to know which memory parameter is used to rebuild the index, as that impacts how long it will take: work_mem (when triggered on insert), autovacuum_work_mem (when triggered during autovacuum) and maintenance_work_mem (when triggered by a call to gin_clean_pending_list()) define how much memory can be used for the rebuild.
What you can do is:
- tune the size of the pending list (like you did)
- make sure vacuum runs frequently
- if you have a bulk insert heavy workload (ie. nightly imports), drop the index and create it after inserting rows (not always makes sense business wise, depends on your app)
- disable fastupdate, you pay a higher cost per insert but remove the fluctuation when the merge needs to happen
The first thing was done in the article. However, I believe the author still relies on the list being merged on insert. If vacuums were tuned aggressively along with the limit (vacuums can be tuned per table), the list would be merged out of band of ongoing inserts.
I also had the pleasure of speaking with one of the main authors of GIN indexes (Oleg Bartunov) during the mentioned PGCon. He gave probably the best solution and told me to "just use RUM indexes". RUM[3] indexes are like GIN indexes, without the pending list and with faster ranking, faster phrase searches and faster timestamp based ordering. It is however outside the main postgresql release, so it might be hard to get it running if you don't control the extensions that are loaded into your Postgres instance.
[1] - video https://www.youtube.com/watch?v=Brt41xnMZqo&t=1s
[2] - slides https://www.pgcon.org/2019/schedule/attachments/541_Let's%20...
[3] - https://github.com/postgrespro/rum
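The tuning options listed in this comment map onto a handful of PostgreSQL statements. The sketch below only assembles them as strings: `fastupdate` and `gin_pending_list_limit` are GIN index storage parameters and `gin_clean_pending_list()` is a built-in function, while the helper name, the table and index names, and the 0.01 scale factors are illustrative assumptions, not recommended values.

```python
from typing import List, Optional


def gin_pending_list_tuning_sql(
    table: str,
    index: str,
    pending_list_limit_kb: Optional[int] = None,
    disable_fastupdate: bool = False,
) -> List[str]:
    """Build statements that move pending-list merges off the insert path."""
    statements = []
    if disable_fastupdate:
        # Every insert updates the main index directly: slower on average,
        # but no occasional long merge.
        statements.append(f"ALTER INDEX {index} SET (fastupdate = off);")
    elif pending_list_limit_kb is not None:
        # Per-index override of the pending list size, in kilobytes.
        statements.append(
            f"ALTER INDEX {index} "
            f"SET (gin_pending_list_limit = {int(pending_list_limit_kb)});"
        )
    # Lower the per-table thresholds so autovacuum and auto-analyze visit
    # the table more often (0.01 is a placeholder value for this sketch).
    statements.append(
        f"ALTER TABLE {table} SET ("
        "autovacuum_vacuum_scale_factor = 0.01, "
        "autovacuum_analyze_scale_factor = 0.01);"
    )
    # Flush whatever is already sitting in the pending list.
    statements.append(f"SELECT gin_clean_pending_list('{index}'::regclass);")
    return statements


if __name__ == "__main__":
    for statement in gin_pending_list_tuning_sql(
        "documents", "documents_tsv_gin_idx", pending_list_limit_kb=1024
    ):
        print(statement)
```

Turning `fastupdate` off does not empty an existing pending list, which is why the explicit `gin_clean_pending_list()` call is kept in both branches.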
-
Show HN: Full text search Project Gutenberg (60m paragraphs)
I suggest to have a look at https://github.com/postgrespro/rum if you haven’t yet. It solves the issue of slow ranking in PostgreSQL FTS.
pgx
-
Write Postgres functions in Rust
It uses pgx (https://github.com/tcdi/pgx) which is our more generalized framework for developing Postgres extensions with Rust.
-
Why not Rust for Omnigres?
It's a great question, considering I've been using Rust for a number of years now, and I generally advocate its use for its rich ecosystem, safety and tooling. I actively contribute to pgx, a library for building Postgres extensions in Rust. Yet, Omnigres appears to be all done in C.
-
Supabase Wrappers: A Framework for Building Postgres Foreign Data Wrappers
Our release today is a framework which extends this functionality to other databases/systems. If you’re familiar with Multicorn[1] or Steampipe[2], then it’s very similar. The framework is written in Rust, using the excellent pgx[3].
We have developed FDWs for Stripe, Firebase, BigQuery, Clickhouse, and Airtable (all in various “pre-release” states). The plan is to focus on the tools we’re using internally while we stabilize the framework.
There’s a lot in the blog post about our goals for this release. It’s early, but one of the things I’m most excited about.
[0] Postgres FDW: https://www.postgresql.org/docs/current/sql-createforeigndat...
[1] Multicorn: https://multicorn.org/
[2] Steampipe: https://steampipe.io/
[3] pgx: https://github.com/tcdi/pgx
- Apache Age, a PostgreSQL Extension with Graph Database Functionality
-
Postgres FTS vs the new wave of search engines
BTW one nice easter egg is that with pgx there is actually no reason that we can't build even better search solutions inside the database itself.
-
Postgres Full Text Search vs. the Rest
> That thread led me to a project/product idea where you take an existing Postgres instance used for normal products or whatever, replicate it to various read only clusters with a custom search extension loaded and some orchestrator sitting on top (I’ve written most of one in rust that uses 0mq to communicate with its nodes) and create drop in search from existing databases with a nice guided web gui for automatic tuning suitable for most business use cases.
Very interesting idea -- just want to add one thing, write it in rust (with pgx?[0]) :)
[0]: https://github.com/tcdi/pgx
-
Show HN: pg_idkit, a Postgres extension for generating exotic UUIDs
Hey HN,
It turns out choosing a good database-optimized UUID (and deciding whether to use serial, etc) isn't quite so simple, and I finally got a chance to do some exploration, write about it[0].
One of the reasons Postgres is the best open source database out there is its extensibility -- so I hacked up a small extension for generating some of the more exotic (but crucially, lexicographically sortable) UUID generation mechanisms:
https://github.com/t3hmrman/pg_idkit
This idea has been bumbling around my head for a while, but I finally got a chance to build it while working with Supabase on a post about IDs[0]!
Most of the heavy lifting is done by pgx[1] which is an amazing framework for building Postgres extensions in Rust. I think we are very early to the trend of amazing postgres extensions built in Rust, and it's yet another reason that it's an exciting time to be all-in on Postgres.
[0]: https://supabase.com/blog/choosing-a-postgres-primary-key
[1]: https://github.com/tcdi/pgx
-
Introducing pg_idkit: an extension for generating lexicographically sortable UUIDs (UUIDv6-8, CUID, Timeflake) in Postgres
The extension is still WIP but for those of y'all that like Rust it's built on pgx which has excellent DX. The Rust involved isn't complicated -- I'm basically laundering the functionality from other crates that are listed in the README.md.
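To show why time-prefixed IDs are lexicographically sortable, here is a minimal sketch of a UUIDv7-style layout: a 48-bit millisecond timestamp in the most significant bits, followed by version, variant, and random bits. This is not how pg_idkit generates its IDs; the function name `uuid7_like` and the explicit timestamp argument are assumptions made for the example.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_like(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUID whose leading 48 bits are a millisecond timestamp."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80     # bits 127..80: timestamp
    value |= 0x7 << 76                            # bits 79..76: version 7
    value |= rand_a << 64                         # bits 75..64: random
    value |= 0b10 << 62                           # bits 63..62: RFC variant
    value |= rand_b                               # bits 61..0: random
    return uuid.UUID(int=value)


if __name__ == "__main__":
    # IDs created at increasing timestamps also sort as plain strings,
    # because the timestamp occupies the leading hex digits.
    ids = [str(uuid7_like(ms)) for ms in (1_000, 2_000, 3_000)]
    print(ids == sorted(ids))
```

Because the timestamp leads, new rows land near the end of a b-tree index instead of at random positions, which is the property the comment calls database-friendly. Two IDs generated within the same millisecond are ordered only by their random bits.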
-
GitHub - supabase/pg_jsonschema: PostgreSQL extension providing JSON Schema validation
Seems to be using this: https://github.com/tcdi/pgx
-
Show HN: Pg_jsonschema – A Postgres extension for JSON validation
- https://github.com/furstenheim/is_jsonb_valid
pgx[0] is going to be pretty revolutionary for the postgres ecosystem I think -- there is so much functionality that can be utilized at the database level and I can't think of a language I want to do it with more than Rust.
[0]: https://github.com/tcdi/pgx
What are some alternatives?
postgres-elasticsearch-fdw - Postgres to Elastic Search Foreign Data Wrapper
tauri - Build smaller, faster, and more secure desktop applications with a web frontend.
recoll - recoll with webui in a docker container
code - Source code for the book Rust in Action
zombodb - Making Postgres and Elasticsearch work together like it's 2023
bevy - A refreshingly simple data-driven game engine built in Rust
pgvector - Open-source vector similarity search for Postgres
postgrest - REST API for any Postgres database
pg_search - pg_search builds ActiveRecord named scopes that take advantage of PostgreSQL’s full text search
supabase-graphql-example - A HackerNews-like clone built with Supabase and pg_graphql
pg_cjk_parser - Postgres CJK Parser pg_cjk_parser is a fts (full text search) parser derived from the default parser in PostgreSQL 11. When a postgres database uses utf-8 encoding, this parser supports all the features of the default parser while splitting CJK (Chinese, Japanese, Korean) characters into 2-gram tokens. If the database's encoding is not utf-8, the parser behaves just like the default parser.
feophant - A PostgreSQL inspired SQL database written in Rust.