rum VS litestream

Compare rum and litestream and see how they differ.

rum

RUM access method - inverted index with additional information in posting lists (by postgrespro)

litestream

Streaming replication for SQLite. (by benbjohnson)
                rum                                       litestream
Mentions        11                                        165
Stars           693                                       9,997
Growth          0.7%                                      -
Activity        4.0                                       7.5
Last commit     4 months ago                              13 days ago
Language        C                                         Go
License         GNU General Public License v3.0 or later  Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

rum

Posts with mentions or reviews of rum. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-10.
  • Code Search Is Hard
    13 projects | news.ycombinator.com | 10 Apr 2024
    the rum index has worked well for us on roughly 1TB of pdfs. written by postgrespro, same folks who wrote core text search and json indexing. not sure why rum not in core. we have no problems.

       https://github.com/postgrespro/rum
  • Is it worth using Postgres' builtin full-text search or should I go straight to Elastic?
    2 projects | /r/PostgreSQL | 25 Apr 2023
    If you need ranking, and you have the possibility to install PostgreSQL extensions, then you can consider an extension providing RUM indexes: https://github.com/postgrespro/rum. Otherwise, you'll have to use an "external" FTS engine like ElasticSearch.
  • Features I'd Like in PostgreSQL
    14 projects | news.ycombinator.com | 28 Jan 2023
    >Reduce the memory usage of prepared queries

    Yes, query plan reuse like every other db. It still blows me away that PG replans every time unless you explicitly prepare, and that's still per connection.

    Better full-text scoring is one for me that's missing in that list: TF/IDF or BM25, please. See: https://github.com/postgrespro/rum

  • Ask HN: Books about full text search
    3 projects | news.ycombinator.com | 24 Nov 2022
    for postgres, i highly recommend the rum index over the core fts. rum is written by postgrespro, who also wrote core fts and json indexing in pg.

        https://github.com/postgrespro/rum
  • Postgres Full Text Search vs. the Rest
    21 projects | news.ycombinator.com | 14 Oct 2022
    My experience with Postgres FTS (did a comparison with Elastic a couple years back), is that filtering works fine and is speedy enough, but ranking crumbles when the resulting set is large.

    If you have a large-ish data set with lots of similar data (4M addresses and location names was the test case), Postgres FTS just doesn't perform.

    There is no index that helps scoring results. You would have to install an extension like RUM index (https://github.com/postgrespro/rum) to improve this, which may or may not be an option (often not if you use managed databases).

    If you want the best of both worlds, you could investigate this extension (again, often not an option for managed databases): https://github.com/matthewfranglen/postgres-elasticsearch-fd...

    Either way, writing something that indexes your postgres database into elastic/opensearch is a one time investment that usually pays off in the long run.

  • Postgres Full-Text Search: A Search Engine in a Database
    3 projects | news.ycombinator.com | 11 Jul 2022
    Mandatory mention of the RUM extension (https://github.com/postgrespro/rum) if this caught your eye. Lots of tutorials and conference presentations out there showcasing the advantages in terms of ranking, timestamps...
    10 projects | news.ycombinator.com | 27 Jul 2021
    You might be just fine adding an unindexed tsvector column, since you've already filtered down the results.

    The GIN indexes for FTS don't really work in conjunction with other indices, which is why https://github.com/postgrespro/rum exists. Luckily, it sounds like you can use your existing indices to filter and let postgres scan for matches on the tsvector.

  • Postgrespro/rum: RUM access method – inverted index with additional information
    1 project | news.ycombinator.com | 17 Dec 2021
  • Debugging random slow writes in PostgreSQL
    1 project | news.ycombinator.com | 15 May 2021
    We have been bitten by the same behavior. I gave a talk with a friend about this exact topic (diagnosing GIN pending list updates) at PGCon 2019 in Ottawa[1][2].

    What you need to know is that the pending list will be merged with the main b-tree during several operations. Only one of them is so extremely critical for your insert performance - that is during actual insert. Both vacuum and autovacuum (including autovacuum analyze but not direct analyze) will merge the pending list. So frequent autovacuums are the first thing you should tune. Merging on insert happens when you exceed the gin_pending_list_limit. In all cases it is also interesting to know which memory parameter is used to rebuild the index, as that impacts how long it will take: work_mem (when triggered on insert), autovacuum_work_mem (when triggered during autovacuum) and maintenance_work_mem (triggered by a call to gin_clean_pending_list()) define how much memory can be used for the rebuild.

    What you can do is:

    - tune the size of the pending list (like you did)

    - make sure vacuum runs frequently

    - if you have a bulk-insert-heavy workload (i.e. nightly imports), drop the index and create it after inserting rows (this does not always make sense business-wise, depends on your app)

    - disable fastupdate, you pay a higher cost per insert but remove the fluctuation when the merge needs to happen

    The first thing was done in the article. However, I believe the author still relies on the list being merged on insert. If vacuums were tuned aggressively along with the limit (vacuums can be tuned per table), then the list would be merged out of band of ongoing inserts.

    I also had the pleasure of speaking with one of the main authors of GIN indexes (Oleg Bartunov) during the mentioned PGCon. He gave probably the best solution and told me to "just use RUM indexes". RUM[3] indexes are like GIN indexes, without the pending list and with faster ranking, faster phrase searches and faster timestamp based ordering. It is however out of the main postgresql release so it might be hard to get it running if you don't control the extensions that are loaded to your Postgres instance.

    [1] - video https://www.youtube.com/watch?v=Brt41xnMZqo&t=1s

    [2] - slides https://www.pgcon.org/2019/schedule/attachments/541_Let's%20...

    [3] - https://github.com/postgrespro/rum

  • Show HN: Full text search Project Gutenberg (60m paragraphs)
    5 projects | news.ycombinator.com | 24 Jan 2021
    I suggest having a look at https://github.com/postgrespro/rum if you haven’t yet. It solves the issue of slow ranking in PostgreSQL FTS.
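
Several of the posts above recommend a RUM index for ranked full-text search. The following is a minimal sketch of what that might look like, written as Python helpers that only build SQL strings. The table name `docs` and the column name `body_tsv` are hypothetical. The `rum_tsvector_ops` operator class and the `<=>` distance operator come from the rum extension, and the `%s` placeholders assume a DB-API driver that uses the "format" parameter style. Nothing here connects to a database.

```python
def rum_setup_sql(table: str, column: str) -> list[str]:
    """Statements that enable the extension and index a tsvector column."""
    index_name = f"{table}_{column}_rum_idx"  # hypothetical naming scheme
    return [
        "CREATE EXTENSION IF NOT EXISTS rum;",
        f"CREATE INDEX {index_name} ON {table} "
        f"USING rum ({column} rum_tsvector_ops);",
    ]


def ranked_search_sql(table: str, column: str, limit: int = 10) -> str:
    """Query that filters with @@ and orders by the <=> distance operator.

    The same tsquery text is expected for all three %s placeholders.
    """
    return (
        f"SELECT *, {column} <=> to_tsquery('english', %s) AS distance "
        f"FROM {table} "
        f"WHERE {column} @@ to_tsquery('english', %s) "
        f"ORDER BY {column} <=> to_tsquery('english', %s) "
        f"LIMIT {int(limit)};"
    )


if __name__ == "__main__":
    for statement in rum_setup_sql("docs", "body_tsv"):
        print(statement)
    print(ranked_search_sql("docs", "body_tsv"))
```

With a real connection, the query string would be passed to a cursor together with the search text repeated three times. As the posts note, this only works where you are allowed to install extensions.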

litestream

Posts with mentions or reviews of litestream. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-07.
  • Ask HN: SQLite in Production?
    3 projects | news.ycombinator.com | 7 Apr 2024
    I have not, but I keep meaning to collate everything I've learned into a set of useful defaults just to remind myself what settings I should be enabling and why.

    Regarding Litestream, I learned pretty much all I know from their documentation: https://litestream.io/

  • How (and why) to run SQLite in production
    2 projects | news.ycombinator.com | 27 Mar 2024
    This presentation is focused on the use-case of vertically scaling a single server and driving everything through that app server, which is running SQLite embedded within your application process.

    This is the sweet-spot for SQLite applications, but there have been explorations and advances to running SQLite across a network of app servers. LiteFS (https://fly.io/docs/litefs/), the sibling to Litestream for backups (https://litestream.io), is aimed at precisely this use-case. Similarly, Turso (https://turso.tech) is a new-ish managed database company for running SQLite in a more traditional client-server distribution.

  • SQLite3 Replication: A Wizard's Guide🧙🏽
    2 projects | dev.to | 27 Feb 2024
    This post intends to help you setup replication for SQLite using Litestream.
  • Ask HN: "Time travel" into a SQLite database using the WAL files?
    1 project | news.ycombinator.com | 2 Feb 2024
    I've been messing around with litestream. It is so cool. And, I either found a bug in the -timestamp switch or don't understand it correctly.

    What I want to do is time travel into my sqlite database. I'm trying to do some forensics on why my web service returned the wrong data during a production event. Unfortunately, after the event, someone deleted records from the database and I'm unsure what the data looked like and am having trouble recreating the production issue.

    Litestream has this great switch: -timestamp. If you use it (AFAICT) you can time travel into your database and go back to the database state at that moment. However, it does not seem to work as I expect it to:

    https://github.com/benbjohnson/litestream/issues/564

    I have the entirety of the sqlite database from the production event as well. Is there a way I could cycle through the WAL files and restore the database to the point in time before the records I need were deleted?

    Will someone take sqlite and compile it into the browser using WASM so I can drag a sqlite database and WAL files into it and then using a timeline slider see all the states of the database over time? :)

  • Ask HN: Are you using SQLite and Litestream in production?
    1 project | news.ycombinator.com | 20 Jan 2024
    We're using SQLite in production very heavily with millions of databases and fairly high operations throughput.

    But we did run into some scariness around trying to use Litestream that put me off it for the time being. Litestream is really cool but it is also very much a cool hack and the risk of database corruption issues feels very real.

    The scariness I ran into was related to this issue https://github.com/benbjohnson/litestream/issues/510

  • Pocketbase: Open-source back end in 1 file
    15 projects | news.ycombinator.com | 6 Jan 2024
    Litestream is a library that allows you to easily create backups. You can probably just do analytic queries on the backup data and reduce load on your server.

    https://litestream.io/

  • Litestream – Disaster recovery and continuous replication for SQLite
    3 projects | news.ycombinator.com | 1 Jan 2024
  • Litestream: Replicated SQLite with no main and little cost
    1 project | news.ycombinator.com | 6 Nov 2023
  • Why you should probably be using SQLite
    8 projects | news.ycombinator.com | 27 Oct 2023
    One possible strategy is to have one directory/file per customer which is one SQLite file. But then as the user logs in, you have to look up first what database they should be connected to.

    OR somehow derive it from the user ID/username. Keeping all the customer databases in a single directory/disk and then constantly "lite streaming" to S3.

    Because each user is isolated, they'll be writing to their own database. But migrations would be a pain. They will have to be rolled out to each database separately.

    One upside is, you can give users the ability to take their data with them, any time. It is just a single file.

    [0]. https://litestream.io/

  • Monitor your Websites and Apps using Uptime Kuma
    6 projects | dev.to | 11 Oct 2023
    Uptime Kuma uses a local SQLite database to store account data, configuration for services to monitor, notification settings, and more. To make sure that our data is available across redeploys, we will bundle Uptime Kuma with Litestream, a project that implements streaming replication for SQLite databases to a remote object storage provider. Effectively, this allows us to treat the local SQLite database as if it were securely stored in a remote database.
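
The one-database-per-customer strategy described in the "Why you should probably be using SQLite" post above can be sketched roughly as follows. This is an illustration under stated assumptions, not anyone's production setup: the `customer_<id>.db` naming, the `notes` table, and the two migrations are all hypothetical, and replication to S3 (which Litestream would handle as a separate process) is not shown. The sketch derives the file path from the user ID and rolls migrations out to each database separately, which is the pain point the post mentions.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema migrations, applied in order.
MIGRATIONS = [
    "CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)",
    "ALTER TABLE notes ADD COLUMN created_at TEXT",
]


def db_path_for(base_dir: Path, user_id: int) -> Path:
    """Derive the database file from the user ID (no lookup table needed)."""
    return base_dir / f"customer_{int(user_id)}.db"


def open_customer_db(base_dir: Path, user_id: int) -> sqlite3.Connection:
    base_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path_for(base_dir, user_id))
    # Litestream replicates databases that use WAL journaling.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def migrate(conn: sqlite3.Connection) -> int:
    """Apply pending migrations, tracking progress in PRAGMA user_version."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    for number, statement in enumerate(MIGRATIONS[version:], start=version + 1):
        conn.execute(statement)
        conn.execute(f"PRAGMA user_version = {number}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate_all(base_dir: Path) -> int:
    """Roll migrations out to every customer database; return how many."""
    count = 0
    for path in sorted(base_dir.glob("customer_*.db")):
        conn = sqlite3.connect(path)
        try:
            migrate(conn)
        finally:
            conn.close()
        count += 1
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        for uid in (1, 2, 3):
            open_customer_db(base, uid).close()
        print("migrated databases:", migrate_all(base))
```

Because each customer is a single file, handing a user their data amounts to copying that file, as the post points out.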

What are some alternatives?

When comparing rum and litestream you can also consider the following projects:

postgres-elasticsearch-fdw - Postgres to Elastic Search Foreign Data Wrapper

rqlite - The lightweight, distributed relational database built on SQLite.

recoll - recoll with webui in a docker container

pocketbase - Open Source realtime backend in 1 file

zombodb - Making Postgres and Elasticsearch work together like it's 2023

realtime - Broadcast, Presence, and Postgres Changes via WebSockets

pgvector - Open-source vector similarity search for Postgres

k8s-mediaserver-operator - Repository for k8s Mediaserver Operator project

pg_search - pg_search builds ActiveRecord named scopes that take advantage of PostgreSQL’s full text search

sqlcipher - SQLCipher is a standalone fork of SQLite that adds 256 bit AES encryption of database files and other security features.

pg_cjk_parser - Postgres CJK Parser pg_cjk_parser is a fts (full text search) parser derived from the default parser in PostgreSQL 11. When a postgres database uses utf-8 encoding, this parser supports all the features of the default parser while splitting CJK (Chinese, Japanese, Korean) characters into 2-gram tokens. If the database's encoding is not utf-8, the parser behaves just like the default parser.

litefs - FUSE-based file system for replicating SQLite databases across a cluster of machines