Metric | bleve | litestream
---|---|---
Mentions | 13 | 165
Stars | 9,674 | 10,026
Growth | 0.7% | -
Activity | 8.0 | 7.5
Last commit | about 17 hours ago | 14 days ago
Language | Go | Go
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
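The exact formula behind the activity number is not given here. As a rough illustration only, the Go sketch below assumes an exponential decay with a hypothetical 30-day half-life, so that recent commits weigh more than older ones, and then maps a raw score to a 0-10 figure by percentile rank among tracked projects. Every name and constant in it (`halfLifeDays`, `rawActivity`, `relativeActivity`, the sample data) is an assumption, not the site's actual method.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// halfLifeDays is an assumed constant: a commit this many days old
// counts half as much as a commit made today.
const halfLifeDays = 30.0

// rawActivity sums exponentially decayed weights over commit ages (in days).
func rawActivity(commitAgesDays []float64) float64 {
	total := 0.0
	for _, age := range commitAgesDays {
		total += math.Pow(0.5, age/halfLifeDays)
	}
	return total
}

// relativeActivity maps a raw score to the range 0..10: the fraction of
// tracked projects with a strictly lower raw score, multiplied by 10.
func relativeActivity(score float64, all []float64) float64 {
	if len(all) == 0 {
		return 0
	}
	sorted := append([]float64(nil), all...)
	sort.Float64s(sorted)
	// SearchFloat64s returns the count of elements smaller than score.
	below := sort.SearchFloat64s(sorted, score)
	return 10 * float64(below) / float64(len(sorted))
}

func main() {
	// Hypothetical commit ages in days for one project.
	ages := []float64{0, 1, 3, 30, 90}
	raw := rawActivity(ages)

	// Hypothetical raw scores of all tracked projects.
	tracked := []float64{0.1, 0.4, 0.9, 1.5, 2.2, 2.8, 3.0, 3.3, 4.1, 5.0}
	fmt.Printf("raw=%.2f relative=%.1f\n", raw, relativeActivity(raw, tracked))
}
```

Under these assumptions, a project whose raw score beats 90% of the tracked projects lands at 9.0, which matches the "top 10%" reading described above.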
bleve
-
Hermes v1.7
I don't have the answer to that, but the project has been alive for many years. Seems maybe you should find the answer since you are developing a competing solution? Also it might be a good reference project for solving similar problems to yours. They do have bench tests you could play with https://github.com/blevesearch/bleve/blob/master/query_bench_test.go
-
Seeking a free full text search solution for large data with progress display
I know of https://github.com/blevesearch/bleve and I think there was another project for full text search that I can't find now.
- Any Full Text Search library for json data?
-
An alternative to Elasticsearch that runs on a few MBs of RAM
I would be interested in such a testbed. I would also like to know how Bleve Search (https://github.com/blevesearch/bleve) turns out.
For many years now I have had a small search engine project in my free-time pipeline, but I haven't even reached the crawling stage yet, and I intend to tackle the searching part after some of that.
- What is the coolest Go open source projects you have seen?
-
BetterCache 2.0 (has full text search/remove, etc.)
Haha. Seriously I can’t tell the difference between these libraries https://github.com/blevesearch/bleve
-
I want to dive into how to make search engines
I've never worked on a project that encompasses as many computer science algorithms as a search engine. There are a lot of topics you can look up in "Information Storage and Retrieval":
- Tries (patricia, radix, etc...)
- Trees (b-trees, b+trees, merkle trees, log-structured merge-tree, etc...)
- Consensus (raft, paxos, etc...)
- Block storage (disk block size optimizations, mmap files, delta storage, etc...)
- Probabilistic filters (HyperLogLog, bloom filters, etc...)
- Binary Search (sstables, sorted inverted indexes, roaring bitmaps)
- Ranking (pagerank, tf/idf, bm25, etc...)
- NLP (stemming, POS tagging, subject identification, sentiment analysis etc...)
- HTML (document parsing/lexing)
- Images (exif extraction, removal, resizing / proxying, etc...)
- Queues (SQS, NATS, Apollo, etc...)
- Clustering (k-means, density, hierarchical, gaussian distributions, etc...)
- Rate limiting (leaky bucket, windowed, etc...)
- Compression
- Applied linear algebra
- Text processing (unicode-normalization, slugify, sanitation, lossless and lossy hashing like metaphone and document fingerprinting)
- etc...
I'm sure there is plenty more I've missed. There are lots of generic structures involved like hashes, linked lists, skip lists, heaps and priority queues, and this is just to get to 2000s-level basic tech.
- https://github.com/quickwit-oss/tantivy
- https://github.com/valeriansaliou/sonic
- https://github.com/mosuka/phalanx
- https://github.com/meilisearch/MeiliSearch
- https://github.com/blevesearch/bleve
- https://github.com/thomasjungblut/go-sstables
A lot of people new to this space mistakenly think you can just throw Elasticsearch or Postgres full-text search in front of terabytes of records and have something decent. The problem is that search with good rankings often requires custom storage so calculations can be sharded among multiple nodes and you can do layered ranking without passing huge blobs of results between systems.
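To make two items from the list above concrete (inverted indexes and BM25 ranking), here is a minimal single-node sketch in Go. It is not how bleve or any of the linked projects implement search: the `Index` type, the whitespace tokenizer, and the parameter values `k1 = 1.2` and `b = 0.75` are illustrative assumptions, and nothing here addresses the sharding or layered ranking the comment mentions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// Index is a toy in-memory inverted index: term -> docID -> term frequency.
type Index struct {
	postings map[string]map[int]int
	docLen   map[int]int // number of tokens per document
	totalLen int         // total tokens across all documents
}

func NewIndex() *Index {
	return &Index{postings: map[string]map[int]int{}, docLen: map[int]int{}}
}

// tokenize lowercases the text and splits it on whitespace.
func tokenize(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// Add indexes one document. It assumes each docID is added only once.
func (ix *Index) Add(docID int, text string) {
	terms := tokenize(text)
	ix.docLen[docID] = len(terms)
	ix.totalLen += len(terms)
	for _, t := range terms {
		if ix.postings[t] == nil {
			ix.postings[t] = map[int]int{}
		}
		ix.postings[t][docID]++
	}
}

// Hit is one scored search result.
type Hit struct {
	DocID int
	Score float64
}

// Search scores every document containing a query term with BM25
// and returns hits sorted by descending score (ties broken by DocID).
func (ix *Index) Search(query string) []Hit {
	const k1, b = 1.2, 0.75 // commonly used defaults, assumed here
	n := float64(len(ix.docLen))
	if n == 0 {
		return nil
	}
	avgLen := float64(ix.totalLen) / n
	scores := map[int]float64{}
	for _, t := range tokenize(query) {
		docs := ix.postings[t]
		df := float64(len(docs))
		if df == 0 {
			continue
		}
		idf := math.Log(1 + (n-df+0.5)/(df+0.5))
		for id, tf := range docs {
			f := float64(tf)
			norm := f + k1*(1-b+b*float64(ix.docLen[id])/avgLen)
			scores[id] += idf * f * (k1 + 1) / norm
		}
	}
	hits := make([]Hit, 0, len(scores))
	for id, s := range scores {
		hits = append(hits, Hit{DocID: id, Score: s})
	}
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].Score != hits[j].Score {
			return hits[i].Score > hits[j].Score
		}
		return hits[i].DocID < hits[j].DocID
	})
	return hits
}

func main() {
	ix := NewIndex()
	ix.Add(1, "raft consensus for search")
	ix.Add(2, "bloom filters and roaring bitmaps")
	ix.Add(3, "search ranking with bm25 search")
	for _, h := range ix.Search("search") {
		fmt.Printf("doc %d score %.3f\n", h.DocID, h.Score)
	}
}
```

Even this toy version shows why ranking gets expensive at scale: each score depends on collection-wide statistics (document count, average length, document frequency), which must stay consistent once the postings are spread over several nodes.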
-
Why Writing Your Own Search Engine Is Hard (2004)
For those curious, I'm on my 3rd search engine as I keep discovering new methods of compactly and efficiently processing and querying results.
There isn't a one-size-fits-all approach, but I've never worked on a project that encompasses as many computer science algorithms as a search engine.
- Tries (patricia, radix, etc...)
- Trees (b-trees, b+trees, merkle trees, log-structured merge-tree, etc...)
- Consensus (raft, paxos, etc...)
- Block storage (disk block size optimizations, mmap files, delta storage, etc...)
- Probabilistic filters (HyperLogLog, bloom filters, etc...)
- Binary Search (sstables, sorted inverted indexes)
- Ranking (pagerank, tf/idf, bm25, etc...)
- NLP (stemming, POS tagging, subject identification, etc...)
- HTML (document parsing/lexing)
- Images (exif extraction, removal, resizing / proxying, etc...)
- Queues (SQS, NATS, Apollo, etc...)
- Clustering (k-means, density, hierarchical, gaussian distributions, etc...)
- Rate limiting (leaky bucket, windowed, etc...)
- Text processing (unicode-normalization, slugify, sanitation, lossless and lossy hashing like metaphone and document fingerprinting)
- etc...
I'm sure there is plenty more I've missed. There are lots of generic structures involved like hashes, linked lists, skip lists, heaps and priority queues, and this is just to get to 2000s-level basic tech.
- https://github.com/quickwit-oss/tantivy
- https://github.com/valeriansaliou/sonic
- https://github.com/mosuka/phalanx
- https://github.com/meilisearch/MeiliSearch
- https://github.com/blevesearch/bleve
A lot of people new to this space mistakenly think you can just throw Elasticsearch or Postgres full-text search in front of terabytes of records and have something decent. That might work for something small like a curated collection of a few hundred sites.
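As one small example of the probabilistic filters mentioned in the list above, the following Go sketch implements a basic Bloom filter. It is only an illustration: the `Bloom` type, the sizes passed to `NewBloom`, and the choice of deriving two probe hashes from a single FNV-1a value are assumptions, and a production filter would size the bit array and probe count from the expected number of keys and the target false-positive rate.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Bloom is a fixed-size Bloom filter. It can report false positives
// but never false negatives.
type Bloom struct {
	bits []uint64
	m    uint64 // number of bits, must be greater than zero
	k    uint64 // number of probes per key
}

func NewBloom(m, k uint64) *Bloom {
	return &Bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes splits one 64-bit FNV-1a hash into two 32-bit halves that are
// combined as h1 + i*h2 to simulate k hash functions (double hashing).
func hashes(key string) (uint64, uint64) {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	h1 := sum & 0xffffffff
	h2 := sum>>32 | 1 // forced odd so the step is never zero
	return h1, h2
}

// Add sets the k bits that correspond to key.
func (b *Bloom) Add(key string) {
	h1, h2 := hashes(key)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		b.bits[pos/64] |= 1 << (pos % 64)
	}
}

// MayContain returns false if key was definitely never added,
// and true if it was added or collides with other keys.
func (b *Bloom) MayContain(key string) bool {
	h1, h2 := hashes(key)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		if b.bits[pos/64]&(1<<(pos%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	seen := NewBloom(1024, 3)
	seen.Add("https://github.com/blevesearch/bleve")
	fmt.Println(seen.MayContain("https://github.com/blevesearch/bleve"))
	fmt.Println(seen.MayContain("https://example.com/not-crawled"))
}
```

A crawler could use a filter like this to skip URLs it has probably already fetched, accepting the occasional false positive in exchange for a few bits of memory per URL.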
-
Mattermost – open-source platform for secure collaboration
Search in SQL databases is a tough beast to get right. And given that we support both MySQL and Postgres, it gets even harder to support the quirks of each.
In enterprise editions, the only addition is Elasticsearch. But in our open-source version, we do have support for https://github.com/blevesearch/bleve. Although it's in beta, we have a lot of customers using it.
I am wondering if you have tried using it and didn't like it?
- A Database for 2022
litestream
-
Ask HN: SQLite in Production?
I have not, but I keep meaning to collate everything I've learned into a set of useful defaults just to remind myself what settings I should be enabling and why.
Regarding Litestream, I learned pretty much all I know from their documentation: https://litestream.io/
-
How (and why) to run SQLite in production
This presentation is focused on the use-case of vertically scaling a single server and driving everything through that app server, which is running SQLite embedded within your application process.
This is the sweet-spot for SQLite applications, but there have been explorations and advances to running SQLite across a network of app servers. LiteFS (https://fly.io/docs/litefs/), the sibling to Litestream for backups (https://litestream.io), is aimed at precisely this use-case. Similarly, Turso (https://turso.tech) is a new-ish managed database company for running SQLite in a more traditional client-server distribution.
-
SQLite3 Replication: A Wizard's Guide🧙🏽
This post intends to help you setup replication for SQLite using Litestream.
-
Ask HN: "Time travel" into a SQLite database using the WAL files?
I've been messing around with litestream. It is so cool. And, I either found a bug in the -timestamp switch or don't understand it correctly.
What I want to do is time travel into my sqlite database. I'm trying to do some forensics on why my web service returned the wrong data during a production event. Unfortunately, after the event, someone deleted records from the database and I'm unsure what the data looked like and am having trouble recreating the production issue.
Litestream has this great switch: -timestamp. If you use it (AFAICT) you can time travel into your database and go back to the database state at that moment. However, it does not seem to work as I expect it to:
https://github.com/benbjohnson/litestream/issues/564
I have the entirety of the sqlite database from the production event as well. Is there a way I could cycle through the WAL files and restore the database to the point in time before the records I need were deleted?
Will someone take sqlite and compile it into the browser using WASM so I can drag a sqlite database and WAL files into it and then using a timeline slider see all the states of the database over time? :)
-
Ask HN: Are you using SQLite and Litestream in production?
We're using SQLite in production very heavily with millions of databases and fairly high operations throughput.
But we did run into some scariness around trying to use Litestream that put me off it for the time being. Litestream is really cool but it is also very much a cool hack and the risk of database corruption issues feels very real.
The scariness I ran into was related to this issue https://github.com/benbjohnson/litestream/issues/510
-
Pocketbase: Open-source back end in 1 file
Litestream is a library that allows you to easily create backups. You can probably just do analytic queries on the backup data and reduce load on your server.
https://litestream.io/
- Litestream – Disaster recovery and continuous replication for SQLite
- Litestream: Replicated SQLite with no main and little cost
-
Why you should probably be using SQLite
One possible strategy is to have one directory/file per customer which is one SQLite file. But then as the user logs in, you have to look up first what database they should be connected to.
OR somehow derive it from the user ID/username. Keeping all the customer databases in a single directory/disk and then constantly "lite streaming" to S3.
Because each user is isolated, they'll be writing to their own database. But migrations would be a pain. They will have to be rolled out to each database separately.
One upside is, you can give users the ability to take their data with them, any time. It is just a single file.
[0]. https://litestream.io/
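The "derive it from the user ID" idea above can be sketched in a few lines of Go. This is an assumption-laden illustration, not something the comment specifies: the helper names `dbPathForUser` and `migrateAll`, the SHA-256 naming scheme, and the two-character shard directory are all hypothetical. Hashing the ID means no lookup table is needed and arbitrary usernames cannot escape the base directory.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// dbPathForUser derives a stable SQLite file path from a user ID.
// The ID is hashed so that any string is safe to use as a file name,
// and the first two hex characters act as a shard directory to keep
// individual directories small.
func dbPathForUser(baseDir, userID string) string {
	sum := sha256.Sum256([]byte(userID))
	name := hex.EncodeToString(sum[:])
	return filepath.Join(baseDir, name[:2], name+".db")
}

// migrateAll applies the same migration to every customer database,
// one file at a time, and returns the paths that failed so they can
// be retried. The migrate callback is supplied by the caller.
func migrateAll(paths []string, migrate func(path string) error) []string {
	var failed []string
	for _, p := range paths {
		if err := migrate(p); err != nil {
			failed = append(failed, p)
		}
	}
	return failed
}

func main() {
	users := []string{"alice", "bob"}
	var paths []string
	for _, u := range users {
		p := dbPathForUser("data", u)
		paths = append(paths, p)
		fmt.Println(u, "->", p)
	}

	// A no-op migration stands in for real schema changes here.
	failed := migrateAll(paths, func(path string) error { return nil })
	fmt.Println("failed migrations:", len(failed))
}
```

The `migrateAll` loop reflects the downside noted above: every schema change has to visit each customer file, so tracking which databases failed matters more than it would with a single shared database.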
-
Monitor your Websites and Apps using Uptime Kuma
Uptime Kuma uses a local SQLite database to store account data, configuration for services to monitor, notification settings, and more. To make sure that our data is available across redeploys, we will bundle Uptime Kuma with Litestream, a project that implements streaming replication for SQLite databases to a remote object storage provider. Effectively, this allows us to treat the local SQLite database as if it were securely stored in a remote database.
What are some alternatives?
Elasticsearch - Free and Open, Distributed, RESTful Search Engine
rqlite - The lightweight, distributed relational database built on SQLite.
elastic - Deprecated: Use the official Elasticsearch client for Go at https://github.com/elastic/go-elasticsearch
pocketbase - Open Source realtime backend in 1 file
goriak - Go language driver for Riak KV
realtime - Broadcast, Presence, and Postgres Changes via WebSockets
elasticsql - convert sql to elasticsearch DSL in golang(go)
k8s-mediaserver-operator - Repository for k8s Mediaserver Operator project
goes
sqlcipher - SQLCipher is a standalone fork of SQLite that adds 256 bit AES encryption of database files and other security features.
elastigo - A Go (golang) based Elasticsearch client library.
litefs - FUSE-based file system for replicating SQLite databases across a cluster of machines