vector vs tantivy

Project         vector                      tantivy
Mentions        95                          48
Stars           16,366                      9,803
Growth          4.8%                        2.9%
Activity        9.9                         9.1
Latest Commit   5 days ago                  6 days ago
Language        Rust                        Rust
License         Mozilla Public License 2.0  MIT License

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

vector

Posts with mentions or reviews of vector. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-18.

FLaNK AI Weekly 18 March 2024
39 projects | dev.to | 18 Mar 2024
Vector: A high-performance observability data pipeline
5 projects | news.ycombinator.com | 17 Mar 2024

Datadog bought Timber Technologies (creators of Vector) two years ago. https://www.datadoghq.com/blog/datadog-acquires-timber-techn...
Timber definitely intended to just rock out & demolish everything else out there with their agent/forwarder/aggregator tech. But it wasn't a competitive play against OTel, in my humble opinion. Timber's whole shtick is that it integrates with everything, with really flexible/good glue logic in-between. A competent multi-system (logging, metrics, eventually traces) fluentd++. OTel - I want to believe - would have been part of that original vision.
It's just taking a really really long time. One can speculate how direction & velocity might have changed since the Datadog acquisition. The lack of tracing (anywhere except Datadog, so far) materializing has been a hard hard hard & sad thing to see. OG https://github.com/vectordotdev/vector/issues/1444 and newer https://github.com/vectordotdev/vector/issues/17307

5 projects | news.ycombinator.com | 17 Mar 2024

Vector is fantastic software. Currently running a multi-GB/s log pipeline with it. Vector agents as DaemonSets collecting pod and journald logs then forwarding w/ vector's protobuf protocol to a central vector aggregator Deployment with various sinks - s3, gcs/bigquery, loki, prom.
The documentation is great but it can be hard to find examples of common patterns, although it's getting better with time and a growing audience.
My pro-tip has been to prefix your searches with "vector dev".
A recent contribution added an alternative to prometheus pushgateway that handles counters better: https://github.com/vectordotdev/vector/issues/10304#issuecom...

About reading logs
2 projects | /r/sysadmin | 28 Sep 2023

We don't pull logs, we forward logs to a centralized logging service.
Self hosted log parser
4 projects | /r/selfhosted | 20 Jun 2023

opensearch - Amazon fork of Elasticsearch: https://opensearch.org/docs/latest
If you do this and have distributed log sources that you'd use logstash for, bin off logstash and use vector (https://vector.dev/); it's better out of the box for SaaS stuff.
Show HN: Homelab Monitoring Setup with Grafana
6 projects | news.ycombinator.com | 7 Jun 2023

I think there's nothing currently that combines both logging and metrics into one easy package and visualizes it, but it's also something I would love to have.
Vector[1] would work as the agent, being able to collect both logs and metrics. But the issue would then be storing it. I'm assuming the Elastic Stack might now be able to do both, but it's just too heavy to deal with in a small setup.
A couple of months ago I took a brief look at that when setting up logging for my own homelab (https://pv.wtf/posts/logging-and-the-homelab). Mostly looking at the memory usage to fit it on my synology. Quickwit[2] and Log-Store[3] both come with built in web interfaces that reduce the need for grafana, but neither of them do metrics.
- [1] https://vector.dev
Lightweight logging on RPi?
4 projects | /r/selfhosted | 24 May 2023

I would recommend that you run vector as a systemd service so you don't have to worry about managing it. Here is a basic config to do that - https://github.com/vectordotdev/vector/blob/master/distribution/systemd/vector.service .
Monitoring traefik access logs easily
2 projects | /r/selfhosted | 8 May 2023

You could have a look at Grafana Loki, it's easy to run (single binary for a small setup). Shipping your logs can be done by Promtail or something like Vector. They're both lightweight log shippers with support for Loki.
Ask HN: How to build an image search service?
2 projects | news.ycombinator.com | 1 Feb 2023

tantivy

Posts with mentions or reviews of tantivy. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-22.

SeekStorm VS tantivy - a user suggested alternative
2 projects | 22 Mar 2024
What is Hybrid Search?
6 projects | dev.to | 6 Feb 2024

Tantivy - a full-text indexing library written in Rust. It has great performance and a strong feature set.
RAG Using Unstructured Data and Role of Knowledge Graphs
4 projects | news.ycombinator.com | 17 Jan 2024

By this I presume you mean build a search index that can retrieve results based on keywords? I know certain databases use Lucene to build a keyword-based index on top of unstructured blobs of data. Another alternative is to use Tantivy (https://github.com/quickwit-oss/tantivy), a Rust version of Lucene, if building search indices via Java isn't your cup of tea :)
Both libraries offer multilingual support for keywords, I believe, so that's a benefit over vector search, where multilingual embedding models are rather expensive.
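To make the idea of a keyword-based index concrete, here is a minimal sketch of a toy inverted index in plain Rust. It is not Tantivy's or Lucene's API: the `InvertedIndex` type, its methods, and the naive tokenizer are hypothetical names chosen for illustration, and a real library adds scoring, stemming, and on-disk storage on top of this idea.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy inverted index (hypothetical, for illustration only):
/// maps each lowercased term to the sorted set of document ids containing it.
pub struct InvertedIndex {
    postings: HashMap<String, BTreeSet<u32>>,
}

impl InvertedIndex {
    pub fn new() -> Self {
        InvertedIndex {
            postings: HashMap::new(),
        }
    }

    /// Split the text on non-alphanumeric characters and record
    /// `doc_id` in the posting set of every term found.
    pub fn add_document(&mut self, doc_id: u32, text: &str) {
        for term in text
            .split(|c: char| !c.is_alphanumeric())
            .filter(|t| !t.is_empty())
        {
            self.postings
                .entry(term.to_lowercase())
                .or_default()
                .insert(doc_id);
        }
    }

    /// Return the ids of documents that contain every whitespace-separated
    /// query term (AND semantics), in ascending order.
    pub fn search(&self, query: &str) -> Vec<u32> {
        let mut result: Option<BTreeSet<u32>> = None;
        for term in query.split_whitespace() {
            let docs = self
                .postings
                .get(&term.to_lowercase())
                .cloned()
                .unwrap_or_default();
            result = Some(match result {
                None => docs,
                Some(acc) => acc.intersection(&docs).copied().collect(),
            });
        }
        result.unwrap_or_default().into_iter().collect()
    }
}
```

Keyword lookup here is an exact term match after lowercasing, which is why it works across languages without an embedding model, at the cost of missing synonyms and paraphrases.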
Show HN: Quickwit – OSS Alternative to Elasticsearch, Splunk, Datadog
4 projects | news.ycombinator.com | 7 Jan 2024

We also implemented our schemaless columnar storage optimized for object storage.
The inverted index and columnar storage are part of tantivy [0], which is the fastest search library out there. We maintain it and we decided to build the distributed engine on top of it.
[0] tantivy github repo: https://github.com/quickwit-oss/tantivy
Pg_bm25: Elastic-Quality Full Text Search Inside Postgres
6 projects | news.ycombinator.com | 8 Oct 2023

The issue for geo search is here: https://github.com/quickwit-oss/tantivy/issues/44
Grimoire - A recipe management application.
7 projects | /r/rust | 5 Oct 2023

Search index : Custom-built using tantivy.
A Compressed Indexable Bitset
6 projects | news.ycombinator.com | 1 Jul 2023

The roaring bitmap variant is used only for the optional index (1 docid => 0 or 1 value) in the columnar storage (DocValues), not for the inverted index. Since this is used for aggregation, some queries may be a full scan.
The inverted index in tantivy uses bitpacked values of 128 elements with a skip index on top.
> I didn't follow the rest of your comment, select is what EF is good at, every other data structure needs a lot more scanning once you land on the right chunk. With BMI2 you can also use the PDEP instruction to accelerate the final select on a 64-bit block
The select for the sparse codec is a [simple array index access](https://github.com/quickwit-oss/tantivy/blob/main/columnar/s...), that is hard to beat. Compression is not good near the 5k threshold though.
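The "bitpacked values of 128 elements with a skip index on top" layout can be sketched roughly as follows. This is a simplified illustration, not tantivy's actual code: the delta encoding, the `PackedBlock` struct, and all function names are assumptions made for the example. Each block stores up to 128 sorted doc ids as fixed-width deltas, and the per-block `last_doc` value plays the role of the skip entry.

```rust
const BLOCK_LEN: usize = 128;

/// Number of bits needed to represent `max` (0 needs 0 bits).
fn bits_needed(max: u32) -> u8 {
    (32 - max.leading_zeros()) as u8
}

/// One block of sorted doc ids (hypothetical layout, for illustration).
pub struct PackedBlock {
    first_doc: u32, // stored uncompressed
    last_doc: u32,  // skip entry: lets a search jump over whole blocks
    num_bits: u8,   // bit width shared by every delta in the block
    len: usize,     // number of doc ids in the block
    data: Vec<u64>, // deltas packed back to back, `num_bits` each
}

/// Pack up to 128 ascending doc ids as fixed-width deltas.
pub fn pack_block(docs: &[u32]) -> PackedBlock {
    assert!(!docs.is_empty() && docs.len() <= BLOCK_LEN);
    let deltas: Vec<u32> = docs.windows(2).map(|w| w[1] - w[0]).collect();
    let num_bits = bits_needed(deltas.iter().copied().max().unwrap_or(0));
    let width = num_bits as usize;
    let mut data = vec![0u64; (deltas.len() * width + 63) / 64];
    if width > 0 {
        for (i, &d) in deltas.iter().enumerate() {
            let (word, offset) = ((i * width) / 64, (i * width) % 64);
            data[word] |= (d as u64) << offset;
            // A value may straddle two 64-bit words.
            if offset + width > 64 {
                data[word + 1] |= (d as u64) >> (64 - offset);
            }
        }
    }
    PackedBlock {
        first_doc: docs[0],
        last_doc: docs[docs.len() - 1],
        num_bits,
        len: docs.len(),
        data,
    }
}

/// Decode a block back into its doc ids.
pub fn unpack_block(block: &PackedBlock) -> Vec<u32> {
    let width = block.num_bits as usize;
    let mask: u64 = if width == 0 { 0 } else { (1u64 << width) - 1 };
    let mut current = block.first_doc;
    let mut docs = Vec::with_capacity(block.len);
    docs.push(current);
    for i in 0..block.len - 1 {
        let delta = if width == 0 {
            0
        } else {
            let (word, offset) = ((i * width) / 64, (i * width) % 64);
            let mut v = block.data[word] >> offset;
            if offset + width > 64 {
                v |= block.data[word + 1] << (64 - offset);
            }
            (v & mask) as u32
        };
        current += delta;
        docs.push(current);
    }
    docs
}

/// Split a sorted posting list into packed blocks.
pub fn build(docs: &[u32]) -> Vec<PackedBlock> {
    docs.chunks(BLOCK_LEN).map(pack_block).collect()
}

/// Skip step: index of the only block that could contain `target`.
pub fn find_block(blocks: &[PackedBlock], target: u32) -> Option<usize> {
    let idx = blocks.partition_point(|b| b.last_doc < target);
    if idx < blocks.len() { Some(idx) } else { None }
}

/// Membership test that decodes a single block.
pub fn contains(blocks: &[PackedBlock], target: u32) -> bool {
    match find_block(blocks, target) {
        Some(i) => unpack_block(&blocks[i]).binary_search(&target).is_ok(),
        None => false,
    }
}
```

The point of the skip entries is that a lookup binary-searches the block boundaries first and then decodes only one block of 128 values, rather than scanning the whole posting list.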
Job: Rust + Retrieval Systems at Etsy
2 projects | /r/rust | 23 Jun 2023

Hi /r/rust, I’m a SWE on Etsy’s Retrieval Systems team where we’re building a platform based on rust and tantivy (https://github.com/quickwit-oss/tantivy). We’re looking to bring two new engineers onto the team.
Announcing Velo - Your Rust-Powered Brainstorming and Note-Taking Tool
4 projects | /r/rust | 19 Jun 2023

Quick Search: Easily find specific notes with Velo's fuzzy-search feature, powered by tantivy. tantivy might have been a little overkill, but it was really easy to integrate.
Quickwit 0.6.0 - Search and analytics on billions of logs with minimal hardware
4 projects | /r/selfhosted | 9 Jun 2023

Two years after, we are finally reaching a version that can deliver our promise. Two years is both very long for a startup and very short when building a distributed engine. And we decided to do it the hard way: we implemented our own OSS gossip library, our own {S3,JSON}-friendly columnar format for schemaless analytics, and of course, we maintain our own search library, tantivy. This is a lot of engineering investment and obviously, it takes some time to finally reach the end users.

What are some alternatives?

When comparing vector and tantivy you can also consider the following projects:

graylog - Free and open log management

Fluentd - Fluentd: Unified Logging Layer (project under CNCF)

agent - Vendor-neutral programmable observability pipelines.

syslog-ng - syslog-ng is an enhanced log daemon, supporting a wide range of input and output methods: syslog, unstructured text, queueing, SQL & NoSQL.

OpenSearch - 🔎 Open source distributed and RESTful search engine.

sonic - 🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.

tracing - Application level tracing for Rust.

qryn - qryn is a polyglot, high-performance observability framework for ClickHouse. Ingest, store and analyze logs, metrics and telemetry traces from any agent supporting Loki, Prometheus, OTLP, Tempo, Elastic, InfluxDB and many more formats and query transparently using Grafana or any other compatible client.

thanos - Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.

surrealdb - A scalable, distributed, collaborative, document-graph database, for the realtime web

milli - Search engine library for Meilisearch ⚡️

opensearch - OpenSearch is a collection of simple formats for the sharing of search results.
