awesome-vector-search vs db-benchmark

awesome-vector-search

A collection of vector-search-related libraries, services, and research papers (by currentslab)

db-benchmark

reproducible benchmark of database-like ops (by h2oai)

h2oai.github.io

Project         awesome-vector-search    db-benchmark
Mentions        20                       91
Stars           1,275                    320
Growth          2.5%                     0.0%
Activity        6.1                      0.0
Latest Commit   23 days ago              10 months ago
Language        -                        R
License         MIT License              Mozilla Public License 2.0

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub.
Growth - month-over-month growth in stars.
Activity - a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones. For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.
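
The page does not publish the formulas behind these numbers, so the following is only an illustrative sketch of how such metrics could be computed. The function names, the exponential decay, and the 90-day half-life are all assumptions made for this example and not the tracker's actual method.

```python
from datetime import date


def month_over_month_growth(stars_last_month: int, stars_now: int) -> float:
    """Percentage change in stars relative to last month's count."""
    if stars_last_month <= 0:
        raise ValueError("stars_last_month must be positive")
    return (stars_now - stars_last_month) / stars_last_month * 100.0


def recency_weighted_activity(commit_dates, today: date, half_life_days: float = 90.0) -> float:
    """Sum of per-commit weights, where a commit's weight halves every
    `half_life_days` (hypothetical weighting; the real one is not documented here)."""
    total = 0.0
    for d in commit_dates:
        age_days = max(0, (today - d).days)  # clamp future-dated commits to age 0
        total += 0.5 ** (age_days / half_life_days)
    return total


def relative_activity(raw_score: float, all_raw_scores) -> float:
    """Map a raw score onto a 0-10 scale by the share of tracked projects
    scoring strictly lower, so 9.0 means 'ahead of 90% of projects'."""
    below = sum(1 for s in all_raw_scores if s < raw_score)
    return round(10.0 * below / len(all_raw_scores), 1)
```

Under these assumptions, a project that has not received a commit for many months accumulates almost no weight, which is consistent with a stale repository showing an activity close to 0.0.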

awesome-vector-search

Posts with mentions or reviews of awesome-vector-search. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-10-07.

Show HN: SimSIMD vs. SciPy: How AVX-512 and SVE make SIMD cleaner and ML faster
16 projects | news.ycombinator.com | 7 Oct 2023

Reality check on good embedding model (and this idea in general)
3 projects | /r/LocalLLaMA | 7 May 2023

Probably. But there are a number of free open source ones. For example, I've got a document that I'm doing embedding-keys for that has about 8000 sentences. Here's a list of some [ https://github.com/currentslab/awesome-vector-search ]

Rye, meet GPT3 ... and vice versa :)
6 projects | /r/ryelang | 4 Apr 2023

note: search for vector databases not written in Go but with Go clients, in case there is anything more local/lightweight: https://github.com/currentslab/awesome-vector-search

Vector database built for scalable similarity search
19 projects | news.ycombinator.com | 25 Mar 2023

https://github.com/currentslab/awesome-vector-search
I was surprised to see Elastic actually has ok support for some of this stuff, though it appears slower for most of the tasks.

[P] My co-founder and I quit our engineering jobs at AWS to build “Tensor Search”. Here is why.
6 projects | /r/MachineLearning | 21 Sep 2022

Supporting sequences of vectors does seem like a breath of fresh air for vector search services. I have added marqo to the awesome vector search list (disclosure: I am the maintainer of the list) to increase your exposure.

What are vector search engines?
2 projects | dev.to | 1 May 2022

If you want a proper curated list of various libraries and standalone services of vector search engines, refer to this awesome GitHub repository by Currents API.

List of vector search libraries
1 project | /r/CKsTechNews | 3 Apr 2022

List of curated vector search libraries
1 project | news.ycombinator.com | 2 Apr 2022

A GitHub repository that collects awesome vector search framework/engine, library, cloud service, and research papers
1 project | /r/vectordatabase | 27 Dec 2021

Find anything fast with Google's vector search technology
4 projects | /r/hackernews | 14 Dec 2021

db-benchmark

Posts with mentions or reviews of db-benchmark. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-01-08.

Database-Like Ops Benchmark
1 project | news.ycombinator.com | 28 Jan 2024

Polars
11 projects | news.ycombinator.com | 8 Jan 2024

Real-world performance is complicated since data science covers a lot of use cases.
If you're just reading a small CSV to do analysis on it, then there will be no human-perceptible difference between Polars and Pandas. If you're reading a larger CSV with 100k rows, there still won't be much of a perceptible difference.
Per this (old) benchmark, there are differences once you get into 500MB+ territory: https://h2oai.github.io/db-benchmark/

DuckDB performance improvements with the latest release
8 projects | news.ycombinator.com | 6 Nov 2023

I do think it was important for duckdb to put out a new version of the results as the earlier version of that benchmark [1] went dormant with a very old version of duckdb with very bad performance, especially against polars.
[1] https://h2oai.github.io/db-benchmark/

Show HN: SimSIMD vs. SciPy: How AVX-512 and SVE make SIMD cleaner and ML faster
16 projects | news.ycombinator.com | 7 Oct 2023

https://news.ycombinator.com/item?id=33270638 :
> Apache Ballista and Polars do Apache Arrow and SIMD.
> The Polars homepage links to the "Database-like ops benchmark" of {Polars, data.table, DataFrames.jl, ClickHouse, cuDF, spark, (py)datatable, dplyr, pandas, dask, Arrow, DuckDB, Modin,} but not yet PostgresML? https://h2oai.github.io/db-benchmark/ *
LLM -> Vector database: https://en.wikipedia.org/wiki/Vector_database
/? inurl:awesome site:github.com "vector database"

Pandas vs. Julia – cheat sheet and comparison
7 projects | news.ycombinator.com | 17 May 2023

I agree with your conclusion but want to add that switching from Julia may not make sense either.
According to these benchmarks: https://h2oai.github.io/db-benchmark/, DF.jl is the fastest library for some things, data.table for others, polars for others. Which is fastest depends on the query and whether it takes advantage of the features/properties of each.
For what it's worth, data.table is my favourite to use and I believe it has the nicest ergonomics of the three I spoke about.

Any faster Python alternatives?
6 projects | /r/learnprogramming | 12 Apr 2023

Same. Numba does wonders for me in most scenarios. Yesterday I discovered pola-rs, and it looks like I will add it to the stack. Its API is similar to pandas. Have a look at the benchmarks of cuDF, spark, dask, pandas compared to it: Benchmarks

Pandas 2.0 (with pyarrow) vs Pandas 1.3 - Performance comparison
1 project | /r/datascience | 8 Apr 2023

The syntax has similarities with dplyr in terms of the way you chain operations, and it’s around an order of magnitude faster than pandas and dplyr (there’s a nice benchmark here). It’s also more memory-efficient and can handle larger-than-memory datasets via streaming if needed.

Pandas v2.0 Released
5 projects | news.ycombinator.com | 3 Apr 2023

If interested in benchmarks comparing different dataframe implementations, here is one:
https://h2oai.github.io/db-benchmark/

Database-like ops benchmark
1 project | /r/dataengineering | 16 Feb 2023

Python "programmers" when I show them how much faster their naive code runs when translated to C++ (this is a joke, I love python)
2 projects | /r/ProgrammerHumor | 17 Jan 2023

Bad examples. Both numpy and pandas are notoriously un-optimized packages, losing handily to pretty much all their competitors (R, Julia, kdb+, vaex, polars). See https://h2oai.github.io/db-benchmark/ for a partial comparison.

What are some alternatives?

When comparing awesome-vector-search and db-benchmark you can also consider the following projects:

pgvector - Open-source vector similarity search for Postgres

polars - Dataframes powered by a multithreaded, vectorized query engine, written in Rust

annoy - Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

datafusion - Apache DataFusion SQL Query Engine

qdrant - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

Milvus - A cloud-native vector database, storage for next generation AI applications

databend - Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com

hnswlib - Header-only C++/python library for fast approximate nearest neighbors

sktime - A unified framework for machine learning with time series

featureform - The Virtual Feature Store. Turn your existing data infrastructure into a feature store.

DataFramesMeta.jl - Metaprogramming tools for DataFrames
