- vectordb: A minimal Python package for storing and retrieving text using chunking, embeddings, and vector search. (by kagisearch)
- txtai: 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
- lancedb-study: Benchmark study on LanceDB, an embedded vector DB, for full-text search and vector search
- wallabag: A self-hostable application for saving web pages: save and classify articles, read them later, freely.
https://github.com/kagisearch/vectordb/blob/453bb658bb710838...
Looks like it uses one of these, depending on your settings:
Fast model: google/universal-sentence-encoder/4
Multilingual model: universal-sentence-encoder-multilingual-large/3
Normal model (Alternative): BAAI/bge-small-en-v1.5
Best model: BAAI/bge-base-en-v1.5
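For reference, here is a minimal sketch of loading one of the models listed above and embedding a few chunks. It assumes the sentence-transformers package, which is not necessarily how vectordb loads these models internally:

```python
# Hedged sketch: embed text chunks with one of the models listed above via
# sentence-transformers (may not match vectordb's internal loading code).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")  # the "Normal" model above

chunks = [
    "Vector databases store embeddings for similarity search.",
    "Chunking splits long documents into retrievable passages.",
]

# normalize_embeddings=True returns unit vectors, so dot product equals cosine similarity
embeddings = model.encode(chunks, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384) for bge-small-en-v1.5
```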
I've seen a number of projects like this come along over the last couple of years. I'm the author of txtai (https://github.com/neuml/txtai), which I started in 2020. How you approach performance is the key point.
You can write performant code in any language. For example, for standard keyword search, I wrote a component in Python that makes sparse/keyword search just as efficient as Apache Lucene. https://neuml.hashnode.dev/building-an-efficient-sparse-keyw....
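To illustrate the general idea, here is a simplified BM25-style keyword scorer in plain Python. This is only a sketch of the scoring math, not the actual txtai component described in the linked post:

```python
# Simplified BM25 sketch to illustrate sparse keyword scoring in Python.
# The real txtai component (linked above) is considerably more optimized.
import math
from collections import Counter

def bm25_search(query, documents, k1=1.2, b=0.75):
    # Naive whitespace tokenization; a real implementation would use a proper analyzer
    docs = [doc.lower().split() for doc in documents]
    avgdl = sum(len(d) for d in docs) / len(docs)

    # Document frequency for each term
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (len(docs) - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(score)

    # Return document indexes sorted by descending score
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

print(bm25_search("vector search", [
    "keyword search with an inverted index",
    "vector search over dense embeddings",
]))
```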
I thought the API here was quite neat. It would be fairly simple to implement a LanceDB backend for it in place of sklearn/faiss/mrpt, since the source code is straightforward.
This repo is basically just a nice API plus the needed chunking and batching logic. Using LanceDB directly, you'd still have to write that yourself, as exemplified here: https://github.com/prrao87/lancedb-study/blob/main/lancedb/i...
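Roughly what that glue code looks like, as a sketch assuming the lancedb and sentence-transformers packages; the table name, column names, and chunking parameters are illustrative, not taken from the linked repo:

```python
# Illustrative sketch of the chunking/batching glue you'd write around LanceDB.
# Not taken from the linked repo; names and parameters are made up.
import lancedb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

def chunk(text, size=512, overlap=64):
    # Naive fixed-size character chunking with overlap
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def index(documents, batch_size=32):
    db = lancedb.connect("./lancedb")
    chunks = [c for doc in documents for c in chunk(doc)]
    rows = []
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        vectors = model.encode(batch, normalize_embeddings=True)
        rows.extend({"text": t, "vector": v.tolist()} for t, v in zip(batch, vectors))
    return db.create_table("chunks", data=rows, mode="overwrite")

def search(table, query, k=3):
    vector = model.encode([query], normalize_embeddings=True)[0].tolist()
    return table.search(vector).limit(k).to_list()
```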
What about models besides GPT? Most of the popular vector encoding models aren't using this architecture.
If you really didn't want PyTorch/Transformers, you could consider exporting your models to ONNX (https://github.com/microsoft/onnxruntime).
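A sketch of that path, assuming a Hugging Face encoder exported with the standard torch.onnx exporter; the pooling and export settings here are simplifications, not a recipe from the linked repo:

```python
# Sketch: export a Hugging Face encoder to ONNX once, then run inference with
# onnxruntime so PyTorch is not needed at query time. Pooling and export
# details are simplified.
import numpy as np
import onnxruntime as ort
import torch
from transformers import AutoModel, AutoTokenizer

name = "BAAI/bge-small-en-v1.5"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

# One-time export to ONNX
dummy = tokenizer("example", return_tensors="pt")
torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "encoder.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "attention_mask": {0: "batch", 1: "seq"},
                  "last_hidden_state": {0: "batch", 1: "seq"}},
)

# Inference with onnxruntime only
session = ort.InferenceSession("encoder.onnx")
enc = tokenizer(["some text to embed"], return_tensors="np")
hidden = session.run(None, {"input_ids": enc["input_ids"],
                            "attention_mask": enc["attention_mask"]})[0]
embedding = hidden.mean(axis=1)  # naive mean pooling over tokens
```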
https://github.com/wallabag/wallabag
No one has mentioned wallabag yet, so I wanted to. Been working well for me - has apps and extensions. If you're not excited to self-host - https://www.wallabag.it/en has been flawless at the exorbitant price of… 11 euros a year.