VectorDB: Vector Database Built by Kagi Search

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • vectordb

    A minimal Python package for storing and retrieving text using chunking, embeddings, and vector search. (by kagisearch)
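
    To illustrate the chunk, embed and search pipeline described above, here is a minimal, dependency-free sketch. It is not the kagisearch/vectordb API. `TinyVectorStore`, `add_text`, `search`, `chunk_words` and the hashed bag-of-words `embed` function are hypothetical. `embed` is a toy stand-in for a neural sentence encoder, and the chunk size, overlap and vector dimension are arbitrary.

```python
import hashlib
import math
import re


def chunk_words(text, size=8, overlap=2):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words would only repeat the previous overlap.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text, dim=1024):
    """Toy embedding: hashed bag of words, L2-normalised (stand-in for a neural encoder)."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        # md5 rather than hash(), because Python salts str hashes per process.
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class TinyVectorStore:
    """Hypothetical in-memory store: chunk, embed, then brute-force cosine search."""

    def __init__(self, dim=1024):
        self.dim = dim
        self.chunks = []
        self.vectors = []

    def add_text(self, text, size=8, overlap=2):
        for chunk in chunk_words(text, size, overlap):
            self.chunks.append(chunk)
            self.vectors.append(embed(chunk, self.dim))

    def search(self, query, k=3):
        q = embed(query, self.dim)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), c) for v, c in zip(self.vectors, self.chunks)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = TinyVectorStore()
store.add_text("Kagi builds a search engine. Wallabag saves web pages to read later.", size=6, overlap=1)
print(store.search("search engine", k=1))
```

    Overlapping chunks keep a sentence that straddles a chunk boundary retrievable from at least one chunk. A real encoder would replace `embed` and produce vectors where semantically similar text scores high even without shared words.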

  • https://github.com/kagisearch/vectordb/blob/453bb658bb710838...

    Looks like it uses one of these, depending on your settings:

    Fast model: google/universal-sentence-encoder/4

    Multilingual model: universal-sentence-encoder-multilingual-large/3

    Normal model (Alternative): BAAI/bge-small-en-v1.5

    Best model: BAAI/bge-base-en-v1.5

  • txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

  • I've seen a number of projects come along over the last couple of years. I'm the author of txtai (https://github.com/neuml/txtai), which I started in 2020. How you approach performance is the key point.

    You can write performant code in any language. For standard keyword search, for example, I wrote a Python component that makes sparse/keyword search just as efficient as Apache Lucene: https://neuml.hashnode.dev/building-an-efficient-sparse-keyw....
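
    Keyword search engines such as Lucene are built around an inverted index: each term maps to the documents that contain it, so a query only touches those documents. As a rough illustration, here is a minimal pure-Python BM25 sketch over such an index. It is not the txtai component from the linked article. The class name `BM25Index`, the Lucene-style idf log(1 + (N - df + 0.5) / (df + 0.5)) and the defaults k1=1.2, b=0.75 are choices made for this example.

```python
import math
import re
from collections import Counter, defaultdict


class BM25Index:
    """Minimal in-memory BM25 keyword index (generic sketch, not txtai's code).

    Assumes each doc_id is added only once.
    """

    def __init__(self, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_len = {}                  # doc_id -> number of tokens

    @staticmethod
    def tokenize(text):
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id, text):
        tokens = self.tokenize(text)
        self.doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def search(self, query, k=3):
        n = len(self.doc_len)
        avgdl = sum(self.doc_len.values()) / n if n else 0.0
        scores = defaultdict(float)
        for term in set(self.tokenize(query)):
            docs = self.postings.get(term)  # .get() avoids creating empty postings
            if not docs:
                continue
            df = len(docs)
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            # Only documents containing the term are visited.
            for doc_id, tf in docs.items():
                norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / avgdl)
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + norm)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


index = BM25Index()
index.add("a", "vector search with embeddings")
index.add("b", "keyword search with an inverted index")
index.add("c", "save web pages to read later")
print(index.search("inverted index"))
```

    With equal term frequency, the length normalisation controlled by `b` ranks the shorter document higher, which is why "a" outranks "b" for the query "search".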

  • lancedb-study

    Benchmark study on LanceDB, an embedded vector DB, for full-text search and vector search

  • I thought the API here was quite neat. It would be fairly easy to implement a LanceDB backend for it in place of sklearn/faiss/mrpt, since the source code is really simple.

    This repo is basically just a nice API plus the necessary chunking and batching logic. With LanceDB you'd still have to write that yourself, as exemplified here: https://github.com/prrao87/lancedb-study/blob/main/lancedb/i...
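
    The division of work described in this comment (a thin API plus chunking and batching, with a swappable nearest-neighbour index) can be sketched as follows. All names here (`SearchBackend`, `BruteForceBackend`, `ChunkStore`, `batched`, `toy_embed`, `embed_calls`) are hypothetical and are not taken from vectordb or LanceDB. The brute-force backend merely stands in for sklearn, faiss, mrpt or lancedb.

```python
from typing import Callable, Iterator, List, Protocol, Sequence, Tuple

Vector = List[float]


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


class SearchBackend(Protocol):
    """Any index that can store vectors and return (position, score) pairs."""

    def add(self, vectors: List[Vector]) -> None: ...

    def query(self, vector: Vector, k: int) -> List[Tuple[int, float]]: ...


class BruteForceBackend:
    """Exact inner-product search over every vector added so far."""

    def __init__(self) -> None:
        self.vectors: List[Vector] = []

    def add(self, vectors: List[Vector]) -> None:
        self.vectors.extend(vectors)

    def query(self, vector: Vector, k: int) -> List[Tuple[int, float]]:
        scores = [(i, sum(a * b for a, b in zip(vector, v))) for i, v in enumerate(self.vectors)]
        return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


class ChunkStore:
    """Hypothetical front end: batches chunks through an embedder into any backend."""

    def __init__(self, embed_batch: Callable[[Sequence[str]], List[Vector]],
                 backend: SearchBackend, batch_size: int = 32) -> None:
        self.embed_batch = embed_batch
        self.backend = backend
        self.batch_size = batch_size
        self.chunks: List[str] = []

    def add(self, chunks: Sequence[str]) -> None:
        for batch in batched(chunks, self.batch_size):
            # One embedder call per batch; backend positions follow insertion order.
            self.backend.add(self.embed_batch(batch))
            self.chunks.extend(batch)

    def search(self, query: str, k: int = 3) -> List[Tuple[str, float]]:
        (query_vector,) = self.embed_batch([query])
        return [(self.chunks[i], score) for i, score in self.backend.query(query_vector, k)]


embed_calls: List[int] = []  # records the batch size of each embedder call


def toy_embed(texts: Sequence[str]) -> List[Vector]:
    # Hypothetical embedder: counts of "a" and "b" form a 2-d vector.
    embed_calls.append(len(texts))
    return [[float(t.count("a")), float(t.count("b"))] for t in texts]


store = ChunkStore(toy_embed, BruteForceBackend(), batch_size=2)
store.add(["aaa", "bbb", "ab"])
print(store.search("a", k=2))
```

    In this sketch a LanceDB, FAISS or scikit-learn backend would be another class exposing the same `add` and `query` methods, while chunking and batching stay in the front end.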

  • onnxruntime

    ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

  • What about models besides GPT? Most of the popular vector encoding models aren't using this architecture.

    If you really didn't want PyTorch/Transformers, you could consider exporting your models to ONNX (https://github.com/microsoft/onnxruntime).

  • Wallabag

    wallabag is a self hostable application for saving web pages: Save and classify articles. Read them later. Freely.

  • https://github.com/wallabag/wallabag

    No one has mentioned wallabag yet, so I wanted to. It has been working well for me and has apps and extensions. If you're not keen on self-hosting, https://www.wallabag.it/en has been flawless at the exorbitant price of… 11 euros a year.
