90x Faster Than Pgvector – Lantern's HNSW Index Creation Time

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • lantern

    PostgreSQL vector database extension for building AI applications

  • This extension is licensed under the Business Source License[0], which makes it incompatible with most DBaaS offerings. The BSL is a source-available license, not an open-source one. Good choice for Lantern, but unusable for everyone else.

    Some Postgres offerings allow you to bring your own extensions, for instance Neon[1], where I work. I tried to look at AWS docs for you, but couldn't find anything about that. I did find Trusted Language Extensions[2], but that seems to be more about writing your own extension. Couldn't find a way to upload arbitrary extensions.

    [0]: https://github.com/lanterndata/lantern/commit/dda7f064ca80af...

  • marqo

    Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

  • That sounds much longer than it should be. I am not sure of your exact use case, but I would encourage you to check out Marqo (https://github.com/marqo-ai/marqo; disclaimer: I am a co-founder). All inference and orchestration are included (no API calls), and many open-source or fine-tuned models can be used.

  • jvector

    JVector: the most advanced embedded vector search engine

  • Nice to see people care about index construction time.

    JVector scales linearly to at least 32 cores and may be the only graph-based vector index designed around nonblocking data structures (as opposed to fine-grained locks): https://github.com/jbellis/jvector/

    JVector indexes the Sift1M dataset in under 19s on a 32-core AWS box (m6i.16xl), compared to 50s for Lantern in the article.
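
To put the two build times quoted in the comment above side by side, here is a minimal Python sketch that turns them into a speedup ratio. The helper name `speedup` and the variable names are hypothetical, chosen only for illustration. The inputs are the 19s and 50s figures from the comment; since the JVector figure is stated as "under 19s", the result is best read as a lower bound, and the sketch does not account for any difference in hardware or index parameters between the two measurements.

```python
# Build times quoted in the comment (seconds, Sift1M dataset).
jvector_seconds = 19.0   # stated as "under 19s", so treated as an upper bound
lantern_seconds = 50.0   # figure the commenter attributes to the Lantern article


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


ratio = speedup(lantern_seconds, jvector_seconds)
print(f"On these figures, JVector builds the index at least {ratio:.1f}x faster")
```

The ratio is only as meaningful as the comparison behind it: the two numbers come from different reports, so matching core counts and index settings would be needed before drawing a firm conclusion.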

  • usearch

    Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

  • pgrx

    Build Postgres Extensions with Rust!

  • (Disclosure: I work at Supabase and have been developing TLEs with the RDS team.)

    A Trusted Language Extension is an extension written in any trusted language. In this case that is Rust, but plpgsql, plv8, and others also qualify. See [0].

    > PL/Rust is a more performant and more feature-rich alternative to PL/pgSQL

    This is only partially true. plpgsql has bindings to low-level Postgres APIs, so in some cases it is just as fast as (or faster than) Rust.

    > Building a vector index (or any index for that matter) inside Postgres is a more involved process and can not be done via the UDF interface, be it Rust, C or PL/pgSQL

    Most PG Rust extensions are written with the excellent pgrx framework [1]. While it doesn't have index bindings right now, I can certainly imagine a future where this is possible[2].

    All that said, I think there are a lot of hoops to jump through right now, and I doubt it's worth it for the Lantern team. I think they are right to focus on developing a separate C extension.

    [0] TLE: https://supabase.com/blog/pg-tle

    [1] pgrx: https://github.com/pgcentralfoundation/pgrx

    [2] https://github.com/pgcentralfoundation/pgrx/issues/190#issue...
