Embeddings are a good starting point for the AI curious app developer

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • sqlite-vss

    A SQLite extension for efficient vector search, based on Faiss!

  • Re storing vectors in BLOB columns: yeah, if it's not a lot of data and it's fast enough for you, then there's no problem doing it like that. I'd even just store them in JSON/npy files first and see how long you can get away with it. Once that gets too slow, try SQLite/Redis/Valkey, and when that gets too slow, look into pgvector or other vector database solutions.

    For SQLite specifically, very large BLOB columns might affect query performance, especially with large embeddings. For example, a 1536-dimension vector from OpenAI takes 1536 * 4 = 6144 bytes when stored in a compact float32 BLOB format. That's larger than SQLite's default page size of 4096, so the extra data spills into overflow pages. That again isn't too big of a deal, but if the original table otherwise had small rows, table scans can get slower.

    One solution is to move it to a separate table, e.g. alongside an original `users` table you can create a new `CREATE TABLE users_embeddings(user_id, embedding)` table and just LEFT JOIN it when you need it. Or you can use newer techniques like Matryoshka embeddings[0] or scalar/binary quantization[1] to reduce the size of individual vectors, at the cost of some accuracy. Or you can bump the page size of your SQLite database with `PRAGMA page_size=8192` (it only takes effect on a new database file, or after a VACUUM rebuilds an existing one). Rough sketches of the side-table layout and binary quantization follow the links below.

    I also have a SQLite extension for vector search[2], but there are a number of usability/ergonomic issues with it. I'm making a new one that I hope to release soon, which will hopefully be a great middle ground between "store vectors in .npy files" and "use pgvector".

    Re "do embeddings ever expire": nope! As long as you have access to the same model, the same text input should give the same embedding output. It's not like LLMs that have temperatures/meta prompts/a million other dials that make outputs non-deterministic, most embedding models should be deterministic and should work forever.

    [0] https://huggingface.co/blog/matryoshka

    [1] https://huggingface.co/blog/embedding-quantization

    [2] https://github.com/asg017/sqlite-vss
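
    Here is a minimal sketch of the BLOB + side-table approach described above, using Python's built-in sqlite3 and struct modules (the `users`/`users_embeddings` schema and the 1536-dimension size come from the comment; the file name and sample row are purely illustrative):

      import sqlite3
      import struct

      DIM = 1536  # e.g. an OpenAI text-embedding vector

      def pack(vec):
          # float32 -> 4 bytes per dimension, so 1536 * 4 = 6144 bytes per row
          return struct.pack(f"{len(vec)}f", *vec)

      def unpack(blob):
          return list(struct.unpack(f"{len(blob) // 4}f", blob))

      db = sqlite3.connect("app.db")
      db.executescript("""
          CREATE TABLE IF NOT EXISTS users(user_id INTEGER PRIMARY KEY, name TEXT);
          -- keep the wide BLOBs out of the hot table so scans on users stay fast
          CREATE TABLE IF NOT EXISTS users_embeddings(
              user_id INTEGER PRIMARY KEY REFERENCES users(user_id),
              embedding BLOB
          );
      """)
      db.execute("INSERT OR IGNORE INTO users(user_id, name) VALUES (1, 'alice')")
      db.execute("INSERT OR REPLACE INTO users_embeddings(user_id, embedding) VALUES (?, ?)",
                 (1, pack([0.0] * DIM)))

      # pull the embedding back only when it's actually needed
      name, blob = db.execute("""
          SELECT u.name, e.embedding
          FROM users u
          LEFT JOIN users_embeddings e ON e.user_id = u.user_id
          WHERE u.user_id = 1
      """).fetchone()
      vec = unpack(blob) if blob is not None else None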
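
    And a rough illustration of the binary quantization idea from [1]: keep only the sign of each dimension and pack 8 dimensions per byte, so a 6144-byte float32 vector shrinks to 192 bytes, with Hamming distance as a cheap similarity proxy. This is a sketch of the general technique, not any particular library's implementation:

      import numpy as np

      def binary_quantize(vec):
          bits = (np.asarray(vec) > 0).astype(np.uint8)  # 1 bit per dimension
          return np.packbits(bits).tobytes()             # 1536 dims -> 192 bytes

      def hamming(a, b):
          x = np.frombuffer(a, dtype=np.uint8) ^ np.frombuffer(b, dtype=np.uint8)
          return int(np.unpackbits(x).sum())

      rng = np.random.default_rng(0)
      q, d = rng.standard_normal(1536), rng.standard_normal(1536)
      print(hamming(binary_quantize(q), binary_quantize(d)))  # lower = more similar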

  • fastembed-rs

    Library to generate vector embeddings in Rust

  • Yes, I use fastembed-rs[1] in a project I'm working on and it runs flawlessly. You can store the embeddings in any boring database, but for fast vector math a vector database is recommended (e.g. the pgvector Postgres extension).

    [1] https://github.com/Anush008/fastembed-rs

  • lantern_extras

    Routines for generating, manipulating, parsing, and importing vector embeddings into Postgres tables

  • We provide this functionality in Lantern cloud via our Lantern Extras extension: <https://github.com/lanterndata/lantern_extras>

    You can generate CLIP embeddings locally on the DB server via:

      SELECT abstract,

  • candle_embed

    A simple CUDA- or CPU-powered library for creating vector embeddings using Candle and models from Hugging Face

  • Fun timing!

    I literally just published my first crate: candle_embed[1]

    It uses Candle under the hood (the crate is more of a user-friendly wrapper) and lets you use any model on HF, like the new SoTA model from Snowflake[2].

    [1] https://github.com/ShelbyJenkins/candle_embed

  • llama.cpp

    LLM inference in C/C++

  • I just did this recently for the local chat-with-PDF feature in https://recurse.chat, a macOS app with a built-in llama.cpp server and a local vector database.

    Running an embedding server locally is pretty straightforward:

    - Get llama.cpp release binary: https://github.com/ggerganov/llama.cpp/releases
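
    The remaining steps are roughly: grab a GGUF embedding model, start the server with embeddings enabled (for example `./llama-server -m <model>.gguf --embedding --port 8080`; the binary name and flag spelling vary between releases), and call its OpenAI-compatible endpoint. A hedged sketch of the client side, assuming that endpoint and the default port:

      import json
      import urllib.request

      def embed(text, base_url="http://localhost:8080"):
          # POST to the server's OpenAI-compatible embeddings endpoint
          req = urllib.request.Request(
              f"{base_url}/v1/embeddings",
              data=json.dumps({"input": text, "model": "local"}).encode(),
              headers={"Content-Type": "application/json"},
          )
          with urllib.request.urlopen(req) as resp:
              body = json.load(resp)
          return body["data"][0]["embedding"]

      print(len(embed("hello embeddings")))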

  • pg_vectorize

    The simplest way to orchestrate vector search on Postgres

  • Check out https://github.com/tembo-io/pg_vectorize - we're taking it a bit beyond just storage and indexing. The project uses pgvector for the indices and distance operators, but also adds a simpler API, hooks into pre-trained embedding models, and helps you keep embeddings updated as your data changes and grows.
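
    Here is a hedged sketch of that flow from Python via psycopg; the `products` table, job name, and parameter names follow the project's README examples at the time of writing, so treat them as illustrative and check the current docs for exact signatures:

      import psycopg

      with psycopg.connect("postgresql://postgres:postgres@localhost:5432/postgres") as conn:
          # one-time setup: embed two text columns and keep them updated as rows change
          conn.execute("""
              SELECT vectorize.table(
                  job_name    => 'product_search',
                  "table"     => 'products',
                  primary_key => 'product_id',
                  columns     => ARRAY['product_name', 'description'],
                  transformer => 'sentence-transformers/all-MiniLM-L6-v2'
              );
          """)

          # query time: the query string is embedded and searched with pgvector operators
          rows = conn.execute("""
              SELECT * FROM vectorize.search(
                  job_name       => 'product_search',
                  query          => 'accessories for mobile devices',
                  return_columns => ARRAY['product_id', 'product_name'],
                  num_results    => 3
              );
          """).fetchall()
          for row in rows:
              print(row)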
