-
txtai
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
Author of txtai (https://github.com/neuml/txtai) here. I've been in the embeddings space since 2020, before the world of LLMs/GenAI.
In principle, I agree with much of the sentiment here. Embeddings can get you pretty far. If the goal is to find information and citations/links, you can accomplish most of that with a simple embeddings/vector search.
GenAI does have an upside in that it can distill and process those results into something more refined. One of the main production use cases is retrieval augmented generation (RAG). The "R" is usually a vector search but doesn't have to be.
As we see with things like ChatGPT search and Perplexity, there is a push towards using LLMs to summarize the results while also linking to those results to increase user confidence. Even Google Search now has that GenAI section at the top. In general, users just aren't going to accept LLM responses without source citations at this point. The question is whether the summary provides value or whether the citations really provide the most value. If it's the latter, then embeddings will get the job done.
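To make the "search that returns citations" idea above concrete, here is a minimal sketch of a retrieval-only pipeline. It is not txtai's API: the corpus, the `example.com` URLs, and the helper names (`vectorize`, `cosine`, `search`) are all hypothetical, and a plain word-count vector stands in for a learned embedding model so the example runs with only the standard library. In practice the `vectorize` step would be replaced by a real embedding model.

```python
import math
import re
from collections import Counter

# Hypothetical corpus: each document carries a source link to cite.
DOCS = [
    {"url": "https://example.com/embeddings",
     "text": "Embeddings map text to vectors for semantic search"},
    {"url": "https://example.com/rag",
     "text": "Retrieval augmented generation passes search results to an LLM"},
    {"url": "https://example.com/git",
     "text": "Search git commit messages by similarity"},
]


def vectorize(text):
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def search(query, docs, limit=2):
    """Return the top `limit` (score, url) pairs, best match first."""
    qv = vectorize(query)
    scored = [(cosine(qv, vectorize(d["text"])), d["url"]) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# The result is a ranked list of links, with no generated summary involved.
results = search("semantic search with embeddings", DOCS)
for score, url in results:
    print(f"{score:.2f}  {url}")
```

A RAG system would take the same ranked results and pass the matching text to an LLM as context; the retrieval-only version simply stops here and shows the user the links.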
-
git-semantic-similarity
Search git commit messages by semantic similarity with embeddings from sentence-transformers
-
SONAR
SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders. (by facebookresearch)
Reconstruct text from SONAR embeddings: https://github.com/facebookresearch/SONAR?tab=readme-ov-file...