Embeddings Are Underrated

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  1. txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

    Author of txtai (https://github.com/neuml/txtai) here. I've been in the embeddings space since 2020, before the world of LLMs/GenAI.

    In principle, I agree with much of the sentiment here. Embeddings can get you pretty far. If the goal is to find information and citations/links, you can accomplish most of that with a simple embeddings/vector search.

    GenAI does have an upside in that it can distill and process those results into something more refined. One of the main production use cases is retrieval augmented generation (RAG). The "R" is usually a vector search but doesn't have to be.

    As we see with things like ChatGPT search and Perplexity, there is a push towards using LLMs to summarize the results while also linking to those results to increase user confidence. Even Google Search now has a GenAI section at the top. In general, users just aren't going to accept LLM responses without source citations at this point. The question is whether the summary provides the value or whether the citations provide most of it. If it's the latter, then embeddings will get the job done.
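To make the "simple embeddings/vector search" idea in the comment above concrete, here is a minimal sketch in plain Python. It is not txtai's API. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and the documents, URLs, and function names are hypothetical. The point is only the shape of the workflow: embed the query, score every document by cosine similarity, and return the best-matching links as citations.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding (assumption): word counts instead of a learned model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, docs, k=2):
    """Return the top-k (score, url) pairs, highest score first."""
    q = embed(query)
    scored = [(cosine(q, embed(d["text"])), d["url"]) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical corpus; the URLs are placeholders.
DOCS = [
    {"url": "https://example.com/a",
     "text": "vector search finds documents by embedding similarity"},
    {"url": "https://example.com/b",
     "text": "baking sourdough bread at home"},
    {"url": "https://example.com/c",
     "text": "retrieval augmented generation combines search with an llm"},
]

if __name__ == "__main__":
    for score, url in search("embedding similarity search", DOCS):
        print(f"{score:.2f}  {url}")
```

In a RAG setup, the returned links and their text would be passed to an LLM as context. If the citations alone answer the question, the search step above is the whole pipeline.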

  2. git-semantic-similarity

    Search git commit messages by semantic similarity with embeddings from sentence-transformers

  3. SONAR

    SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders. (by facebookresearch)

    Reconstruct text from SONAR embeddings: https://github.com/facebookresearch/SONAR?tab=readme-ov-file...

