Reality check on good embedding model (and this idea in general)

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA.

  • Delphic

    Discontinued. Starter App to Build Your Own App to Query Doc Collections with Large Language Models (LLMs) using LlamaIndex, Langchain, OpenAI and more (MIT Licensed)

  • Hi - I'm working on getting up to speed to put together a practical implementation. I'm trying to build a locally hosted (no external API calls) document-query proof-of-concept along the lines of Delphic (GitHub - JSv4/Delphic: Starter App to Build Your Own App to Query Doc Collections with Large Language Models (LLMs) using LlamaIndex, Langchain, OpenAI and more (MIT Licensed)). As I type this, I realize it would probably be enough to just demonstrate something working in a Jupyter notebook.

    Probably. But there are a number of free, open-source ones. For example, I've got a document with about 8000 sentences that I'm computing embedding keys for. Here's a list of some: https://github.com/currentslab/awesome-vector-search

  • bert.cpp

    ggml implementation of BERT

  • There is bert.cpp, which aims to implement the sentence-transformers models: https://github.com/skeskinen/bert.cpp
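
The locally hosted document-query idea discussed above (embed each sentence once, then look up the nearest sentences for a query) can be sketched in a few lines of notebook-style Python. This is only a rough illustration under stated assumptions: `toy_embed`, `build_index`, `search`, and `DIM` are hypothetical names, and `toy_embed` is a hashed bag-of-words stand-in, not a real embedding model. In an actual setup it would presumably be swapped for a locally run sentence-transformers-style model (for example via bert.cpp), whose API is not shown here.

```python
import hashlib
import math
import re

DIM = 1024  # size of the toy vectors; an arbitrary choice for this sketch


def toy_embed(text):
    """Hypothetical stand-in for a real embedding model.

    Hashes each lowercase token into one of DIM buckets and
    L2-normalizes the resulting count vector.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_index(sentences, embed=toy_embed):
    """Embed every sentence once and keep (sentence, vector) pairs in memory."""
    return [(s, embed(s)) for s in sentences]


def search(index, query, k=3, embed=toy_embed):
    """Return the k (score, sentence) pairs with the highest cosine similarity.

    Vectors are unit length (or all zeros), so the dot product is the cosine.
    """
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, v)), s) for s, v in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


sentences = [
    "The invoice is due at the end of the month.",
    "Embedding models map sentences to vectors.",
    "The cat sat on the mat.",
]
index = build_index(sentences)
results = search(index, "How do embedding models represent sentences?", k=1)
print(results)
```

For a document of roughly 8000 sentences, a brute-force scan like this is usually fast enough for a proof-of-concept; a dedicated vector search library only becomes necessary at larger scales or when persistence is required.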

