semantic-search

Open-source projects categorized as semantic-search

Top 23 semantic-search Open-Source Projects

  • MindsDB

    The platform for customizing AI from enterprise data

  • Project mention: What’s the Difference Between Fine-tuning, Retraining, and RAG? | dev.to | 2024-04-08

    Check us out on GitHub.

  • Typesense

    Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

  • Project mention: Website Search Hurts My Feelings | news.ycombinator.com | 2023-12-26

    There are actually plenty of non-ES products that are way easier to integrate and tune (and get better results with less effort).

    - Typesense (https://github.com/typesense/typesense)

    - Algolia

    - Google Programmable Search Engine (https://programmablesearchengine.google.com/about/)

  • haystack

    LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

  • Project mention: Release Radar • March 2024 Edition | dev.to | 2024-04-07

    View on GitHub

  • Weaviate

    Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.

  • Project mention: pgvecto.rs alternatives - qdrant and Weaviate | libhunt.com/r/pgvecto.rs | 2024-03-13
  • txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

  • Project mention: Build knowledge graphs with LLM-driven entity extraction | dev.to | 2024-02-21

    txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

  • GPTCache

    Semantic cache for LLMs. Fully integrated with LangChain and llama_index.

  • Project mention: Ask HN: What are the drawbacks of caching LLM responses? | news.ycombinator.com | 2024-03-15

    Just found this: https://github.com/zilliztech/GPTCache which seems to address this idea/issue.

  • khoj

    Your AI second brain. A copilot to get answers to your questions, whether they be from your own notes or from the internet. Use powerful, online (e.g gpt4) or private, local (e.g mistral) LLMs. Self-host locally or use our web app. Access from Obsidian, Emacs, Desktop app, Web or Whatsapp.

  • Project mention: Show HN: I made an app to use local AI as daily driver | news.ycombinator.com | 2024-02-27

    There are already several RAG chat open source solutions available. Two that immediately come to mind are:

    Danswer

    https://github.com/danswer-ai/danswer

    Khoj

    https://github.com/khoj-ai/khoj

  • marqo

    Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

  • Project mention: Are we at peak vector database? | news.ycombinator.com | 2024-01-25

    We (Marqo) are doing a lot on 1 and 2. There is a huge amount to be done on the ML side of vector search and we are investing heavily in it. I think it has not quite sunk in that vector search systems are ML systems and everything that comes with that. I would love to chat about 1 and 2 so feel free to email me (email is in my profile). What we have done so far is here -> https://github.com/marqo-ai/marqo

  • llmware

    Providing enterprise-grade LLM-based development framework, tools, and fine-tuned models.

  • Project mention: More Agents Is All You Need: LLMs performance scales with the number of agents | news.ycombinator.com | 2024-04-06

    I couldn't agree more. You should check out LLMWare's SLIM agents (https://github.com/llmware-ai/llmware/tree/main/examples/SLI...). It's focusing on pretty much exactly this and chaining multiple local LLMs together.

    A really good topic that ties in with this is the need for deterministic sampling (I may have the terminology a bit incorrect) depending on what the model is intended for. The LLMWare team did a good 2 part video on this here as well (https://www.youtube.com/watch?v=7oMTGhSKuNY)

    I think dedicated miniature LLMs are the way forward.

    Disclaimer - Not affiliated with them in any way, just think it's a really cool project.

  • databerry

    The no-code platform for building custom LLM Agents

  • Project mention: Open-source platform to build custom ChatGPT Agents | /r/reactjs | 2023-06-17
  • Top2Vec

    Top2Vec learns jointly embedded topic, document and word vectors.

  • Project mention: [D] Is it better to create a different set of Doc2Vec embeddings for each group in my dataset, rather than generating embeddings for the entire dataset? | /r/MachineLearning | 2023-10-28

    I'm using Top2Vec with Doc2Vec embeddings to find topics in a dataset of ~4000 social media posts. This dataset has three groups:

  • docarray

    Represent, send, store and search multimodal data

  • Project mention: DocArray – Represent, send, and store multimodal data for ML | news.ycombinator.com | 2023-04-27
  • examples

    Jupyter Notebooks to help you get hands-on with Pinecone vector databases (by pinecone-io)

  • Project mention: I’m working on making a ChatGPT app with long term memory | /r/ChatGPTCoding | 2023-04-24
  • clip-retrieval

    Easily compute clip embeddings and build a clip retrieval system with them

  • Project mention: FLaNK AI for 11 March 2024 | dev.to | 2024-03-11
  • awesome-generative-ai

    A curated list of Generative AI tools, works, models, and references (by filipecalegario)

  • Project mention: Generative AI – A curated list of Generative AI tools, works, models | news.ycombinator.com | 2023-07-14
  • usearch

    Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

  • Project mention: USearch SQLite Extensions for Vector and Text Search | news.ycombinator.com | 2024-02-22
  • mteb

    MTEB: Massive Text Embedding Benchmark

  • Project mention: AI for AWS Documentation | news.ycombinator.com | 2023-07-06

    RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:

    - Chunking can interfere with context boundaries

    - Content vectors can differ vastly from question vectors; for this you have to use hypothetical embeddings (they generate artificial questions and store them)

    - Instead of saving just one embedding per text chunk, you should store several (text chunk, hypothetical question embeddings, metadata)

    - RAG will fail miserably with requests like "summarize the whole document"

    - To my knowledge, OpenAI embeddings don't perform well; use an embedding model that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb

    [1] https://github.com/underlines/awesome-marketing-datascience/...
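
The suggestion above to store several embeddings per chunk (the chunk text plus artificial questions, with metadata) can be sketched as follows. This is a minimal illustration, not the commenter's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `ChunkRecord` and `search` are names invented for this example.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: word-count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ChunkRecord:
    """One text chunk plus the artificial questions it should answer."""
    text: str
    questions: List[str]
    metadata: Dict[str, str] = field(default_factory=dict)

    def vectors(self) -> List[Counter]:
        # One vector for the chunk itself, one per hypothetical question.
        return [toy_embed(self.text)] + [toy_embed(q) for q in self.questions]


def search(query: str, records: List[ChunkRecord]) -> ChunkRecord:
    """Return the record whose best-matching vector is closest to the query."""
    qv = toy_embed(query)
    return max(records, key=lambda r: max(cosine(qv, v) for v in r.vectors()))


records = [
    ChunkRecord(
        text="Credentials can be rotated from the account settings page.",
        questions=["How do I reset my password?"],
        metadata={"source": "faq.md"},
    ),
    ChunkRecord(
        text="Invoices are emailed on the first day of each month.",
        questions=["When do I get my bill?"],
        metadata={"source": "billing.md"},
    ),
]

best = search("how can I reset my password", records)
print(best.metadata["source"])
```

Here the query shares almost no words with the first chunk's text, but it closely matches the stored question, which is the gap between content vectors and question vectors that the comment describes. In practice the questions would be generated by an LLM and embedded with a retrieval-oriented model.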

  • kernel-memory

    Index and query any data using LLM and natural language, tracking sources and showing citations.

  • Project mention: Open source alternative to ChatGPT and ChatPDF-like AI tools | news.ycombinator.com | 2023-12-09

    about #3 I’ll recommend https://github.com/microsoft/kernel-memory :)

  • uform

    Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

  • Project mention: Show HN: UForm v2 Featuring Multimodal Matryoshka, Multimodal DPO, and ONNX | news.ycombinator.com | 2024-03-28
  • primeqa

    The prime repository for state-of-the-art Multilingual Question Answering research and development.

  • Project mention: State-of-the-Art Multilingual Question Answering | /r/aiengineer | 2023-07-10
  • miyagi

    Sample to envision intelligent apps with Microsoft's Copilot stack for AI-infused product experiences.

  • Project mention: Project Miyagi – Financial Coach | news.ycombinator.com | 2023-05-09
  • elastiknn

    Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-08.

Index

What are some of the best open-source semantic-search projects? This list will help you:

# Project Stars
1 MindsDB 21,160
2 Typesense 17,796
3 haystack 13,564
4 Weaviate 9,436
5 txtai 6,910
6 GPTCache 6,387
7 khoj 4,760
8 marqo 4,086
9 llmware 3,056
10 databerry 2,857
11 Top2Vec 2,833
12 docarray 2,730
13 examples 2,396
14 clip-retrieval 2,115
15 awesome-generative-ai 1,957
16 usearch 1,611
17 mteb 1,314
18 kernel-memory 1,150
19 uform 859
20 primeqa 696
21 miyagi 610
22 elastiknn 352
23 awesome-semantic-search 319