Embeddings

Open-source projects categorized as Embeddings

Top 23 Embedding Open-Source Projects

  • supabase

    The open source Firebase alternative.

  • Project mention: How to get free Postgres | dev.to | 2024-04-24

    Sign up for Supabase: Head over to Supabase and sign up. Create a new workspace and project with your preferred names.

  • quivr

    Your GenAI Second Brain 🧠 A personal productivity assistant (RAG) ⚡️🤖 Chat with your docs (PDF, CSV, ...) & apps using Langchain, GPT 3.5 / 4 turbo, Private, Anthropic, VertexAI, Ollama, LLMs, Groq that you can share with users! Local & Private alternative to OpenAI GPTs & ChatGPT powered by retrieval-augmented generation.

  • Project mention: privateGPT VS quivr - a user suggested alternative | libhunt.com/r/privateGPT | 2024-01-12
  • chroma

    the AI-native open-source embedding database

  • Project mention: Let’s build AI-tools with the help of AI and Typescript! | dev.to | 2024-04-23

    Package installer for Python (pip): we use this to install Python-based packages such as Jupyter Lab, as well as other Python-based tools like the Chroma DB vector database.

  • h2ogpt

    Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports Ollama, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://codellama.h2o.ai/

  • Project mention: Ask HN: How do I train a custom LLM/ChatGPT on my own documents in Dec 2023? | news.ycombinator.com | 2023-12-24

    As others have said you want RAG.

    The most feature complete implementation I've seen is h2ogpt[0] (not affiliated).

    The code is kind of a mess (most of the logic is in an ~8000-line Python file), but it supports ingestion of everything from YouTube videos to docx, pdf, etc., either offline or from the web interface. It uses langchain and a ton of additional open source libraries under the hood. It can run directly on Linux, via Docker, or with one-click installers for Mac and Windows.

    It has various model hosting implementations built in - transformers, exllama, llama.cpp as well as support for model serving frameworks like vLLM, HF TGI, etc or just OpenAI.

    You can also define your preferred embedding model along with various other parameters, but I've found the out-of-the-box defaults to be pretty sane and usable.

    [0] - https://github.com/h2oai/h2ogpt

  • txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

  • Project mention: Build knowledge graphs with LLM-driven entity extraction | dev.to | 2024-02-21

    txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

  • pytorch-metric-learning

    The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.

  • generative-ai

    Sample code and notebooks for Generative AI on Google Cloud (by GoogleCloudPlatform)

  • Project mention: Google Imagen 2 | news.ycombinator.com | 2023-12-13

    I've used the code based on similar examples from GitHub [1]. According to docs [2], imagegeneration@005 was released on the 11th, so I guessed it's Imagen 2, though there are no confirmations.

    [1] https://github.com/GoogleCloudPlatform/generative-ai/blob/ma...

    [2] https://console.cloud.google.com/vertex-ai/publishers/google...

  • paradedb

    Postgres for Search and Analytics

  • Project mention: Using ClickHouse to scale an events engine | news.ycombinator.com | 2024-04-11
  • hub

    A library for transfer learning by reusing parts of TensorFlow models. (by tensorflow)

  • lance

    Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, and PyArrow, with more integrations coming.

  • Project mention: The Nimble File Format by Meta | news.ycombinator.com | 2024-04-25
  • llmware

    Providing enterprise-grade LLM-based development framework, tools, and fine-tuned models.

  • Project mention: More Agents Is All You Need: LLMs performance scales with the number of agents | news.ycombinator.com | 2024-04-06

    I couldn't agree more. You should check out LLMWare's SLIM agents (https://github.com/llmware-ai/llmware/tree/main/examples/SLI...). It's focusing on pretty much exactly this and chaining multiple local LLMs together.

    A really good topic that ties in with this is the need for deterministic sampling (I may have the terminology a bit incorrect) depending on what the model is intended for. The LLMWare team did a good two-part video on this here as well (https://www.youtube.com/watch?v=7oMTGhSKuNY).

    I think dedicated miniature LLMs are the way forward.

    Disclaimer - Not affiliated with them in any way, just think it's a really cool project.

  • towhee

    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

  • Project mention: FLaNK Stack Weekly for 14 Aug 2023 | dev.to | 2023-08-14
  • lightly

    A Python library for self-supervised learning on images.

  • Project mention: Show HN: Lightly – A Python library for self-supervised learning on images | news.ycombinator.com | 2023-11-16
  • ml-surveys

    📋 Survey papers summarizing advances in deep learning, NLP, CV, graphs, reinforcement learning, recommendations, graphs, etc.

  • text-embeddings-inference

    A blazing fast inference solution for text embeddings models

  • Project mention: HuggingFace text-generation-inference is reverting to Apache 2.0 License | news.ycombinator.com | 2024-04-08

    Worth noting that this also impacts the great https://github.com/huggingface/text-embeddings-inference, which allows anyone to run state of the art embeddings with great performance.

  • awesome-generative-ai

    A curated list of Generative AI tools, works, models, and references (by filipecalegario)

  • Project mention: Generative AI – A curated list of Generative AI tools, works, models | news.ycombinator.com | 2023-07-14
  • obsidian-smart-connections

    Chat with your notes & see links to related content with AI embeddings. Use local models or 100+ via APIs like Claude, Gemini, ChatGPT & Llama 3

  • Project mention: Ask HN: How are you currently using AI (personally or professionally)? | news.ycombinator.com | 2023-07-26

    For my personal notes, I use Smart Connections[1] with Obsidian. I am considering devising my own solution using LlamaIndex[2] in the near future.

    For coding, I use Copilot[3]. While it's been great for writing boilerplate code, it falls short in every other regard. I also had the opportunity to try the new version of Copilot as well, but it feels like a glorified ChatGPT inside VSCode.

    For everything else, I use a tiny tool I made[4] which enables me to invoke my own prompts in basically any application that allows me to select text.

    [1] https://github.com/brianpetro/obsidian-smart-connections

    [2] https://gpt-index.readthedocs.io/en/latest/getting_started/s...

    [3] https://github.com/features/copilot

    [4] https://github.com/overflowy/chat-key

  • GPTDiscord

    A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, YouTube summarizer, and more!

  • Project mention: Full-environment code interpreter in discord (just like ChatGPT!) + Tons of other features like multi-modality chat, internet-connected chat, chatting with your documents, and more! | /r/SideProject | 2023-10-31
  • instructor-embedding

    [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

  • Project mention: My experience on starting with fine tuning LLMs with custom data | /r/LocalLLaMA | 2023-07-10

    If you like embeddings and vector DBs, you should look into this: https://github.com/HKUNLP/instructor-embedding

  • featureform

    The Virtual Feature Store. Turn your existing data infrastructure into a feature store.

  • Project mention: Still look familiar? | /r/u_featureform | 2023-07-13
  • magnitude

    A fast, efficient universal vector embedding utility package.

  • eda_nlp

    Data augmentation for NLP, presented at EMNLP 2019

  • contextualized-topic-models

    A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Embedding projects? This list will help you:

Rank  Project                       Stars
1     supabase                     65,869
2     quivr                        32,240
3     chroma                       12,189
4     h2ogpt                       10,398
5     txtai                         6,953
6     pytorch-metric-learning       5,764
7     generative-ai                 5,396
8     paradedb                      3,803
9     hub                           3,436
10    lance                         3,256
11    llmware                       3,127
12    towhee                        2,989
13    lightly                       2,741
14    ml-surveys                    2,736
15    text-embeddings-inference     1,982
16    awesome-generative-ai         1,971
17    obsidian-smart-connections    1,837
18    GPTDiscord                    1,780
19    instructor-embedding          1,695
20    featureform                   1,674
21    magnitude                     1,611
22    eda_nlp                       1,536
23    contextualized-topic-models   1,157
