Python Embeddings

Open-source Python projects categorized as Embeddings

Top 23 Python Embedding Projects

  • chroma

    the AI-native open-source embedding database

    Project mention: 7 Vector Databases Every Developer Should Know! | 2024-02-08

    Chroma DB is a newer entrant in the vector database arena, designed for storing and querying high-dimensional embedding vectors. Despite its name, it is not specific to color data; it is useful for applications such as semantic search, content discovery, and recommendation, where vector similarity plays a crucial role.

  • txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

    Project mention: Build knowledge graphs with LLM-driven entity extraction | 2024-02-21

    txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

  • pytorch-metric-learning

    The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.

  • hub

    A library for transfer learning by reusing parts of TensorFlow models. (by tensorflow)

  • towhee

    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

    Project mention: FLaNK Stack Weekly for 14 Aug 2023 | 2023-08-14

  • lightly

    A python library for self-supervised learning on images.

    Project mention: Show HN: Lightly – A Python library for self-supervised learning on images | 2023-11-16

  • llmware

    Providing enterprise-grade LLM-based development framework, tools, and fine-tuned models.

    Project mention: FLaNK Stack Weekly 19 Feb 2024 | 2024-02-19

  • GPTDiscord

    A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!

    Project mention: Full-environment code interpreter in discord (just like ChatGPT!) + Tons of other features like multi-modality chat, internet-connected chat, chatting with your documents, and more! | /r/SideProject | 2023-10-31

  • magnitude

    A fast, efficient universal vector embedding utility package.

  • instructor-embedding

    [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

    Project mention: My experience on starting with fine tuning LLMs with custom data | /r/LocalLLaMA | 2023-07-10

    If you like embeddings and vector DB, you should look into this:

  • eda_nlp

    Data augmentation for NLP, presented at EMNLP 2019

  • contextualized-topic-models

    A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021.

    Project mention: [Project]Topic modelling of tweets from the same user | /r/MachineLearning | 2023-04-14

    In our experiments, CTM works well with tweets: (I'm one of the authors)

  • SeaGOAT

    local-first semantic code search engine

    Project mention: Reviewing AI Code Search Tools | 2023-09-28

    In this blog post, I’ll be comparing 3 distinct AI-first code search tools I recently came across: Cody (developed by late-stage startup, Sourcegraph), SeaGOAT (an open-source project that was trending on HN last week), and Bloop (an early-stage YC startup). I’ll be evaluating them along the dimensions of user-friendliness as well as their accuracy.

  • NeumAI

    Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

    Project mention: Show HN: Neum AI – Open-source large-scale RAG framework | 2023-11-21

    Interesting to see that the semantic chunking in the tools library is a wrapper around GPT-4. Asks GPT for the python code and executes it:

  • PolyFuzz

    Fuzzy string matching, grouping, and evaluation.

    Project mention: "We have great datasets" | /r/dataengineering | 2023-06-08

  • vectorflow

    VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice. (by dgarnitz)

    Project mention: FLaNK Weekly 08 Jan 2024 | 2024-01-08

  • langchain-chatbot

    AI Chatbot for analyzing/extracting information from data in conversational format.

    Project mention: Legalyze – AI for Lawyers to Query Case Files | 2023-05-21

    We have built a tool for lawyers to query thousands of files at once. We are using Langchain in coordination with GPT-4 and Pinecone to query massive sets of data at once.

    Lawyers can also generate procedural documents like motions and requests using their case as context.

    Contact [email protected] for a trial and check out our open source project -

  • jodie

    A PyTorch implementation of ACM SIGKDD 2019 paper "Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks"

  • osintgpt

    An open-source intelligence (OSINT) analysis tool leveraging GPT-powered embeddings and vector search engines for efficient data processing

    Project mention: FLaNK Stack Weekly 5 September 2023 | 2023-09-05

  • DataChad

    Ask questions about any data source by leveraging langchains

    Project mention: I am new to language models but I want to create a knowledge base upon a bunch of files so that I can ask questions and get answers back. | /r/LocalLLaMA | 2023-06-18

    For starters, you can use gustavz/DataChad: Ask questions about any data source by leveraging langchains

  • AnglE

    Angle-optimized Text Embeddings | 🔥 SOTA on STS and MTEB Leaderboard (by SeanLee97)

    Project mention: FLaNK Stack Weekly 22 January 2024 | 2024-01-22

  • pyRDF2Vec

    🐍 Python Implementation and Extension of RDF2Vec

  • laserembeddings

    LASER multilingual sentence embeddings as a pip package

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-02-21.

What are some of the best open-source Embedding projects in Python? This list will help you:

Rank  Project                       Stars
1     chroma                        11,042
2     txtai                         6,452
3     pytorch-metric-learning       5,658
4     hub                           3,425
5     towhee                        2,910
6     lightly                       2,687
7     llmware                       2,645
8     GPTDiscord                    1,753
9     magnitude                     1,611
10    instructor-embedding          1,586
11    eda_nlp                       1,531
12    contextualized-topic-models   1,143
13    SeaGOAT                       885
14    NeumAI                        721
15    PolyFuzz                      696
16    vectorflow                    613
17    langchain-chatbot             345
18    jodie                         321
19    osintgpt                      303
20    DataChad                      292
21    AnglE                         277
22    pyRDF2Vec                     235
23    laserembeddings               220