Python Embeddings

Open-source Python projects categorized as Embeddings

Top 23 Python Embedding Projects

  • chroma

    the AI-native open-source embedding database

  • Project mention: Let’s build AI-tools with the help of AI and Typescript! | dev.to | 2024-04-23

    Package installer for Python (pip): we use this to install Python-based packages such as Jupyter Lab, and we'll also use it to install other Python-based tools like the Chroma DB vector database.

  • txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

  • Project mention: Build knowledge graphs with LLM-driven entity extraction | dev.to | 2024-02-21

    txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

  • pytorch-metric-learning

    The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.

  • hub

    A library for transfer learning by reusing parts of TensorFlow models. (by tensorflow)

  • llmware

    Providing enterprise-grade LLM-based development framework, tools, and fine-tuned models.

  • Project mention: More Agents Is All You Need: LLMs performance scales with the number of agents | news.ycombinator.com | 2024-04-06

    I couldn't agree more. You should check out LLMWare's SLIM agents (https://github.com/llmware-ai/llmware/tree/main/examples/SLI...). It's focusing on pretty much exactly this and chaining multiple local LLMs together.

    A really good topic that ties in with this is the need for deterministic sampling (I may have the terminology a bit incorrect) depending on what the model is intended for. The LLMWare team did a good two-part video on this as well (https://www.youtube.com/watch?v=7oMTGhSKuNY).

    I think dedicated miniature LLMs are the way forward.

    Disclaimer - Not affiliated with them in any way, just think it's a really cool project.

  • towhee

    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

  • Project mention: FLaNK Stack Weekly for 14 Aug 2023 | dev.to | 2023-08-14
  • lightly

    A python library for self-supervised learning on images.

  • Project mention: Show HN: Lightly – A Python library for self-supervised learning on images | news.ycombinator.com | 2023-11-16
  • GPTDiscord

    A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!

  • Project mention: Full-environment code interpreter in discord (just like ChatGPT!) + Tons of other features like multi-modality chat, internet-connected chat, chatting with your documents, and more! | /r/SideProject | 2023-10-31
  • instructor-embedding

    [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

  • Project mention: My experience on starting with fine tuning LLMs with custom data | /r/LocalLLaMA | 2023-07-10

    If you like embeddings and vector DBs, you should look into this: https://github.com/HKUNLP/instructor-embedding

  • magnitude

    A fast, efficient universal vector embedding utility package.

  • eda_nlp

    Data augmentation for NLP, presented at EMNLP 2019

  • contextualized-topic-models

    A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021.

  • SeaGOAT

    local-first semantic code search engine

  • Project mention: Reviewing AI Code Search Tools | dev.to | 2023-09-28

    In this blog post, I’ll be comparing 3 distinct AI-first code search tools I recently came across: Cody (developed by late-stage startup, Sourcegraph), SeaGOAT (an open-source project that was trending on HN last week), and Bloop (an early-stage YC startup). I’ll be evaluating them along the dimensions of user-friendliness as well as their accuracy.

  • NeumAI

    Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

  • Project mention: Show HN: Neum AI – Open-source large-scale RAG framework | news.ycombinator.com | 2023-11-21

    Interesting to see that the semantic chunking in the tools library is a wrapper around GPT-4. Asks GPT for the python code and executes it: https://github.com/NeumTry/NeumAI/blob/main/neumai-tools/neu...

  • fastembed

    Fast, Accurate, Lightweight Python library to make State of the Art Embedding

  • Project mention: FastLLM by Qdrant – lightweight LLM tailored For RAG | news.ycombinator.com | 2024-04-01
  • PolyFuzz

    Fuzzy string matching, grouping, and evaluation.

  • Project mention: "We have great datasets" | /r/dataengineering | 2023-06-08
  • vectorflow

    VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice. (by dgarnitz)

  • Project mention: FLaNK Weekly 08 Jan 2024 | dev.to | 2024-01-08
  • langchain-chatbot

    AI Chatbot for analyzing/extracting information from data in conversational format.

  • Project mention: Legalyze – AI for Lawyers to Query Case Files | news.ycombinator.com | 2023-05-21

    We have built Legalyze.ai, a tool for lawyers to query thousands of files at once. We are using Langchain in coordination with GPT-4 and Pinecone to query massive sets of data at once.

    Lawyers can also generate procedural documents like motions and requests using their case as context.

    Contact [email protected] for a trial and check out our open source project - https://github.com/Haste171/langchain-chatbot

  • AnglE

    Angle-optimized Text Embeddings | 🔥 SOTA on STS and MTEB Leaderboard (by SeanLee97)

  • Project mention: FLaNK Stack Weekly 22 January 2024 | dev.to | 2024-01-22
  • jodie

    A PyTorch implementation of ACM SIGKDD 2019 paper "Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks"

  • osintgpt

    An open-source intelligence (OSINT) analysis tool leveraging GPT-powered embeddings and vector search engines for efficient data processing

  • Project mention: FLaNK Stack Weekly 5 September 2023 | dev.to | 2023-09-05
  • DataChad

    Ask questions about any data source by leveraging langchains

  • Project mention: I am new to language models but I want to create a knowledge base upon a bunch of files so that I can ask questions and get answers back. | /r/LocalLLaMA | 2023-06-18

    For starters, you can use gustavz/DataChad: Ask questions about any data source by leveraging langchains (github.com)

  • pyRDF2Vec

    🐍 Python Implementation and Extension of RDF2Vec

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Embedding projects in Python? This list will help you:

 #  Project                       Stars
 1  chroma                       12,189
 2  txtai                         6,953
 3  pytorch-metric-learning       5,754
 4  hub                           3,436
 5  llmware                       3,086
 6  towhee                        2,970
 7  lightly                       2,741
 8  GPTDiscord                    1,780
 9  instructor-embedding          1,695
10  magnitude                     1,610
11  eda_nlp                       1,536
12  contextualized-topic-models   1,157
13  SeaGOAT                         906
14  NeumAI                          774
15  fastembed                       759
16  PolyFuzz                        713
17  vectorflow                      634
18  langchain-chatbot               371
19  AnglE                           341
20  jodie                           333
21  osintgpt                        323
22  DataChad                        301
23  pyRDF2Vec                       240
