Python Embeddings

Open-source Python projects categorized as Embeddings

Top 23 Python Embedding Projects

  1. mem0

    Memory for AI Agents; SOTA in AI Agent Memory; Announcing OpenMemory MCP - local and secure memory management.

    Project mention: Show HN: How to make your MCP clients more context-aware | news.ycombinator.com | 2025-05-13
  3. h2ogpt

    Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/

    Project mention: Major Technologies Worth Learning in 2025 for Data Professionals | dev.to | 2024-12-07

    Artificial Intelligence (AI) is becoming a ubiquitous and, dare I say, indispensable part of data workflows. Tools like ChatGPT have made it easier to review data and write reports. But diving even deeper, tools like DataRobot, H2O.ai, and Google’s AutoML are also simplifying machine learning pipelines and automating repetitive tasks, enabling professionals to focus on high-value activities like model optimization and data storytelling. Mastering these tools will not only boost productivity but also ensure you remain competitive in an AI-first world.

  4. txtai

    💡 All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows

    Project mention: Chunking your data for RAG | dev.to | 2025-02-11
  5. FlagEmbedding

    Retrieval and Retrieval-augmented LLMs

    Project mention: Understanding RAG (Part 5): Recommendations and wrap-up | dev.to | 2024-09-09

    Choosing the right embedding model is equally important for effective semantic matching of queries and chunk blocks. To select an appropriate open-source embedding model, the authors ran another experiment using the evaluation module of FlagEmbedding, with the dataset namespace-Pt/msmarco for queries and the dataset namespace-Pt/msmarco-corpus for the corpus; metrics such as RR and MRR were used for evaluation.
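
    As a rough illustration of the RR and MRR metrics mentioned in this excerpt, the sketch below computes the reciprocal rank per query and averages it over queries. It is not FlagEmbedding's evaluation module; the function names and the toy document ids are hypothetical and exist only for this example.

```python
from typing import Hashable, Sequence, Set


def reciprocal_rank(ranked_ids: Sequence[Hashable], relevant_ids: Set[Hashable]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    relevant: Sequence[Set[Hashable]],
) -> float:
    """Average the reciprocal rank over all queries (0.0 for an empty input)."""
    if not rankings:
        return 0.0
    scores = [reciprocal_rank(r, rel) for r, rel in zip(rankings, relevant)]
    return sum(scores) / len(scores)


# Hypothetical toy data: three queries, each with a ranked list of chunk ids
# returned by some embedding model, plus the set of truly relevant ids.
rankings = [["d3", "d1", "d7"], ["d2", "d9", "d4"], ["d5", "d6", "d8"]]
relevant = [{"d1"}, {"d2"}, {"d0"}]

mrr = mean_reciprocal_rank(rankings, relevant)
print(f"MRR = {mrr:.3f}")
```

    A model that places the first relevant chunk higher in each ranking scores closer to 1.0, which is what makes the metric usable for comparing embedding models on the same query set.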

  6. pytorch-metric-learning

    The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.

  7. AutoRAG

    AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

    Project mention: AIM Weekly 28 Oct 2024 | dev.to | 2024-10-28

    AutoRAG with Milvus; ADO; Self Hosting LLM; Noema Declarative AI; New NIM Blueprint for building AI Virtual Assistant; Zilliz Integrations; Using Milvus for Semantic Search; Contextual Retrieval; Meta: Quantized Light Weight Models; https://arxiv.org/pdf/2407.01219; Cool Icons; IBM Watson AI Milvus Bot; The Hacker's Browser; Small and Mighty H2O Model; Zilliz Cloud vs Qdrant; Gravatino and Agents; OSS Summit Europe 2024 Report; RAG Strategi; MS AI Data Visualizations; Graph RAG; South Bay Meetup 15 Oct 2024; Influx and Milvus; Multimodal Pipelines; Constrained Sampling from LLM; BAML: Cheaper, Fast and More Accurate Function Calling; Infinite World Generation with outlines txt; Ollama Client Swift; Atomic Agents; PYMUPDF4LLM; Milvus for AI Agents; Fine Tuning LLAMA 3 with ORPO; Run NVIDIA Models; LLM Training Meta Lingua; 1 Bit LLM - MS BitNet; Intro; Mastering Chunk; Storm Stanford Tool; DAMO NLP SG CaRing; LLM Reasoners

  8. hub

    A library for transfer learning by reusing parts of TensorFlow models. (by tensorflow)

  10. lightly

    A python library for self-supervised learning on images.

  11. towhee

    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

  12. prompttools

    Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).

    Project mention: Universal Personal Assistant with LLMs | dev.to | 2024-12-11

    LLM answer quality depends directly on the prompts it is given, so effective prompt engineering is necessary. The number of prompt-management platforms and libraries has grown manifold. Some tools now incorporate tweaks specific to the most recent commercial models, so that prompts can be written with model-specific formulations. Example libraries are dspy, LMQL, Outlines, and Prompttools.

  13. datachain

    ETL, Analytics, Versioning for Unstructured Data

    Project mention: DBT for Unstructured Data – DataChain | news.ycombinator.com | 2024-11-04
  14. deprecated-generative-ai-python

    This SDK is now deprecated, use the new unified Google GenAI SDK.

    Project mention: Google Gemini has the worst LLM API | news.ycombinator.com | 2025-05-03

    lemming, this is super helpful, thank you. We provide the genai SDK (https://github.com/googleapis/python-genai) to reduce the learning curve in 4 languages (GA: Python, Go; Preview: Node.js, Java). The SDK works for all Gemini APIs provided by Google AI Studio (https://ai.google.dev/) and Vertex AI.

  15. fastembed

    Fast, Accurate, Lightweight Python library to make State of the Art Embedding

  16. instructor-embedding

    [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

  17. GPTDiscord

    A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!

  18. magnitude

    A fast, efficient universal vector embedding utility package.

    Project mention: Show HN: Wordllama – Things you can do with the token embeddings of an LLM | news.ycombinator.com | 2024-09-14

    Interesting... looks like this uses pymagnitude

    https://github.com/plasticityai/magnitude

  19. eda_nlp

    Data augmentation for NLP, presented at EMNLP 2019

  20. ModernBERT

    Bringing BERT into modernity via both architecture changes and scaling

    Project mention: ModernBERT | news.ycombinator.com | 2024-12-19
  21. hazm

    Persian NLP Toolkit

  22. contextualized-topic-models

    A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).

  23. SeaGOAT

    local-first semantic code search engine

  24. swiss_army_llama

    A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.

  25. NeumAI

    Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Embeddings related posts

  • Google Gemini has the worst LLM API

    11 projects | news.ycombinator.com | 3 May 2025
  • LightlyTrain: Better Vision Models, Faster – No Labels Needed

    3 projects | news.ycombinator.com | 15 Apr 2025
  • Getting started with LLM APIs

    3 projects | dev.to | 2 Jan 2025
  • ModernBERT

    1 project | news.ycombinator.com | 19 Dec 2024
  • Show HN: Vicinity – Fast, Lightweight Nearest Neighbors with Flexible Back Ends

    3 projects | news.ycombinator.com | 1 Dec 2024
  • AI Democratization: Unlocking the Power of Artificial Intelligence for All

    1 project | dev.to | 31 Oct 2024
  • How I Learned Generative AI in Two Weeks (and You Can Too): Part 2 - Embeddings

    1 project | dev.to | 11 Oct 2024

Index

What are some of the best open-source Embedding projects in Python? This list will help you:

 #  Project                           Stars
 1  mem0                             31,292
 2  h2ogpt                           11,804
 3  txtai                            10,935
 4  FlagEmbedding                     9,624
 5  pytorch-metric-learning           6,137
 6  AutoRAG                           3,927
 7  hub                               3,496
 8  lightly                           3,383
 9  towhee                            3,365
10  prompttools                       2,852
11  datachain                         2,557
12  deprecated-generative-ai-python   2,205
13  fastembed                         2,056
14  instructor-embedding              1,949
15  GPTDiscord                        1,839
16  magnitude                         1,645
17  eda_nlp                           1,623
18  ModernBERT                        1,360
19  hazm                              1,284
20  contextualized-topic-models       1,229
21  SeaGOAT                           1,139
22  swiss_army_llama                  1,013
23  NeumAI                              854
