Top 23 Python Embedding Projects
-
mem0
Memory for AI Agents; SOTA in AI Agent Memory; Announcing OpenMemory MCP - local and secure memory management.
Project mention: Show HN: How to make your MCP clients more context-aware | news.ycombinator.com | 2025-05-13
-
h2ogpt
Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
Project mention: Major Technologies Worth Learning in 2025 for Data Professionals | dev.to | 2024-12-07
Artificial Intelligence (AI) is becoming a ubiquitous and, dare I say, indispensable part of data workflows. Tools like ChatGPT have made it easier to review data and write reports. Going deeper, tools like DataRobot, H2O.ai, and Google's AutoML are also simplifying machine learning pipelines and automating repetitive tasks, enabling professionals to focus on high-value activities like model optimization and data storytelling. Mastering these tools will not only boost productivity but also ensure you remain competitive in an AI-first world.
-
txtai
💡 All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows
-
Choosing the right embedding model is equally important for effective semantic matching of queries and chunk blocks. To select an appropriate open-source embedding model, the authors conducted another experiment using the evaluation module of FlagEmbedding, with the dataset namespace-Pt/msmarco for queries and the dataset namespace-Pt/msmarco-corpus for the corpus. Metrics such as RR and MRR were used for evaluation.
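To make the ranking metric concrete, here is a minimal sketch of how a mean reciprocal rank can be computed. It assumes that RR means reciprocal rank and MRR means mean reciprocal rank. It does not use FlagEmbedding's evaluation module. The function names and toy document IDs are hypothetical and chosen only for illustration.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def reciprocal_rank(ranked_ids: Sequence[Hashable], relevant_ids: Set[Hashable]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average the reciprocal rank over all (ranking, relevant set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Hypothetical toy data: one ranked result list and one relevant set per query.
toy_runs = [
    (["d3", "d1", "d7"], {"d1"}),  # first relevant hit at rank 2
    (["d2", "d9", "d4"], {"d2"}),  # first relevant hit at rank 1
    (["d5", "d6", "d8"], {"d0"}),  # no relevant document retrieved
]
mrr = mean_reciprocal_rank(toy_runs)
print(f"MRR = {mrr:.3f}")
```

In a real comparison, each ranked list would come from a nearest-neighbor search over the corpus embeddings produced by the model under test. The model with the higher score places the first relevant passage nearer the top on average.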
-
pytorch-metric-learning
The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
-
AutoRAG
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
AutoRAG with Milvus; ADO; Self Hosting LLM; Noema Declarative AI; New NIM Blueprint for building AI Virtual Assistant; Zilliz Integrations; Using Milvus for Semantic Search; Contextual Retrieval; Meta: Quantized Light Weight Models; https://arxiv.org/pdf/2407.01219; Cool Icons; IBM Watson AI Milvus Bot; The Hacker's Browser; Small and Mighty H2O Model; Zilliz Cloud vs Qdrant; Gravatino and Agents; OSS Summit Europe 2024 Report; RAG Strategi; MS AI Data Visualizations; Graph RAG; South Bay Meetup 15 Oct 2024; Influx and Milvus; Multimodal Pipelines; Constrained Sampling from LLM; BAML: Cheaper, Fast and More Accurate Function Calling; Infinite World Generation with outlines txt; Ollama Client Swift; Atomic Agents; PYMUPDF4LLM; Milvus for AI Agents; Fine Tuning LLAMA 3 with ORPO; Run NVIDIA Models; LLM Training Meta Lingua; 1 Bit LLM - MS BitNet; Intro; Mastering Chunk; Storm Stanford Tool; DAMO NLP SG CaRing; LLM Reasoners
-
towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
-
prompttools
Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).
An LLM's answer quality depends directly on the prompts it is given, so effective prompt engineering is necessary. The number of prompt management platforms and libraries has grown manifold. Some tools now incorporate tweaks specific to the most recent commercial models, enabling prompts tailored with model-specific formulations. Example libraries are dspy, LMQL, Outlines, and Prompttools.
-
lemming, this is super helpful, thank you. We provide the genai SDK (https://github.com/googleapis/python-genai) to reduce the learning curve in 4 languages (GA: Python, Go; Preview: Node.js, Java). The SDK works for all Gemini APIs provided by Google AI Studio (https://ai.google.dev/) and Vertex AI.
-
GPTDiscord
A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!
-
Project mention: Show HN: Wordllama - Things you can do with the token embeddings of an LLM | news.ycombinator.com | 2024-09-14
Interesting... looks like this uses pymagnitude
https://github.com/plasticityai/magnitude
-
contextualized-topic-models
A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
-
swiss_army_llama
A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.
-
NeumAI
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
Python Embeddings related posts
-
Google Gemini has the worst LLM API
-
LightlyTrain: Better Vision Models, Faster - No Labels Needed
-
Getting started with LLM APIs
-
ModernBERT
-
Show HN: Vicinity - Fast, Lightweight Nearest Neighbors with Flexible Back Ends
-
AI Democratization: Unlocking the Power of Artificial Intelligence for All
-
How I Learned Generative AI in Two Weeks (and You Can Too): Part 2 - Embeddings
-
Index
What are some of the best open-source Embedding projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | mem0 | 31,292 |
| 2 | h2ogpt | 11,804 |
| 3 | txtai | 10,935 |
| 4 | FlagEmbedding | 9,624 |
| 5 | pytorch-metric-learning | 6,137 |
| 6 | AutoRAG | 3,927 |
| 7 | hub | 3,496 |
| 8 | lightly | 3,383 |
| 9 | towhee | 3,365 |
| 10 | prompttools | 2,852 |
| 11 | datachain | 2,557 |
| 12 | deprecated-generative-ai-python | 2,205 |
| 13 | fastembed | 2,056 |
| 14 | instructor-embedding | 1,949 |
| 15 | GPTDiscord | 1,839 |
| 16 | magnitude | 1,645 |
| 17 | eda_nlp | 1,623 |
| 18 | ModernBERT | 1,360 |
| 19 | hazm | 1,284 |
| 20 | contextualized-topic-models | 1,229 |
| 21 | SeaGOAT | 1,139 |
| 22 | swiss_army_llama | 1,013 |
| 23 | NeumAI | 854 |