Top 23 Python Embedding Projects
-
txtai
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
-
pytorch-metric-learning
The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.
-
towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
-
GPTDiscord
A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!
-
contextualized-topic-models
A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021.
-
NeumAI
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
-
vectorflow
VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice. (by dgarnitz)
-
langchain-chatbot
AI Chatbot for analyzing/extracting information from data in conversational format.
-
jodie
A PyTorch implementation of ACM SIGKDD 2019 paper "Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks"
-
osintgpt
An open-source intelligence (OSINT) analysis tool leveraging GPT-powered embeddings and vector search engines for efficient data processing
pip is the package installer for Python. We use it to install Python-based packages such as Jupyter Lab, and we are going to use it to install other Python-based tools like the Chroma DB vector database.
txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.
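To make the idea of an embeddings database concrete, here is a minimal sketch of the store-vectors-then-rank-by-similarity pattern. It is not txtai's API: `ToyIndex`, `embed`, and `cosine` are hypothetical names invented for this illustration, and the "embedding" is a plain token-count vector standing in for a learned model. A real embeddings database would use dense vectors from a language model, which can match on meaning rather than only on shared words.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase token counts (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyIndex:
    """Hypothetical in-memory index: stores (document, vector) pairs."""

    def __init__(self):
        self.rows = []

    def add(self, documents):
        # Embed each document once at insert time.
        for doc in documents:
            self.rows.append((doc, embed(doc)))

    def search(self, query, limit=3):
        # Embed the query, score every stored vector, return the best matches.
        q = embed(query)
        scored = [(cosine(q, vec), doc) for doc, vec in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(doc, round(score, 3)) for score, doc in scored[:limit]]


if __name__ == "__main__":
    index = ToyIndex()
    index.add([
        "vector database for embeddings",
        "time series analytics",
        "discord chat bot",
    ])
    print(index.search("embeddings database", limit=1))
```

The brute-force scan over every stored vector keeps the sketch short. Production systems typically replace it with an approximate nearest-neighbor index so that search stays fast as the collection grows.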
Project mention: More Agents Is All You Need: LLMs performance scales with the number of agents | news.ycombinator.com | 2024-04-06
I couldn't agree more. You should check out LLMWare's SLIM agents (https://github.com/llmware-ai/llmware/tree/main/examples/SLI...). It focuses on pretty much exactly this and on chaining multiple local LLMs together.
A really good topic that ties in with this is the need for deterministic sampling (I may have the terminology a bit incorrect) depending on what the model is intended for. The LLMWare team did a good two-part video on this here as well (https://www.youtube.com/watch?v=7oMTGhSKuNY)
I think dedicated miniature LLMs are the way forward.
Disclaimer - Not affiliated with them in any way, just think it's a really cool project.
Project mention: Show HN: Lightly – A Python library for self-supervised learning on images | news.ycombinator.com | 2023-11-16
Project mention: Full-environment code interpreter in discord (just like ChatGPT!) + Tons of other features like multi-modality chat, internet-connected chat, chatting with your documents, and more! | /r/SideProject | 2023-10-31
Project mention: My experience on starting with fine tuning LLMs with custom data | /r/LocalLLaMA | 2023-07-10
If you like embeddings and vector DBs, you should look into this: https://github.com/HKUNLP/instructor-embedding
In this blog post, I'll be comparing 3 distinct AI-first code search tools I recently came across: Cody (developed by late-stage startup, Sourcegraph), SeaGOAT (an open-source project that was trending on HN last week), and Bloop (an early-stage YC startup). I'll be evaluating them along the dimensions of user-friendliness as well as their accuracy.
Project mention: Show HN: Neum AI – Open-source large-scale RAG framework | news.ycombinator.com | 2023-11-21
Interesting to see that the semantic chunking in the tools library is a wrapper around GPT-4. Asks GPT for the python code and executes it: https://github.com/NeumTry/NeumAI/blob/main/neumai-tools/neu...
Project mention: FastLLM by Qdrant – lightweight LLM tailored For RAG | news.ycombinator.com | 2024-04-01
Project mention: Legalyze – AI for Lawyers to Query Case Files | news.ycombinator.com | 2023-05-21
We have built Legalyze.ai, a tool for lawyers to query thousands of files at once. We are using Langchain in coordination with GPT-4 and Pinecone to query massive sets of data at once.
Lawyers can also generate procedural documents like motions and requests using their case as context.
Contact [email protected] for a trial and check out our open source project - https://github.com/Haste171/langchain-chatbot
Project mention: I am new to language models but I want to create a knowledge base upon a bunch of files so that I can ask questions and get answers back. | /r/LocalLLaMA | 2023-06-18
For starters, you can use gustavz/DataChad: Ask questions about any data source by leveraging langchains (github.com)
Python Embeddings related posts
- HuggingFace text-generation-inference is reverting to Apache 2.0 License
- Ask HN: What are some of the best user experiences with AI?
- Show HN: LLMWare – Small Specialized Function Calling 1B LLMs for Multi-Step RAG
- AI Grant Traction in OSS Startups
- Show HN: LLMWare – Integrated Solution for RAG in Finance and Legal
- Vector Databases: A Technical Primer [pdf]
-
privateGPT VS quivr - a user suggested alternative
2 projects | 12 Jan 2024
Index
What are some of the best open-source Embedding projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | chroma | 12,189 |
2 | txtai | 6,953 |
3 | pytorch-metric-learning | 5,754 |
4 | hub | 3,436 |
5 | llmware | 3,086 |
6 | towhee | 2,970 |
7 | lightly | 2,741 |
8 | GPTDiscord | 1,780 |
9 | instructor-embedding | 1,695 |
10 | magnitude | 1,610 |
11 | eda_nlp | 1,536 |
12 | contextualized-topic-models | 1,157 |
13 | SeaGOAT | 906 |
14 | NeumAI | 774 |
15 | fastembed | 759 |
16 | PolyFuzz | 713 |
17 | vectorflow | 634 |
18 | langchain-chatbot | 371 |
19 | AnglE | 341 |
20 | jodie | 333 |
21 | osintgpt | 323 |
22 | DataChad | 301 |
23 | pyRDF2Vec | 240 |