Top 21 Jupyter Notebook Embedding Projects
-
awesome-generative-ai
A curated list of Generative AI tools, works, models, and references (by filipecalegario)
Project mention: Top Courses and GitHub Repositories to Learn GenerativeAI Free | dev.to | 2024-08-17
✅ Filipecalegario
-
Featureform The success of a machine learning model relies on the quality of data and, hence, the features fed to the model. However, in large organizations, members of one team may not be aware of good features developed by other teams in the organization. A feature store helps eliminate this problem by providing a central repository of features that are accessible to all the teams and individuals within an organization.
-
Google AI
-
superlinked
Superlinked is a Python framework for AI Engineers building high-performance search & recommendation applications that combine structured and unstructured data.
Project mention: Show HN: Superlinked – Vector Embeddings for Structured and Unstructured Data | news.ycombinator.com | 2024-12-02
-
amazon-bedrock-samples
This repository contains examples for customers to get started using the Amazon Bedrock Service, including examples for all available foundation models.
AWS Samples contains pre-built examples to help customers get started with the Amazon Bedrock service.
-
This is a great guide.
Also, even though language model embeddings [1] are currently all the rage, good old embedding models are more than good enough for most tasks.
With just a bit of tuning, they're generally as good on many sentence embedding tasks [2], and with good libraries [3] you get something like 400k sentences/sec on a laptop CPU versus ~4k-15k sentences/sec on a V100 for LM embeddings.
When you should use language model embeddings:
- Multilingual tasks. While some embedding models are multilingually aligned (e.g. MUSE [4]), you still need to route each sentence to the correct embedding model file (you need something like langdetect). It's also cumbersome, with one 400 MB file per language.
For LM embedding models, many are multilingually aligned out of the box.
- Tasks that are very context-specific or require fine-tuning. For instance, if you're building a RAG system for medical documents, the embedding space works best when it puts more distance between seemingly related medical terms.
This calls for models with more embedding dimensions, and heavily favors LM models over classic embedding models.
1. sbert.net
2. https://collaborate.princeton.edu/en/publications/a-simple-b...
3. https://github.com/oborchers/Fast_Sentence_Embeddings
4. https://github.com/facebookresearch/MUSE
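As a rough illustration of the "classic" approach described in the comment above, here is a minimal sketch that embeds a sentence by averaging per-language word vectors and compares sentences by cosine similarity. The vectors in `TOY_VECTORS` are hand-made placeholders, not real MUSE data, and the function names are hypothetical. The caller is assumed to supply the language code, which in practice might come from a detector such as langdetect.

```python
from math import sqrt

DIM = 2  # toy dimensionality; real word vectors typically have hundreds of dimensions

# Hypothetical, hand-made aligned word vectors. A real setup would load one
# vector file per language (for example, MUSE aligned vectors) instead.
TOY_VECTORS = {
    "en": {"cat": [1.0, 0.0], "dog": [0.8, 0.2], "car": [0.0, 1.0]},
    "fr": {"chat": [1.0, 0.0], "chien": [0.8, 0.2], "voiture": [0.0, 1.0]},
}


def embed_sentence(sentence, lang):
    """Average the vectors of known words, using the table for `lang`."""
    table = TOY_VECTORS[lang]  # routing step: raises KeyError for an unsupported language
    words = [w for w in sentence.lower().split() if w in table]
    if not words:
        return [0.0] * DIM  # no known words: fall back to the zero vector
    return [sum(table[w][i] for w in words) / len(words) for i in range(DIM)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


english = embed_sentence("cat dog", "en")
french = embed_sentence("chat chien", "fr")
print(cosine(english, french))
```

Because the toy tables are aligned across languages, the English and French sentences land on the same point. With real vectors the routing step is what makes this cumbersome: each language needs its own file loaded and selected before embedding.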
-
cleora
Cleora AI is a general-purpose model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data.
-
examples
Analyze the unstructured data with Towhee, such as reverse image search, reverse video search, audio classification, question and answer systems, molecular search, etc. (by towhee-io)
Project mention: BMF: Frame extraction acceleration- video similarity search with Pinecone | dev.to | 2024-05-10
! curl -L https://github.com/towhee-io/examples/releases/download/data/reverse_video_search.zip -O
! unzip -q -o reverse_video_search.zip
-
entity-embed
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
-
embedding-encoder
Scikit-Learn compatible transformer that turns categorical variables into dense entity embeddings.
-
langchain-embeddings
This repository demonstrates the construction of a state-of-the-art multimodal search engine, leveraging Amazon Titan Embeddings, Amazon Bedrock, and LangChain.
Project mention: Deploying a Serverless Embeddings Application with AWS CDK, Lambda, and Amazon Aurora PostgreSQL | dev.to | 2024-09-18
To generate embeddings for images and PDFs with pgvector and Amazon Aurora.
-
Project mention: Battle of the Semantics – GraphRag vs. Embeddings | news.ycombinator.com | 2024-07-10
-
vector-search-azure-cosmos-db-postgresql
This sample shows how to build vector similarity search on Azure Cosmos DB for PostgreSQL using the pgvector extension and the multi-modal embeddings APIs of Azure AI Vision.
Project mention: Use HNSW index on Azure Cosmos DB for PostgreSQL for similarity search | dev.to | 2024-03-14
In the Jupyter Notebook provided on my GitHub repository, you'll explore text-to-image and image-to-image search scenarios. You will use the same text prompts and reference images as in the Exact Nearest Neighbors search example, allowing for a comparison of the accuracy of the results.
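The comparison mentioned in that excerpt, approximate (HNSW) results against exact nearest neighbors, can be sketched roughly as follows. This is not the code from the linked notebook: the function names, the toy vectors, and the pretend approximate result are all assumptions made for illustration, and the exact search is a plain brute-force scan by cosine distance.

```python
from math import sqrt


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def exact_top_k(query, items, k):
    """Brute-force exact search: rank every item by distance to the query."""
    ranked = sorted(items, key=lambda item_id: cosine_distance(query, items[item_id]))
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Hypothetical toy embeddings keyed by item id.
ITEMS = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}

exact = exact_top_k([1.0, 0.0], ITEMS, k=2)
approx = ["a", "c"]  # pretend output of an approximate index, for illustration only
print(exact, recall_at_k(approx, exact))
```

Recall against the exact result is one common way to express the accuracy of an approximate index: a value of 1.0 means the approximate search returned the same top-k set as the full scan.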
-
tax-retrieval-benchmark
An implementation of the TaxRetrievalBenchmark task for the 🤗 Massive Text Embedding Benchmark (MTEB) framework.
Project mention: Integrating the French Taxation Embedding Benchmark Task (Beta) into the MTEB | news.ycombinator.com | 2024-05-26
Jupyter Notebook Embeddings discussion
Jupyter Notebook Embeddings related posts
-
A Journey of GenAI with AWS Bedrock based sample Images
-
Road to becoming a GDE | The Google Developers Program
-
What Are Embeddings?
-
Meet Stache Forcache, a Movember-themed AI created using Amazon PartyRock
-
AWS is killing customer AI apps without warning
-
Building an AI-Powered iOS Chat App with Amazon Bedrock and Swift
-
Getting Started with Large Language Models: How Amazon Bedrock Can Boost Your AI Journey
Index
What are some of the best open-source Embedding projects in Jupyter Notebook? This list will help you:
# | Project | Stars |
---|---|---|
1 | awesome-generative-ai | 2,652 |
2 | featureform | 1,826 |
3 | generative-ai-docs | 1,806 |
4 | what_are_embeddings | 988 |
5 | superlinked | 846 |
6 | amazon-bedrock-samples | 694 |
7 | vectordb-recipes | 667 |
8 | Fast_Sentence_Embeddings | 618 |
9 | cleora | 487 |
10 | examples | 471 |
11 | kgtk | 368 |
12 | beyondllm | 270 |
13 | Research2Vec | 198 |
14 | entity-embed | 147 |
15 | embedding-encoder | 41 |
16 | langchain-embeddings | 22 |
17 | battle-of-the-semantics | 13 |
18 | vector-search-azure-cosmos-db-postgresql | 10 |
19 | emotion-classifier | 6 |
20 | ml | 2 |
21 | tax-retrieval-benchmark | 1 |