Top 13 Python sentence-embedding Projects
-
txtai
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
-
Choosing the right embedding model is equally important for effective semantic matching between queries and chunks. To select an appropriate open-source embedding model, the authors ran another experiment using the evaluation module of FlagEmbedding, with the dataset namespace-Pt/msmarco for queries and the dataset namespace-Pt/msmarco-corpus for the corpus. Metrics such as RR and MRR were used for evaluation.
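As a rough illustration of the metrics named above, the sketch below computes the reciprocal rank (assuming that is what "RR" stands for) of a single query and the mean reciprocal rank (MRR) over several queries. It is not FlagEmbedding's evaluation code: the function names and the toy document IDs are hypothetical, and the ranked lists stand in for whatever an embedding model would retrieve.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def reciprocal_rank(ranked_ids: Sequence[Hashable], relevant_ids: Set[Hashable]) -> float:
    """Return 1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average the reciprocal rank over (ranked results, relevant set) pairs."""
    scores = [reciprocal_rank(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical): each entry is one query's ranked results and its relevant IDs.
toy_runs = [
    (["d3", "d1", "d7"], {"d1"}),  # first relevant hit at rank 2
    (["d2", "d5", "d9"], {"d2"}),  # first relevant hit at rank 1
    (["d4", "d6", "d8"], {"d0"}),  # relevant document not retrieved
]

mrr = mean_reciprocal_rank(toy_runs)
print(f"MRR = {mrr:.3f}")
```

Comparing two embedding models under this scheme would mean producing one ranked list per query from each model and comparing the resulting MRR values on the same query set.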
-
-
SimCSE
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
-
nlu
1 line for thousands of State of The Art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
-
inltk
Natural Language Toolkit for Indic Languages aims to provide out of the box support for various NLP tasks that an application developer might need
-
-
AnglE
Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard (by SeanLee97)
-
DiffCSE
Code for the NAACL 2022 long paper "DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings"
-
PromCSE
[EMNLP 2022] Improved Universal Sentence Embeddings with Prompt-based Contrastive Learning and Energy-based Learning
-
-
AnnA_Anki_neuronal_Appendix
Using machine learning on your anki collection to enhance the scheduling via semantic clustering and semantic similarity
Project mention: Ask HN: Is there any software you only made for your own use but nobody else? | news.ycombinator.com | 2024-07-04
Python sentence-embeddings discussion
Python sentence-embeddings related posts
-
Understanding RAG (Part 5): Recommendations and wrap-up
-
how can a top2vec output be improved
-
You probably shouldn't use OpenAI's embeddings
-
SBERT Embeddings from Conversations
-
BERT-Based Clustering on a Corpus of Genre Samples Kinda Sucks. Why?
-
Sentence transformers (BERTopic) on a Macbook Air
-
Comparing BERTopic to human raters
-
Index
What are some of the best open-source sentence-embedding projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | txtai | 10,349 |
2 | FlagEmbedding | 8,466 |
3 | BERTopic | 6,414 |
4 | SimCSE | 3,496 |
5 | nlu | 893 |
6 | inltk | 826 |
7 | vectordb | 588 |
8 | AnglE | 513 |
9 | DiffCSE | 293 |
10 | PromCSE | 135 |
11 | simple-simcse | 76 |
12 | AnnA_Anki_neuronal_Appendix | 63 |
13 | smaller-labse | 18 |