Python sentence-embeddings

Open-source Python projects categorized as sentence-embeddings

Top 13 Python sentence-embedding Projects

  1. txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

    Project mention: Chunking your data for RAG | dev.to | 2025-02-11
  2. FlagEmbedding

    Retrieval and Retrieval-augmented LLMs

    Project mention: Understanding RAG (Part 5): Recommendations and wrap-up | dev.to | 2024-09-09

    Choosing the right embedding model is equally important for effective semantic matching between queries and chunks. To select an appropriate open-source embedding model, the authors ran another experiment with FlagEmbedding's evaluation module, using the namespace-Pt/msmarco dataset for queries and the namespace-Pt/msmarco-corpus dataset for the corpus, and evaluated the models with metrics such as RR and MRR.

  3. BERTopic

    Leveraging BERT and c-TF-IDF to create easily interpretable topics.

  4. SimCSE

    [EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821

  5. nlu

    1 line for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.

  6. inltk

    Natural Language Toolkit for Indic Languages aims to provide out-of-the-box support for various NLP tasks that an application developer might need.

  7. vectordb

    A Python vector database you just need - no more, no less. (by jina-ai)

  8. AnglE

    Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard (by SeanLee97)

  9. DiffCSE

    Code for the NAACL 2022 long paper "DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings"

  10. PromCSE

    [EMNLP 2022] Improved Universal Sentence Embeddings with Prompt-based Contrastive Learning and Energy-based Learning

  11. simple-simcse

    A simple implementation of SimCSE

  12. AnnA_Anki_neuronal_Appendix

    Using machine learning on your Anki collection to enhance scheduling via semantic clustering and semantic similarity.

    Project mention: Ask HN: Is there any software you only made for your own use but nobody else? | news.ycombinator.com | 2024-07-04
  13. smaller-labse

    Applying "Load What You Need: Smaller Versions of Multilingual BERT" to LaBSE
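
Several entries above (txtai, vectordb, AnnA) rest on the same idea: encode texts as vectors, then rank them by similarity to an encoded query. The following is a minimal, stdlib-only sketch of brute-force cosine-similarity search. The toy vectors, the document names and the `cosine`/`search` helpers are hypothetical and are not any of these projects' APIs; real tools use a trained encoder and often an approximate nearest-neighbor index instead of a linear scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force semantic search: return the k most similar (doc_id, score) pairs."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional "embeddings"; real encoders typically output hundreds of dimensions.
index = {
    "a post about cats": [0.9, 0.1, 0.0],
    "a post about dogs": [0.8, 0.3, 0.1],
    "a stock market report": [0.0, 0.2, 0.9],
}
for doc_id, score in search([1.0, 0.0, 0.0], index):
    print(f"{score:.3f}  {doc_id}")
```

A linear scan is O(n) per query, which is fine for a toy index but is the reason larger vector stores switch to approximate indexes.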

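The FlagEmbedding passage above evaluates retrieval with reciprocal-rank metrics. For one query, the reciprocal rank is 1/rank of the first relevant passage (0 if none appears within the cutoff), and MRR averages it over the query set Q: MRR = (1/|Q|) * sum over queries of 1/rank_i. A minimal sketch follows, assuming rankings are plain lists of document ids; it is not FlagEmbedding's evaluation module, and the ids below are invented, not MS MARCO data.

```python
from typing import Optional


def reciprocal_rank(ranked_ids: list[str], relevant_ids: set[str], k: Optional[int] = None) -> float:
    """1 / (1-based rank of the first relevant id), or 0.0 if none appears in the top k."""
    candidates = ranked_ids if k is None else ranked_ids[:k]
    for rank, doc_id in enumerate(candidates, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]], k: Optional[int] = None) -> float:
    """Average reciprocal rank over one (ranking, relevant set) pair per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Invented rankings for three queries.
runs = [
    (["d3", "d1", "d7"], {"d1"}),  # first relevant hit at rank 2 -> 0.5
    (["d2", "d9", "d4"], {"d2"}),  # rank 1 -> 1.0
    (["d5", "d6", "d8"], {"d0"}),  # no relevant hit -> 0.0
]
print(mean_reciprocal_rank(runs))       # (0.5 + 1.0 + 0.0) / 3 = 0.5
print(mean_reciprocal_rank(runs, k=1))  # MRR@1: only rank-1 hits count -> 1/3
```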
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
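
BERTopic's description mentions c-TF-IDF. As described in BERTopic's documentation, it treats all documents of one topic as a single "class document" and weights each term roughly as W(t, c) = tf(t, c) * log(1 + A / f(t)), where tf is the normalized frequency of t in class c, f(t) is the frequency of t across all classes, and A is the average number of words per class. The following is a simplified, stdlib-only sketch of that formula, not BERTopic's own implementation; whitespace tokenization and the toy topics are assumptions.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_class: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Simplified class-based TF-IDF: W[t, c] = tf[t, c] * log(1 + A / f[t]).

    tf[t, c]: frequency of term t in class c, L1-normalized per class
    f[t]:     total frequency of t across all classes
    A:        average number of words per class
    """
    # Join every document of a class into one "class document" and count its words.
    counts = {
        label: Counter(word for doc in docs for word in doc.lower().split())
        for label, docs in docs_per_class.items()
    }
    total_freq = Counter()
    for cnt in counts.values():
        total_freq.update(cnt)
    avg_words = sum(total_freq.values()) / len(counts)

    weights = {}
    for label, cnt in counts.items():
        n_words = sum(cnt.values())
        weights[label] = {
            term: (n / n_words) * math.log(1 + avg_words / total_freq[term])
            for term, n in cnt.items()
        }
    return weights


topics = {
    "pets": ["the cat sat", "the dog barked at the cat"],
    "finance": ["the stock fell", "the market and the stock rallied"],
}
for label, w in c_tf_idf(topics).items():
    print(label, max(w, key=w.get))  # pets -> cat, finance -> stock
```

In the "pets" class, "the" occurs more often than "cat" but scores lower, because it is equally common in the other class; that down-weighting of shared terms is what makes the resulting topic words interpretable.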

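SimCSE, together with DiffCSE and PromCSE (which build on it) and simple-simcse (which reimplements it), trains sentence encoders contrastively. In unsupervised SimCSE, the same sentence is encoded twice with different dropout masks to form a positive pair, and the other sentences in the batch act as negatives, scored with an InfoNCE loss over cosine similarities divided by a temperature. The sketch below computes only that loss value in plain Python: the vectors stand in for encoder outputs, there is no encoder, dropout or backpropagation, and the helper names and the 0.05 default temperature are assumptions of this sketch.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(view_a: list[list[float]], view_b: list[list[float]], temperature: float = 0.05) -> float:
    """Mean InfoNCE loss: view_b[i] is the positive for view_a[i]; every other
    view_b[j] in the batch is an in-batch negative."""
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # -log softmax(logits)[i], computed with a numerically stable log-sum-exp.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Hypothetical "two dropout passes" over a 2-sentence batch, as 2-d vectors.
first_pass = [[1.0, 0.0], [0.0, 1.0]]
second_pass = [[0.9, 0.1], [0.1, 0.9]]
print(info_nce(first_pass, second_pass))        # near 0: each positive is the closest match
print(info_nce(first_pass, second_pass[::-1]))  # large: positives and negatives swapped
```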
Python sentence-embeddings related posts

  • Understanding RAG (Part 5): Recommendations and wrap-up

    1 project | dev.to | 9 Sep 2024
  • how can a top2vec output be improved

    1 project | /r/learnmachinelearning | 4 Jul 2023
  • You probably shouldn't use OpenAI's embeddings

    5 projects | news.ycombinator.com | 30 Mar 2023
  • SBERT Embeddings from Conversations

    2 projects | /r/LanguageTechnology | 3 Mar 2023
  • BERT-Based Clustering on a Corpus of Genre Samples Kinda Sucks. Why?

    1 project | /r/LanguageTechnology | 19 Feb 2023
  • Sentence transformers (BERTopic) on a Macbook Air

    1 project | /r/datascience | 13 Feb 2023
  • Comparing BERTopic to human raters

    1 project | /r/LanguageTechnology | 3 Feb 2023

Index

What are some of the best open-source sentence-embedding projects in Python? This list will help you:

# Project Stars
1 txtai 10,349
2 FlagEmbedding 8,466
3 BERTopic 6,414
4 SimCSE 3,496
5 nlu 893
6 inltk 826
7 vectordb 588
8 AnglE 513
9 DiffCSE 293
10 PromCSE 135
11 simple-simcse 76
12 AnnA_Anki_neuronal_Appendix 63
13 smaller-labse 18
