Top 16 sentence-embedding Open-Source Projects
-
txtai
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
-
Try experimenting with different hyperparameters, clustering algorithms and embedding representations. Try https://github.com/MaartenGr/BERTopic/tree/master/bertopic
-
SimCSE
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
-
inltk
Natural Language Toolkit for Indic Languages aims to provide out-of-the-box support for various NLP tasks that an application developer might need.
-
nlu
One line of code for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
-
You can find some comparisons and evaluation datasets/tasks here: https://www.sbert.net/docs/pretrained_models.html
Generally MiniLM is a good baseline. For faster models you want this library:
https://github.com/oborchers/Fast_Sentence_Embeddings
For higher quality ones, just take the bigger/slower models in the SentenceTransformers library
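Whichever model is picked, comparing sentences usually comes down to cosine similarity between their embedding vectors. The following is a minimal sketch, assuming the embeddings have already been computed by some model (for example a MiniLM variant). The toy 3-dimensional vectors and the function names `cosine_similarity` and `rank_by_similarity` are hypothetical illustrations, not part of any library listed here.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as having no similarity to anything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], corpus: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (corpus index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for real sentence embeddings (assumed, not model output).
corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [1.0, 0.1, 0.0]
ranking = rank_by_similarity(query, corpus)
```

With real models the vectors would have hundreds of dimensions, but the ranking step stays the same; a faster or larger model only changes how the vectors are produced.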
-
awesome-semantic-search
A curated list of awesome resources related to Semantic Search🔎 and Semantic Similarity tasks.
-
DiffCSE
Code for the NAACL 2022 long paper "DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings"
-
PromCSE
Code for "Improved Universal Sentence Embeddings with Prompt-based Contrastive Learning and Energy-based Learning (EMNLP 2022)"
-
AnnA_Anki_neuronal_Appendix
Using machine learning on your Anki collection to enhance scheduling via semantic clustering and semantic similarity.
-
energetic-ai
EnergeticAI is TensorFlow.js, optimized for serverless environments, with fast cold-start, small module size, and pre-trained models.
Project mention: EnergeticAI - TensorFlow.js, optimized for serverless Node.js environments | /r/aipromptprogramming | 2023-06-14
sentence-embeddings related posts
- how can a top2vec output be improved
- You probably shouldn't use OpenAI's embeddings
- SBERT Embeddings from Conversations
- BERT-Based Clustering on a Corpus of Genre Samples Kinda Sucks. Why?
- Sentence transformers (BERTopic) on a Macbook Air
- Comparing BERTopic to human raters
- text clustering with XLNET, ROBERTA, ELMO and other pretrained models
Index
What are some of the best open-source sentence-embedding projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | txtai | 6,725 |
2 | BERTopic | 5,442 |
3 | SimCSE | 3,207 |
4 | inltk | 809 |
5 | nlu | 801 |
6 | Fast_Sentence_Embeddings | 603 |
7 | vectordb | 448 |
8 | AnglE | 323 |
9 | awesome-semantic-search | 318 |
10 | DiffCSE | 279 |
11 | PromCSE | 127 |
12 | simple-simcse | 57 |
13 | AnnA_Anki_neuronal_Appendix | 55 |
14 | energetic-ai | 31 |
15 | smaller-labse | 17 |
16 | theGoodWord | 2 |