Top 20 similarity Open-Source Projects
-
fastdup
fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. It helps you improve the quality of your dataset's images and labels and reduce your data operations costs at an unparalleled scale.
-
python-string-similarity
A library implementing different string similarity and distance measures using Python.
-
CogCompNLP
CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.
-
Duplicate-Image-Finder
difPy - Python package for finding duplicate or similar images within folders
-
textdistance.rs
🦀📏 Rust library to compare strings (or any sequences). 25+ algorithms, pure Rust, common interface, Unicode support.
-
ml-classify-text-js
Machine learning based text classification in JavaScript using n-grams and cosine similarity
-
unisim
UniSim is a package for efficient similarity computation, fuzzy matching, and clustering of data.
-
pysimilar
A Python library for computing the similarity between two strings (text) based on cosine similarity
-
pseqsid
Calculates pairwise sequence identity, similarity and normalized similarity score of proteins in a multiple sequence alignment.
-
SemanticValidation
SemanticValidation is a library that integrates OpenAI’s powerful language models with validation systems. It allows you to perform semantic checks on your data and queries using natural language understanding.
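Several entries above (python-string-similarity, ml-classify-text-js, pysimilar) revolve around cosine similarity over text features. The sketch below shows the core idea in plain Python, assuming overlapping character trigrams as the features; the function names are hypothetical and this is not the API or feature set of any of those libraries.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lower-cased text."""
    text = text.lower()
    if len(text) < n:
        # Too short for a full n-gram: treat the whole string as one feature.
        return Counter([text]) if text else Counter()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: str, b: str, n: int = 3) -> float:
    """Cosine of the angle between the n-gram count vectors of a and b."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    # Counter returns 0 for missing keys, so absent n-grams add nothing.
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, "similarity" and "similarly" share five trigrams (sim, imi, mil, ila, lar), so `cosine_similarity("similarity", "similarly")` comes out at roughly 0.67, while "night" and "nacht" share none and score 0.0.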
Visualizing your dataset (especially a large one) in a low-dimensional embedding space can tell you a lot about the patterns and clusters in it.
We recently released a notebook showing how you can visualize your dataset with DINOv2 models running on your CPU.
Yes! No GPUs needed.
We used it to find clusters of similar images, duplicates, and outliers in a subset of the LAION dataset.
Try it on your own dataset:
Colab notebook - https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/dinov2_notebook.ipynb
GitHub repo - https://github.com/visual-layer/fastdup
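Once every image has an embedding (from DINOv2 or any other model), duplicates and outliers can be read off pairwise cosine similarities. The following is a minimal brute-force sketch under stated assumptions: the embeddings are already computed, the thresholds 0.95 and 0.5 are arbitrary illustrative values, and the helper names are hypothetical. It is not a description of how fastdup works internally.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def near_duplicates(embeddings, threshold=0.95):
    """Return (i, j, score) for every pair whose similarity is >= threshold."""
    pairs = []
    for i, j in combinations(range(len(embeddings)), 2):
        score = cosine(embeddings[i], embeddings[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


def outliers(embeddings, threshold=0.5):
    """Return indices whose best match among the other items is below threshold."""
    result = []
    for i, u in enumerate(embeddings):
        best = max(
            (cosine(u, v) for j, v in enumerate(embeddings) if j != i),
            default=0.0,
        )
        if best < threshold:
            result.append(i)
    return result
```

With toy vectors such as `[[1, 0, 0], [0.99, 0.01, 0], [0, 1, 0], [0, 0.9, 0.1], [0, 0, 1]]`, items 0/1 and 2/3 come out as near-duplicate pairs and item 4 as the outlier. The pairwise scan is O(n²), so real datasets need an approximate nearest-neighbor index instead.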
For Postgres, there is an extension that provides that.
Project mention: textdistance.rs: Rust library to compare strings (or any sequences). 25+ algorithms, pure Rust, common interface, Unicode support. Based on popular and battle-tested textdistance Python library. | /r/rust | 2023-05-19
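Levenshtein edit distance is one of the classic measures in libraries like textdistance.rs. As an illustration only (plain Python, not textdistance.rs's API), the two-row dynamic-programming version below keeps just two rows of the distance table, which is a common way to reduce memory when computing edit distances.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        # The distance is symmetric, so make b the shorter string to keep rows small.
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3 (two substitutions and one insertion).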
Project mention: Google UniSim for efficient similarity computation | news.ycombinator.com | 2023-11-30
Project mention: Dice-coefficient: Doubled speed by not using an array | news.ycombinator.com | 2023-07-14
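The Sørensen–Dice coefficient compares two strings as multisets of adjacent character pairs (bigrams): 2|X ∩ Y| / (|X| + |Y|). The sketch below counts bigrams in a dictionary (`collections.Counter`) rather than matching arrays pairwise; whether that mirrors the change described in the linked post is an assumption, and the function names are hypothetical. Returning 1.0 for two identical strings too short to have bigrams is a design choice of this sketch.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Multiset of adjacent character pairs in lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def dice_coefficient(a: str, b: str) -> float:
    """Sørensen–Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over bigram multisets."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # Neither string has a bigram: fall back to plain equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    overlap = sum((x & y).values())  # & keeps the minimum count per bigram
    return 2 * overlap / total
```

For example, "healed" and "sealed" share four of their five bigrams each, giving 2 × 4 / 10 = 0.8.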
Project mention: Postgres Extension to Calculate the Maximal Information Coefficient (Mic) | news.ycombinator.com | 2024-04-10
Week 4: 🪞Image Deduplication
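Image deduplication tools such as difPy and pHash (both listed on this page) compare compact fingerprints rather than raw pixels. As a simpler stand-in for pHash's DCT-based hash, the sketch below computes an average hash (aHash): each bit records whether a pixel is brighter than the image mean, and near-duplicates have a small Hamming distance. It assumes the image has already been converted to grayscale and resized to 8x8 (for example with an imaging library); the helper names are hypothetical.

```python
def average_hash(pixels):
    """64-bit average hash of an 8x8 grayscale image given as rows of 0-255 ints.

    Each bit is 1 when the pixel is brighter than the image mean.
    """
    flat = [p for row in pixels for p in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grayscale image")
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(h1: int, h2: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(h1 ^ h2).count("1")
```

A uniformly brightened copy of an image keeps the same hash (distance 0), while an inverted copy flips every bit (distance 64); a small threshold on the distance then flags likely duplicates.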
While you can do this now with the SemanticValidation library, I'm going to introduce an even simpler way in this post: using the skUnit library for semantic unit testing. Sounds exciting, right?
Similarity-related posts
- Postgres Extension to Calculate the Maximal Information Coefficient (Mic)
- Data Cleaning in SQL
- Show HN: Supabase Clippy – ChatGPT for Supabase Docs
- PHash – the open source perceptual hash library
- Official Elasticsearch Python library no longer works with open-source forks
- Aurora: an open source Automated malware similarity platform with modularity in mind.
- Speeding up edit distance calculation process
A note from our sponsor - SaaSHub
www.saashub.com | 25 Apr 2024
Index
What are some of the best open-source similarity projects? This list will help you:
# | Project | Stars
---|---|---
1 | fastdup | 1,403
2 | dssim | 1,036
3 | python-string-similarity | 946
4 | pHash | 508
5 | CogCompNLP | 469
6 | Duplicate-Image-Finder | 390
7 | pg_similarity | 352
8 | textdistance.rs | 254
9 | synt | 181
10 | ml-classify-text-js | 107
11 | aurora | 74
12 | unisim | 63
13 | simetric | 60
14 | gaoya | 48
15 | dice-coefficient | 48
16 | pysimilar | 19
17 | vasco | 20
18 | pseqsid | 11
19 | image-deduplication-plugin | 8
20 | SemanticValidation | 7