similarity

Top 20 similarity Open-Source Projects

  • fastdup

    fastdup is a powerful free tool designed to rapidly extract valuable insights from your image and video datasets. It helps you improve the quality of your dataset's images and labels and reduce your data operations costs at scale.

  • Project mention: Visualize your dataset using DINOv2 embedding | news.ycombinator.com | 2023-05-02

    Visualizing a dataset (especially a large one) in a low-dimensional embedding space can tell you a lot about the patterns and clusters it contains.

    We recently released a notebook showing how you can visualize your dataset using DINOv2 models by running it on your CPU.

    Yes! No GPUs needed.

    We used it to find clusters of similar images, duplicates, and outliers in a subset of the LAION dataset.

    Try it on your own dataset:

    Colab notebook - https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/dinov2_notebook.ipynb

    GitHub repo - https://github.com/visual-layer/fastdup

  • dssim

    Image similarity comparison simulating human perception (multiscale SSIM in Rust)

  • python-string-similarity

    A library implementing different string similarity and distance measures using Python.

  • pHash

    pHash - the open source perceptual hash library

  • CogCompNLP

    CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer, transliteration, verb-sense, and more.

  • Duplicate-Image-Finder

    difPy - Python package for finding duplicate or similar images within folders

  • pg_similarity

    Set of functions and operators for executing similarity queries

  • Project mention: Data Cleaning in SQL | /r/SQL | 2023-06-15

    For Postgres, there is an extension that provides that.

  • textdistance.rs

    🦀📏 Rust library to compare strings (or any sequences). 25+ algorithms, pure Rust, common interface, Unicode support.

  • Project mention: textdistance.rs: Rust library to compare strings (or any sequences). 25+ algorithms, pure Rust, common interface, Unicode support. Based on popular and battle-tested textdistance Python library. | /r/rust | 2023-05-19
  • synt

    Find similar functions and classes in your JavaScript/TypeScript code

  • ml-classify-text-js

    Machine learning based text classification in JavaScript using n-grams and cosine similarity

  • aurora

    Malware similarity platform with modularity in mind. (by W3ndige)

  • unisim

    UniSim is a package for efficient similarity computation, fuzzy matching, and clustering of data.

  • Project mention: Google UniSim for efficient similarity computation | news.ycombinator.com | 2023-11-30
  • simetric

    String similarity metrics for Elixir

  • gaoya

    Locality Sensitive Hashing

  • dice-coefficient

    Sørensen–Dice coefficient

  • Project mention: Dice-coefficient: Doubled speed by not using an array | news.ycombinator.com | 2023-07-14
  • pysimilar

    A Python library for computing the similarity between two strings (text) based on cosine similarity

  • vasco

    vasco: MIC & MINE statistics for Postgres

  • Project mention: Postgres Extension to Calculate the Maximal Information Coefficient (Mic) | news.ycombinator.com | 2024-04-10
  • pseqsid

    Calculates pairwise sequence identity, similarity and normalized similarity score of proteins in a multiple sequence alignment.

  • image-deduplication-plugin

    Remove exact and approximate duplicates from your dataset in FiftyOne!

  • Project mention: Plugin for Building and Managing Plugins! | dev.to | 2024-02-09

    Week 4: 🪞Image Deduplication

  • SemanticValidation

    SemanticValidation is a library that integrates OpenAI’s powerful language models with validation systems. It allows you to perform semantic checks on your data and queries using natural language understanding.

  • Project mention: Semantic Tests for SemanticKernel Plugins using skUnit | dev.to | 2024-01-04

    While you can do this now with the SemanticValidation library, I'm going to introduce an even simpler way in this post: using the skUnit library for semantic unit testing. Sounds exciting, right?

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source similarity projects? This list will help you:

Rank Project Stars
1 fastdup 1,403
2 dssim 1,036
3 python-string-similarity 946
4 pHash 508
5 CogCompNLP 469
6 Duplicate-Image-Finder 390
7 pg_similarity 352
8 textdistance.rs 254
9 synt 181
10 ml-classify-text-js 107
11 aurora 74
12 unisim 63
13 simetric 60
14 gaoya 48
15 dice-coefficient 48
16 pysimilar 19
17 vasco 20
18 pseqsid 11
19 image-deduplication-plugin 8
20 SemanticValidation 7
