Top 12 duplicate-detection Open-Source Projects
-
duplicut
Remove duplicates from MASSIVE wordlist, without sorting it (for dictionary-based password cracking)
-
videohash
Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.
-
cbird
Command-line program for managing a media collection, with focus on Content-Based Image Retrieval (Computer Vision) methods for finding duplicates.
-
dude
Duplicates Detector is a cross-platform GUI utility for finding duplicate files, allowing you to delete or link them to save space. Duplicate files are displayed and processed on two synchronized panels for efficient and convenient operation. (by PJDude)
-
photodedupe
A utility for locating near duplicate photos irrespective of image resolution, compression settings or file format.
-
DupCatch
This tool is built to find duplicates in Anki cards that are not identified by Anki's built-in 'find duplicates' function.
Project mention: videohash / video fingerprinting Question : Detecting if a small clip is part of a longer movie | /r/learnprogramming | 2023-06-27
Hi all, I am trying to create a program to detect whether a certain video scene (normally no longer than 10 seconds) is within a longer video file. The idea is that if I find a scene on YouTube, I want to know which episode of a particular TV show it comes from (assuming I know the show, but have no idea which episode).
Current solution:
[a] - Extract frames using ffmpeg from the reference clip (fps = 1)
[b] - Extract frames using ffmpeg from the longer video file (fps around 0.1 or 0.5)
For each frame from [a], I compute an imagehash for [a] and [b], compare the Hamming distances, take the lowest distance from this round of comparison, and move on to the next frame from [a]. Eventually I get an average score and can find out whether this TV episode contains the scene I was looking for. However, this is slow and inefficient.
I found out that there is a videohash library, https://github.com/akamhy/videohash, but it says "Videohash cannot be used to verify whether one video is a part of another (video fingerprinting)." Does anybody know why? Is it because it computes a videohash for the whole video? If so, how about I use the videohash library to create a hash for my reference clip (say it is about 10 seconds long), then split the longer video into multiple 10-second segments, generate a videohash for each segment, and compare those with my reference clip? Would that work? (Yes, I understand that for a 60-minute movie, that would be about 360 video hashes to calculate.) Do you think this is better? Thanks.
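The frame-matching procedure described in the post above can be sketched roughly as follows. This is a minimal sketch, assuming the perceptual hashes of the extracted frames have already been computed (for example with an image hashing library) and are available as plain integers. The function names `hamming`, `clip_score`, and `best_episode` are hypothetical and are not part of videohash or any other library mentioned here.

```python
from statistics import mean


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def clip_score(clip_hashes, episode_hashes):
    """Average, over the clip frames, of the smallest Hamming distance
    to any frame of the episode. Lower means a better match."""
    return mean(
        min(hamming(c, e) for e in episode_hashes)
        for c in clip_hashes
    )


def best_episode(clip_hashes, episodes):
    """Return (name, score) for the episode with the lowest average distance.

    `episodes` is assumed to map an episode name to its list of frame hashes.
    """
    scored = ((name, clip_score(clip_hashes, hashes))
              for name, hashes in episodes.items())
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy 4-bit "hashes" standing in for real 64-bit perceptual hashes.
    clip = [0b1111, 0b0000]
    episodes = {"ep1": [0b1110, 0b0001, 0b1010], "ep2": [0b0101]}
    print(best_episode(clip, episodes))
```

Each clip frame is compared against every episode frame, so the cost grows with the product of the two frame counts, which is consistent with the poster's observation that the approach is slow.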
Project mention: Show HN: Pyzam, Shazam for DJs and Mixtapes in Python | news.ycombinator.com | 2024-04-24
Hello, really glad to see projects like this popping up. I have a few questions, as I was working on something similar a few years ago:
1. I did some development myself for a "Track Discovery for DJs"[1] project in this space of "DJ music recognition", and I am wondering how you are able to handle mixtapes and DJ mixes when a significant amount of sound manipulation or distortion is applied, such as pitch and tempo changes plus various effects. In my tests this totally confused the algorithms, which were not designed to handle such cases.
2. Can you share which algorithm you have implemented for this project? I read most of the research papers in this space, and my preferred solution was to build upon https://github.com/JorenSix/Panako, which I did.
In "minimal microhouse techno" type genres, where there are often similar rhythm patterns or even tracks built from the same sample packs, it proved difficult to get reliable results.
I was investigating how Spotify and other market leaders do track recognition, and they train ML models on the same track with 100+ different effects applied...
Curious to hear your thoughts...
[1] - https://rominimal.club
Project mention: need help in implementing siamese network with triplet loss for predicting duplicate tickets. | /r/learnmachinelearning | 2023-05-26
source datasets - https://github.com/logpai/bugrepo
Try cbird (my program). In addition to playing with thresholds, there are a couple of options that might help. Also if you can share the screenshots I'd be interested in testing.
Hi. I recommend my little program. The bottleneck is the Tkinter GUI, but maybe it will be useful to someone:
https://github.com/PJDude/dude
That focuses on exact duplicates through hashing. I found https://github.com/InexplicableMagic/photodedupe to be helpful for finding near duplicates in images through LSH.
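As an illustration of what "exact duplicates through hashing" generally means, here is a minimal sketch that groups byte-identical files by their SHA-256 digest. It is not the implementation used by dude or any other project listed here; the function names `file_digest` and `find_exact_duplicates` are hypothetical, and pre-grouping by file size is an assumed optimization to avoid hashing files that cannot match.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root):
    """Return lists of paths under `root` whose contents are byte-identical."""
    # Files of different sizes cannot be identical, so group by size first.
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Hash only the files that share a size with at least one other file.
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for p in paths:
                by_digest[file_digest(p)].append(p)

    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

This only catches exact copies. Re-encoded or resized images produce different digests, which is why near-duplicate tools rely on perceptual hashing or LSH instead.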
duplicate-detection related posts
-
Show HN: Pyzam, Shazam for DJs and Mixtapes in Python
-
Similar photo finder for screenshots
-
videohash / video fingerprinting Question : Detecting if a small clip is part of a longer movie
-
Video File Deduplication and Indexing/Sorting Software?
-
cbird Visual Deduplicator v0.6 Update
Index
What are some of the best open-source duplicate-detection projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | duplicut | 777 |
2 | depp | 266 |
3 | videohash | 257 |
4 | deduplicator | 254 |
5 | Panako | 174 |
6 | bughub | 109 |
7 | removedupes | 78 |
8 | cbird | 72 |
9 | dude | 54 |
10 | photodedupe | 16 |
11 | samanlainen | 6 |
12 | DupCatch | 0 |