Top 3 Python duplicate-detection Projects
-
videohash
Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.
Project mention: videohash / video fingerprinting question: Detecting if a small clip is part of a longer movie | /r/learnprogramming | 2023-06-27
Hi all, I am trying to create a program to detect whether a certain video scene (normally 10 seconds or less) appears in a longer video file. The idea is that if I find a scene on YouTube, I want to know which episode of a particular TV show it comes from (assuming I know the show but not the episode).
Current solution:
[a] - Extract frames from the reference clip using ffmpeg (fps = 1)
[b] - Extract frames from the longer video file using ffmpeg (fps around 0.1 or 0.5)
For each frame from [a], I compute an imagehash for it and for the frames from [b], compare the Hamming distances, keep the lowest distance from this round of comparison, and move on to the next frame from [a]. Eventually I get an average score, which tells me whether this TV episode contains the scene I was looking for.
However, this is slow and inefficient. I found that there is a videohash library, https://github.com/akamhy/videohash, but it says "Videohash cannot be used to verify whether one video is a part of another (video fingerprinting)." Does anybody know why? Is it because it computes a single videohash for the whole video? If so, what if I use the videohash library to create a hash for my reference clip (say, about 10 seconds), split the longer video into multiple 10-second segments, generate a videohash for each one, and compare those with the hash of my reference clip? Would that work? (Yes, I understand that a 60-minute movie would require about 360 video hashes to be calculated.) Do you think this is better? Thanks.
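The frame-by-frame scoring described in the post can be sketched in plain Python. This is only a rough illustration, not the poster's code and not the videohash API: it assumes each extracted frame has already been reduced to an integer hash (for example a 64-bit perceptual image hash), and the names `hamming`, `clip_score`, and `best_episode` are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two integer hashes."""
    return bin(a ^ b).count("1")


def clip_score(clip: Sequence[int], episode: Sequence[int]) -> float:
    """Average, over the clip frames, of the distance to the closest episode frame.

    Lower is better; 0.0 means every clip frame has an identical hash
    somewhere in the episode.
    """
    if not clip or not episode:
        raise ValueError("both hash sequences must be non-empty")
    total = 0
    for c in clip:
        # Keep only the lowest distance for this clip frame, as in the post.
        total += min(hamming(c, e) for e in episode)
    return total / len(clip)


def best_episode(
    clip: Sequence[int], episodes: Dict[str, Sequence[int]]
) -> Tuple[str, float]:
    """Return the (name, score) of the episode with the lowest clip score."""
    scores = {name: clip_score(clip, hashes) for name, hashes in episodes.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


# Toy data: made-up hash values standing in for real frame hashes.
clip_hashes = [0x000F, 0xFF01]
episode_hashes = {
    "ep_a": [0x0000, 0x000F, 0xFF00],
    "ep_b": [0x0000, 0xFFFF],
}
match_name, match_score = best_episode(clip_hashes, episode_hashes)
```

The cost of this scoring grows with (clip frames) × (episode frames) for every episode, which matches the slowness the post describes. The segment idea from the post trades that for one hash per segment: a 60-minute video cut into non-overlapping 10-second segments gives 3600 / 10 = 360 hashes.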
-
dude
Duplicates Detector is a cross-platform GUI utility for finding duplicate files, allowing you to delete or link them to save space. Duplicate files are displayed and processed on two synchronized panels for efficient and convenient operation. (by PJDude)
Hi. I recommend my little program. The bottleneck is the Tkinter GUI, but maybe it will be useful to someone:
-
DupCatch
This tool is built to find duplicates in Anki cards that are not identified by Anki's built-in 'find duplicates' function.
Index
What are some of the best open-source duplicate-detection projects in Python? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | videohash | 250 |
2 | dude | 51 |
3 | DupCatch | 0 |