Python duplicate-detection

Open-source Python projects categorized as duplicate-detection

Top 3 Python duplicate-detection Projects

  • videohash

    Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.

    Project mention: videohash / video fingerprinting Question : Detecting if a small clip is part of a longer movie | /r/learnprogramming | 2023-06-27

    Hi all, I'm trying to write a program that detects whether a short video scene (normally under 10 seconds) appears within a longer video file. The idea is that if I find a scene on YouTube, I want to know which episode of a particular TV show it comes from (assuming I know the show but not the episode).

    Current solution:

    [a] Extract frames from the reference clip using ffmpeg (fps = 1).
    [b] Extract frames from the longer video file using ffmpeg (fps around 0.1 or 0.5).

    For each frame from [a], I compute an imagehash for it and for the frames from [b], compare the Hamming distances, keep the lowest distance from that round of comparison, and move on to the next frame from [a]. Eventually I get an average score that tells me whether this TV episode contains the scene I'm looking for. However, this is slow and inefficient.

    I found out that there is a videohash library, https://github.com/akamhy/videohash, but it says "Videohash cannot be used to verify whether one video is a part of another (video fingerprinting)." Does anybody know why? Is it because it computes a single hash for the whole video? If so, what if I use the videohash library to create a hash for my reference clip (say about 10 seconds), split the longer video into multiple 10-second segments, generate a videohash for each segment, and compare those with the hash of my reference clip? Would that work? (Yes, I understand that a 60-minute movie would need about 360 video hashes.) Do you think this is better? Thanks.

  • dude

    Duplicates Detector is a cross-platform GUI utility for finding duplicate files, allowing you to delete or link them to save space. Duplicate files are displayed and processed on two synchronized panels for efficient and convenient operation. (by PJDude)

    Project mention: fdupes: Identify or Delete Duplicate Files | news.ycombinator.com | 2023-11-02

    Hi. I recommend my little program. The bottleneck is the GUI in Tkinter, but maybe it will be useful to someone:

    https://github.com/PJDude/dude

  • DupCatch

    This tool is built to find duplicates in Anki cards that are not identified by the built-in Anki 'find duplicates' function.
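
The frame-by-frame comparison described in the videohash mention above can be sketched in plain Python. This is only an illustration under stated assumptions, not the poster's code or the videohash API: it assumes each frame has already been reduced to an integer perceptual hash (for example by an image-hashing library), and the names hamming_distance, clip_match_score and segment_starts are hypothetical helpers introduced here.

```python
from typing import List, Sequence


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two integer-encoded hashes."""
    return bin(a ^ b).count("1")


def clip_match_score(clip_hashes: Sequence[int],
                     episode_hashes: Sequence[int]) -> float:
    """Average, over the clip frames, of the lowest Hamming distance
    to any episode frame. Lower means a closer match."""
    if not clip_hashes or not episode_hashes:
        raise ValueError("both hash sequences must be non-empty")
    best = [
        min(hamming_distance(c, e) for e in episode_hashes)
        for c in clip_hashes
    ]
    return sum(best) / len(best)


def segment_starts(total_seconds: int, window_seconds: int) -> List[int]:
    """Start offsets (in seconds) of back-to-back, non-overlapping windows."""
    return list(range(0, total_seconds - window_seconds + 1, window_seconds))


if __name__ == "__main__":
    # Toy 4-bit "hashes" standing in for real perceptual hashes.
    clip = [0b1111, 0b0000]
    episode = [0b1110, 0b0001]
    print(clip_match_score(clip, episode))
    # A 60-minute video cut into 10-second windows.
    print(len(segment_starts(60 * 60, 10)))
```

The score costs one distance computation per (clip frame, episode frame) pair, which is consistent with the slowness the poster reports. With back-to-back windows, a scene that straddles a window boundary is split across two segments, so overlapping windows (a step smaller than the window length) may be worth trying, at the cost of more hashes.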

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-11-02.

Index

What are some of the best open-source duplicate-detection projects in Python? This list will help you:

#  Project    Stars
1  videohash  250
2  dude       51
3  DupCatch   0