Top 15 Python Deduplication Projects
- splink: Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
- LSH: Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
- benji: Benji Backup: A block based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices
- npbackup: A secure and efficient file backup solution that fits both system administrators (CLI) and end users (GUI)
- unisim: UniSim is a package for efficient similarity computation, fuzzy matching, and clustering of data.
- dude: Duplicates Detector is a cross-platform GUI utility for finding duplicate files, allowing you to delete or link them to save space. Duplicate files are displayed and processed on two synchronized panels for efficient and convenient operation. (by PJDude)
- Neural-Scam-Artist: Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
- Deduper: The goal of this project is to make a deduper program that anybody can run on their computer to save storage space. (by ThatOneShortGuy)
Project mention: Ask HN: Open-source Windows 11 backup solutions | news.ycombinator.com | 2024-04-04
i use - and recommend - "borgbackup": for example with the "vorta" graphical frontend
* https://www.borgbackup.org/
* https://vorta.borgbase.com/install/windows/
just my 0.02€
Dupeguru
- for important files, a separate box where I have borgmatic [1] in deduplication mode installed; this is updated once in a while
Just curious: Do you have any reason to believe that such a data corruption bug is likely in ZFS? It seems like saying that ext4 could have a bug and you should also store stuff on NTFS, just in case (which I think does not make sense..).
[1]: https://github.com/borgmatic-collective/borgmatic
Project mention: Splink: Fast, accurate, scalable probabilistic data linkage | news.ycombinator.com | 2024-03-13
Project mention: Google UniSim for efficient similarity computation | news.ycombinator.com | 2023-11-30
Hi. I recommend my little program. The bottleneck is the GUI in Tkinter, but maybe it will be useful to someone:
https://github.com/PJDude/dude
Week 4: 🪞Image Deduplication
Python Deduplication related posts
- Splink: Fast, accurate, scalable probabilistic data linkage
- I Backup
- Duplicity
- How to use onedrive for culling photos
- Does anyone know any freeware duplicate file checkers without an upsell similar to awesome duplicate photo finder?
- Kopia: Open-Source, Fast and Secure Open-Source Backup Software
- DupeGuru: Open-source, cross-platform GUI tool to find duplicate files
Index
What are some of the best open-source Deduplication projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | BorgBackup | 10,526 |
2 | dupeguru | 4,786 |
3 | borgmatic | 1,639 |
4 | splink | 1,086 |
5 | LSH | 272 |
6 | dduper | 162 |
7 | benji | 136 |
8 | npbackup | 121 |
9 | unisim | 64 |
10 | dude | 53 |
11 | Neural-Scam-Artist | 22 |
12 | dedup | 11 |
13 | image-deduplication-plugin | 8 |
14 | chunkdup | 1 |
15 | Deduper | 0 |