Dedupe

Open-source projects categorized as Dedupe

Top 13 Dedupe Open-Source Projects

  • restic

    Fast, secure, efficient backup program

    Project mention: I built my first NAS, and it was way easier than I expected. | reddit.com/r/DataHoarder | 2022-10-04

  • BorgBackup

    Deduplicating archiver with compression and authenticated encryption.

    Project mention: Recommend backup program? | reddit.com/r/archlinux | 2022-10-05

    borg

  • dedupe

    A Python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

    Project mention: Entity Resolution with Magniv | news.ycombinator.com | 2022-06-01

  • yarn-deduplicate

    Deduplication tool for yarn.lock files

    Project mention: Cannot find module 'error/typed' | dev.to | 2022-02-11

    Oftentimes the cruft accumulated in yarn.lock makes the whole project go boom. yarn-deduplicate can potentially help with this. I have seen this in a few instances, but(!) it did not fix the problem in my case.

  • jdupes

    A powerful duplicate file finder and an enhanced fork of 'fdupes'.

    Project mention: Tools to find duplicate files on multiple file servers | reddit.com/r/sysadmin | 2022-10-04

    Jdupes. There are precompiled 64-bit and 32-bit Win32 packages, and Linux is supported. It's a fork of fdupes, which was Linux/Unix only.

  • zingg

    Scalable identity resolution, entity resolution, data mastering and deduplication using ML

    Project mention: How to find open source data science python projects to contribute to? | reddit.com/r/datascience | 2022-08-15

    Check https://github.com/zinggAI/zingg/. We recently added Python to our stack and are looking for help with building dbt-zingg Python models, databricks-zingg Python notebooks, a Python API, a Python-based front end, etc.

  • duplicut

    Remove duplicates from MASSIVE wordlist, without sorting it (for dictionary-based password cracking)

    Project mention: Awesome Penetration Testing | dev.to | 2021-10-06

    duplicut - Quickly remove duplicates, without changing the order, and without getting OOM on huge wordlists.

  • bees

    Best-Effort Extent-Same, a btrfs dedupe agent

    Project mention: Is Bees an after-solution to BTRFS defragmentation breaking reflinks? | reddit.com/r/btrfs | 2022-08-23

  • imgdupes

    Finding and deleting near-duplicate images based on perceptual hash.

    Project mention: Merge 2 Image Libraries | reddit.com/r/DataHoarder | 2022-01-17

  • dupe-krill

    A fast file deduplicator

    Project mention: Fclones – an efficient duplicate file finder and remover | news.ycombinator.com | 2022-05-06

    Coincidentally, I was looking through the list of apps that do this the other day; I think I looked at all the other ones they list as competitors.

    My particular use case involved images that might be very near duplicates (screenshots of the same web page), which some of the tools cover, though it feels like a slightly separate task from finding exact bit-for-bit duplicates, so not all of them do it.

    One interesting one I found that wasn't listed in the Readme was:

    https://github.com/kornelski/dupe-krill

    Which had some notes about their use of BTreehashes to progressively compare files. I'm not sure how much difference it makes in practice, but it sounded elegant.

  • dduper

    Fast block-level out-of-band BTRFS deduplication tool.

    Project mention: Can I view the internal hash values for files? | reddit.com/r/btrfs | 2022-05-25

    dduper uses a patched btrfs command to read file hashes from the raw disk. It requires root access and is kind of a hack.

  • swuniq

    A command-line tool for deduplicating entries in a file or stream with constant memory usage

  • Deduper

    The goal of this project is to make a deduper program that anybody can run on their computer to save storage space. (by ThatOneShortGuy)
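
Several entries above (duplicut, swuniq) describe removing duplicate lines from a file or stream while keeping the original order. The general idea can be sketched in Python as follows. This is only an illustrative sketch, not how either tool is implemented: the function name `unique_lines` is hypothetical, memory here grows with the number of distinct lines rather than staying constant, and storing a short 8-byte digest per line trades a small chance of hash collisions (a distinct line wrongly dropped) for lower memory use.

```python
import hashlib
from typing import Iterable, Iterator


def unique_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in first-seen order (hypothetical helper)."""
    # Keep a fixed-size digest per distinct line instead of the line itself,
    # so very long lines do not inflate memory. Assumption: an 8-byte digest
    # is acceptable; collisions are unlikely but possible.
    seen = set()
    for line in lines:
        key = hashlib.blake2b(line.encode("utf-8"), digest_size=8).digest()
        if key not in seen:
            seen.add(key)
            yield line
```

Because the function is a generator, it can be fed a file object or `sys.stdin` directly and will emit lines as it goes, without sorting or loading the whole input first.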

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-10-05.

Index

What are some of the best open-source Dedupe projects? This list will help you:

#   Project            Stars
1   restic            18,111
2   BorgBackup         8,693
3   dedupe             3,520
4   yarn-deduplicate   1,240
5   jdupes             1,207
6   zingg                609
7   duplicut             603
8   bees                 358
9   imgdupes             237
10  dupe-krill           147
11  dduper               126
12  swuniq                 2
13  Deduper                0