Python Dedupe

Open-source Python projects categorized as Dedupe

Top 5 Python Dedupe Projects

  • BorgBackup

    Deduplicating archiver with compression and authenticated encryption.

    Project mention: Beware that BorgBackup 2.0 will soon come, breaking compatibility with current repositories and scripts. | reddit.com/r/linux | 2022-12-04

    Seriously, I'm having trouble seeing what the (presumably) execution time, RAM, and/or storage requirements for such a transfer are. Going through the docs at https://github.com/borgbackup/borg/blob/master/docs/usage/transfer.rst, I see pretty much the same description there. I assume that, at the very least, this requires crypto time and the storage space needed to duplicate a backup, but is there more or less to it than I'm seeing? This doesn't seem to be an in-place conversion. Am I reading that correctly?

  • dedupe

    A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.

    Project mention: Model detects duplicate records | reddit.com/r/datascience | 2022-10-21

    Data deduplication is a super common problem, so it's useful experience to work on it. It's generally useful for companies, but I don't think it could be sold as a product unless it is solving a very complicated, domain-specific de-duping problem. Otherwise, there are generic, open-source de-duping tools such as dedupe. It sounds like your model is similar to that.

  • imgdupes

    Finding and deleting near-duplicate images based on perceptual hash.

    Project mention: Merge 2 Image Libraries | reddit.com/r/DataHoarder | 2022-01-17

  • dduper

    Fast block-level out-of-band BTRFS deduplication tool.

    Project mention: Can I view the internal hash values for files? | reddit.com/r/btrfs | 2022-05-25

    dduper uses a patched btrfs command to read file hashes from the raw disk. It requires root access and is kind of a hack.

  • Deduper

    The goal of this project is to make a deduper program that anybody can run on their computer to save storage space. (by ThatOneShortGuy)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-12-04.
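
For readers new to the topic, the simplest kind of deduplication the projects listed above deal with is finding files whose contents are byte-for-byte identical. The following is a minimal sketch of that idea using only the Python standard library. It is an illustration, not how any of the listed projects is implemented, and the names `file_digest` and `find_duplicate_files` are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(root):
    """Group regular files under `root` by content hash.

    Returns a list of groups; each group is a list of two or more
    paths whose contents hash to the same SHA-256 digest.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicate_files("some/directory")` returns the groups of identical files; what to do with them (delete, hard-link, report) is left to the caller. Near-duplicate images and fuzzy record matching need different techniques, such as perceptual hashing or similarity scoring, which this sketch does not cover.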

Index

What are some of the best open-source Dedupe projects in Python? This list will help you:

#  Project     Stars
1  BorgBackup  8,838
2  dedupe      3,560
3  imgdupes      246
4  dduper        131
5  Deduper         0