Is there a kind of "smart" compression that recognizes when multiple files in an archive are similar or identical and stores that data only once for all of them?

This page summarizes the projects mentioned and recommended in the original post on /r/DataHoarder

  • czkawka

    Multifunctional app to find duplicates, empty folders, similar images, etc.

  • BorgBackup

    Deduplicating archiver with compression and authenticated encryption.

  • What you're looking for is deduplication. There are different kinds. The simplest is file-level deduplication, where identical files are stored only once. The next simplest is block-level deduplication, where each file is split into fixed-size chunks and each unique chunk is stored only once. The most effective, though, is rolling deduplication, where a rolling checksum of the data is used to deduplicate blocks that are identical but not at the same offset within a file. An example of a (backup) tool that can do this is Borg Backup. (A minimal sketch of the rolling approach appears after this list.)

  • kopia

    Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.

  • Yes (WSL/Cygwin), but if you want to go that route, also take a look at https://kopia.io/
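
A minimal Python sketch of the rolling-hash ("content-defined chunking") idea from the deduplication comment above. The window size, hash function, and chunk-size parameters are arbitrary demo values, not what Borg or kopia actually use; the point is that chunk boundaries depend only on the local window content, so identical data still deduplicates even when a prefix shifts it to a different offset.

```python
import hashlib
import random
from collections import defaultdict

# Illustrative parameters only; real tools (Borg, kopia) use tuned
# splitters such as buzhash with different window and target sizes.
WINDOW = 48            # bytes in the sliding window fed to the rolling hash
MIN_CHUNK = 2048       # never cut a chunk shorter than this
MASK = (1 << 13) - 1   # cut when (hash & MASK) == 0 -> ~8 KiB average chunks
PRIME, MOD = 31, 1 << 32

def chunk_boundaries(data):
    """Yield (start, end) offsets of content-defined chunks of `data`."""
    start, h = 0, 0
    pow_w = pow(PRIME, WINDOW - 1, MOD)
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # slide the window: drop the contribution of the byte leaving it
            h = (h - data[i - WINDOW] * pow_w) % MOD
        h = (h * PRIME + byte) % MOD
        if i + 1 - start >= MIN_CHUNK and (h & MASK) == 0:
            yield start, i + 1
            start = i + 1
    if start < len(data):
        yield start, len(data)

def deduplicate(files):
    """Store every unique chunk once; keep an ordered chunk list per file."""
    store = {}                   # chunk hash -> chunk bytes (stored once)
    recipes = defaultdict(list)  # file name  -> list of chunk hashes
    for name, data in files.items():
        for s, e in chunk_boundaries(data):
            chunk = data[s:e]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)
            recipes[name].append(digest)
    return store, recipes

if __name__ == "__main__":
    random.seed(0)
    common = bytes(random.getrandbits(8) for _ in range(200_000))
    files = {
        "a.bin": common,
        # same payload, but a prefix shifts every byte to a new offset
        "b.bin": b"this prefix shifts all offsets" + common,
    }
    store, recipes = deduplicate(files)
    total_in = sum(len(d) for d in files.values())
    total_out = sum(len(c) for c in store.values())
    print(f"input {total_in} B -> stored {total_out} B in {len(store)} unique chunks")
```

Running this, the two files share roughly 200 KB of identical data at different offsets, yet the chunk store ends up close to the size of a single copy, because the content-defined boundaries resynchronize after the prefix and almost every chunk hash repeats.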

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
