Is there a kind of "smart" compression that recognizes when multiple files in an archive are similar or identical and stores that data only once for all of them?

This page summarizes the projects mentioned and recommended in the original post on /r/DataHoarder

  • czkawka

    Multifunctional app to find duplicates, empty folders, similar images, etc.

  • BorgBackup

    Deduplicating archiver with compression and authenticated encryption.

  • What you're looking for is deduplication. There are different kinds. The simplest is file-level deduplication, where identical files are stored only once. The next simplest is block-level deduplication, where each file is split into fixed-size chunks and each unique chunk is stored only once. The most effective, though, is rolling deduplication, where a rolling checksum of the data is used to deduplicate blocks that are identical but not at the same offset within a file. An example of a (backup) tool that can do this is Borg Backup. (A minimal sketch of the rolling approach appears after this list.)

  • kopia

    Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.

  • Yes (WSL/Cygwin), but if you want to go that route, also take a look at https://kopia.io/
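
A minimal Python sketch of the rolling-hash ("content-defined chunking") idea from the deduplication comment above. The window size, hash function, and chunk-size parameters are arbitrary demo values, not what Borg or kopia actually use; the point is that chunk boundaries depend only on the local window content, so identical data still deduplicates even when a prefix shifts it to a different offset.

```python
import hashlib
import random
from collections import defaultdict

# Illustrative parameters only; real tools (Borg, kopia) use tuned
# splitters such as buzhash with different window and target sizes.
WINDOW = 48            # bytes in the sliding window fed to the rolling hash
MIN_CHUNK = 2048       # never cut a chunk shorter than this
MASK = (1 << 13) - 1   # cut when (hash & MASK) == 0 -> ~8 KiB average chunks
PRIME, MOD = 31, 1 << 32

def chunk_boundaries(data):
    """Yield (start, end) offsets of content-defined chunks of `data`."""
    start, h = 0, 0
    pow_w = pow(PRIME, WINDOW - 1, MOD)
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # slide the window: drop the contribution of the byte leaving it
            h = (h - data[i - WINDOW] * pow_w) % MOD
        h = (h * PRIME + byte) % MOD
        if i + 1 - start >= MIN_CHUNK and (h & MASK) == 0:
            yield start, i + 1
            start = i + 1
    if start < len(data):
        yield start, len(data)

def deduplicate(files):
    """Store every unique chunk once; keep an ordered chunk list per file."""
    store = {}                   # chunk hash -> chunk bytes (stored once)
    recipes = defaultdict(list)  # file name  -> list of chunk hashes
    for name, data in files.items():
        for s, e in chunk_boundaries(data):
            chunk = data[s:e]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)
            recipes[name].append(digest)
    return store, recipes

if __name__ == "__main__":
    random.seed(0)
    common = bytes(random.getrandbits(8) for _ in range(200_000))
    files = {
        "a.bin": common,
        # same payload, but a prefix shifts every byte to a new offset
        "b.bin": b"this prefix shifts all offsets" + common,
    }
    store, recipes = deduplicate(files)
    total_in = sum(len(d) for d in files.values())
    total_out = sum(len(c) for c in store.values())
    print(f"input {total_in} B -> stored {total_out} B in {len(store)} unique chunks")
```

Running this, the two files share roughly 200 KB of identical data at different offsets, yet the chunk store ends up close to the size of a single copy, because the content-defined boundaries resynchronize after the prefix and almost every chunk hash repeats.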

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
