Deduplication

Open-source projects categorized as Deduplication

Top 23 Deduplication Open-Source Projects

  • restic

    Fast, secure, efficient backup program

    Project mention: Ask HN: What is your approach for managing personal digital assets? | news.ycombinator.com | 2024-03-24

    I religiously use Google contacts. It's the simplest way to keep people contacts up to date on Android.

    I archive all important documents in specific folders by subject and date. This is backed up to Backblaze with restic. https://restic.net/

    I use https://ente.io for pictures. I convinced my wife to use it, and she agreed to auto share her photos so I don't nag her for copies. It had simple import from Facebook and Google.

    I also keep extensive journals, which really helps to tie it all together. I can basically grep for hangouts, conversations, etc.

    I also separate work journal from personal, and have essentially a journal for each project. https://jodavaho.io/tags/bullet-journal.html for how.

    I religiously use Google calendar for all plans, you can easily search it for past events to get dates.

    I also use monicahq for some notes about things I should remember about people but the habit never stuck.

  • BorgBackup

    Deduplicating archiver with compression and authenticated encryption.

    Project mention: I Backup | news.ycombinator.com | 2024-02-27

  • alertmanager

    Prometheus Alertmanager

  • kopia

    Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and data deduplication. CLI and GUI included.

    Project mention: I Backup | news.ycombinator.com | 2024-02-27

    I've been happy with: https://kopia.io/

    Fairly easy to configure, does snapshots to S3 and has an icon in my tray I can watch :)

  • dupeguru

    Find duplicate files

    Project mention: How to use onedrive for culling photos | /r/onedrive | 2023-12-11

    Dupeguru

  • libpostal

    A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.

    Project mention: Install Python Libraries Using Command Prompt | /r/Python | 2023-04-01

    @echo off

    REM Check if MSYS2 and MinGW are installed
    where msys2 2>nul >nul
    if %errorlevel% equ 0 (
        echo MSYS2 is already installed. Use --force to reinstall.
    ) else (
        REM Install MSYS2 and MinGW
        choco install msys2
        refreshenv
    )

    REM Check if MSYS2 packages are updated
    pacman -Qu 2>nul >nul
    if %errorlevel% equ 0 (
        echo MSYS2 packages are already updated. Use --force to reinstall.
    ) else (
        REM Update MSYS2 packages
        pacman -Syu
    )

    REM Check if build dependencies are installed
    pacman -Q autoconf automake curl git make libtool gcc mingw-w64-x86_64-gcc 2>nul >nul
    if %errorlevel% equ 0 (
        echo Build dependencies are already installed. Use --force to reinstall.
    ) else (
        REM Install build dependencies
        pacman -S autoconf automake curl git make libtool gcc mingw-w64-x86_64-gcc
    )

    REM Check if libpostal is cloned
    if exist libpostal (
        echo libpostal repository is already cloned. Use --force to reinstall.
    ) else (
        REM Clone libpostal repository
        git clone https://github.com/openvenues/libpostal
    )

    cd libpostal

    REM Check if libpostal is built and installed
    if exist C:/Program Files/libpostal/bin/libpostal.dll (
        echo libpostal is already built and installed. Use --force to reinstall.
    ) else (
        REM Build and install libpostal
        cp -rf windows/* ./
        ./bootstrap.sh
        ./configure --datadir=C:/libpostal
        make -j4
        make install
    )

    REM Check if libpostal is added to PATH environment variable
    setx /m PATH "%PATH%;C:\Program Files\libpostal\bin" 2>nul >nul
    if %errorlevel% equ 0 (
        echo libpostal is already added to PATH environment variable. Use --force to reinstall.
    ) else (
        REM Add libpostal to PATH environment variable
        setx PATH "%PATH%;C:\Program Files\libpostal\bin"
    )

    REM Test libpostal installation
    libpostal "100 S Broad St, Philadelphia, PA"
    pause

  • rmlint

    Extremely fast tool to remove duplicates and other lint from your filesystem

    Project mention: fdupes: Identify or Delete Duplicate Files | news.ycombinator.com | 2023-11-02

    My preferred solution is rmlint [https://github.com/sahib/rmlint] mostly because it also looks at duplicate directories. It produces a bash script instead of deleting anything itself, so you can examine it before running the script it made.
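
The "report first, delete later" workflow described in this mention can be illustrated with a minimal Python sketch. It is not rmlint's algorithm: it assumes a simple approach of grouping files by size, confirming duplicates with SHA-256, and emitting a shell script for review instead of deleting anything. The function names `file_digest`, `find_duplicates`, and `removal_script` are hypothetical.

```python
import hashlib
import shlex
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def find_duplicates(root):
    """Group regular files under root by size, then by content hash."""
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file() and not p.is_symlink():
            by_size[p.stat().st_size].append(p)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def removal_script(groups):
    """Build a shell script that keeps the first file of each group.

    Nothing is deleted here; the caller can inspect the text before running it.
    """
    lines = ["#!/bin/sh"]
    for group in groups:
        lines.append(f"# keep {shlex.quote(str(group[0]))}")
        lines.extend(f"rm -- {shlex.quote(str(p))}" for p in group[1:])
    return "\n".join(lines)
```

Calling `print(removal_script(find_duplicates("some/dir")))` prints the candidate script without touching any files. Unlike rmlint, this sketch does not detect duplicate directories.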

  • borgmatic

    Simple, configuration-driven backup software for servers and workstations

    Project mention: Rclone syncs your files to cloud storage | news.ycombinator.com | 2024-01-26

    - for important files, a separate box where I have borgmatic [1] in deduplication mode installed; this is updated once in a while

    Just curious: Do you have any reason to believe that such a data corruption bug is likely in ZFS? It seems like saying that ext4 could have a bug and you should also store stuff on NTFS, just in case (which I think does not make sense..).

    [1]: https://github.com/borgmatic-collective/borgmatic

  • rustic

    rustic - fast, encrypted, and deduplicated backups powered by Rust

    Project mention: Duplicity | news.ycombinator.com | 2024-01-24

    I'm a huge fan of restic as well. My only complaint is performance and memory usage. I'm looking forward to being able to use Rustic: https://rustic.cli.rs/

  • dwarfs

    A fast high compression read-only file system for Linux, Windows and macOS

    Project mention: Help! Does anyone know how to install johncena141 games on linux? | /r/LinuxCrackSupport | 2023-07-01

    on a fresh install all you need is dwarfs https://github.com/mhx/dwarfs and libopenal1

  • autorestic

    Config driven, easy backup cli for restic.

    Project mention: Duplicity | news.ycombinator.com | 2024-01-24

    I really like restic, and am personally happy to use it via the command line. It's very fast and efficient! However, I do wish there was better tooling / wrappers around it. For example, Pika Backup is a popular UI for Borg of which no equivalent exists for Restic. I'd love to be able to set something simple up on my partner's Macbook.

    For my own purposes, I've been using a script I found on Github[0] for a while, but it only really supports Backblaze B2 AFAIK.[1]

    I've been meaning to try autorestic[2] and resticprofile[3] as they are potentially more flexible than the script I'm currently using, and prestic[4] looks intriguing for my partner's use, but seems to have very few users. And the fact that there are so many competing tools makes it difficult to land on one.

    [0] https://github.com/erikw/restic-automatic-backup-scheduler

    [1] https://github.com/erikw/restic-automatic-backup-scheduler/i...

    [2] https://github.com/cupcakearmy/autorestic

    [3] https://github.com/creativeprojects/resticprofile

    [4] https://github.com/ducalex/prestic

  • zingg

    Scalable identity resolution, entity resolution, data mastering and deduplication using ML

  • rdedup

    Data deduplication engine, supporting optional compression and public key encryption.

    Project mention: Announcing rustic - fast, encrypted, deduplicated backups powered by Rust | /r/rust | 2023-04-24

    I'm not really doing much about it anymore, but I have somewhat similar project: https://github.com/dpc/rdedup

  • LSH

    Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
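
As a rough illustration of the MinHash idea named in this description, here is a small pure-Python sketch. It is not the LSH project's code or API: the word-shingle size, the use of salted BLAKE2b as the hash family, and the function names are all assumptions made for the example, and the banding step of locality sensitive hashing is left out.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items, num_perm=128):
    """For each of num_perm salted hash functions, keep the minimum hash over items."""
    if not items:
        raise ValueError("cannot compute a signature for an empty set")
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "big")  # a different salt gives a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Two documents whose signatures agree in a large fraction of positions are likely near duplicates; the threshold to use depends on the data.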

  • deduplicator

    Filter, Sort & Delete Duplicate Files Recursively

  • cargo-limit

    Productivity improvements for Rust ecosystem: warnings are skipped until errors are fixed, LSP-independent Neovim integration, etc.

  • kvdo

    A kernel module which provides a pool of deduplicated and/or compressed block storage.

  • zpaqfranz

    Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix

    Project mention: How to ensure file integrity? | /r/DataHoarder | 2023-04-24

    Now, on to file backups: if you value your data, don't make just one backup copy; make two or three. Also, I'd recommend using software that takes snapshots, so you can restore whichever version you need. I have been using zpaqfranz for a few years now. It is command-line software, but you can write a batch file and update the archive when needed. It adds only new and changed files, so only the first backup takes long.
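
The reason a repeated backup is fast can be sketched with a toy content-addressed chunk store. This is not how zpaqfranz is implemented: it assumes fixed-size chunks keyed by SHA-256, and the class name `ChunkStore` and its methods are hypothetical. Real deduplicating archivers use more elaborate chunking and storage formats.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept once, keyed by SHA-256."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # hex digest -> chunk bytes

    def put(self, data):
        """Store data and return (recipe, new_chunks).

        recipe is the ordered list of chunk digests needed to rebuild data;
        new_chunks counts the chunks that were not already in the store.
        """
        recipe, new_chunks = [], 0
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk
                new_chunks += 1
            recipe.append(digest)
        return recipe, new_chunks

    def get(self, recipe):
        """Rebuild the original bytes from a recipe."""
        return b"".join(self.chunks[d] for d in recipe)
```

Storing a second version of a file that differs in one chunk adds only that chunk, while both versions remain restorable from their recipes.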

  • vdo

    Userspace tools for managing VDO volumes.

  • dduper

    Fast block-level out-of-band BTRFS deduplication tool.

  • entity-embed

    PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.

  • benji

    Benji Backup: A block based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-03-24.

Index

What are some of the best open-source Deduplication projects? This list will help you:

#    Project        Stars
1    restic         23,429
2    BorgBackup     10,422
3    alertmanager    6,233
4    kopia           6,079
5    dupeguru        4,692
6    libpostal       3,935
7    rmlint          1,757
8    borgmatic       1,619
9    rustic          1,442
10   dwarfs          1,244
11   splink          1,060
12   autorestic      1,055
13   zingg             868
14   rdedup            818
15   LSH               271
16   deduplicator      253
17   cargo-limit       237
18   kvdo              236
19   zpaqfranz         213
20   vdo               186
21   dduper            162
22   entity-embed      138
23   benji             136