Hyperspace

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  1. fclones

    Efficient Duplicate File Finder

    I've been using `fclones` [1] for this, with its `dedupe` command, which uses reflink/clonefile.

    [1] https://github.com/pkolaczk/fclones

  3. rmlint

    Extremely fast tool to remove duplicates and other lint from your filesystem

    See the comments on https://news.ycombinator.com/item?id=38113396 for a list of alternatives. I used https://github.com/sahib/rmlint in the past and can't complain.

  4. dedup

    dedup finds and clones duplicate files (by ttkb-oss)

    Thank you for creating and sharing this utility.

    I ran it over my Postgres development directories that have almost identical files. It saved me about 1.7GB.

    The project doesn't have a license associated with it. If you don't mind, could you please add a license of your choice?

    As a gesture of thanks, I have attempted to improve the installation step slightly and have created this pull request: https://github.com/ttkb-oss/dedup/pull/6

  5. system-tools

    I wrote a similar (but simpler) script which would replace a file by a hardlink if it has the same content.

    My main motivation was for the packages of Python virtual envs, where I often have similar packages installed, and even if versions are different, many files would still match.

    https://github.com/albertz/system-tools/blob/master/bin/merg...

  6. duperemove

    Tools for deduping file systems

    Yes, Linux has a system call to do this for any filesystem with reflink support (and it is safe and atomic). You need a "driver" program to identify duplicates, but there are a handful out there. I've used https://github.com/markfasheh/duperemove and was very pleased with how it worked.

  7. pnpm

    Fast, disk space efficient package manager

    I think this is somewhat funny.

    His comment is pretty understandable if you've done frontend work in JavaScript.

    The `node_modules` directory is so ripe for duplicate content that some tools explicitly call out that they're disk efficient (it's literally in the tagline for pnpm, "Fast, disk space efficient package manager": https://github.com/pnpm/pnpm).

    So he got ok results (~13% savings) on possibly the best target content available in a user's home directory.

    Then he got results so bad it's utterly not worth doing on the rest (0.10% - not 10%, literally 1/10 of a single percent).

    ---

    Deduplication isn't super simple, isn't always obviously better, and can require other system resources in unexpected ways (e.g., lots of CPU and RAM). It's a cool tech to fiddle with on a NAS, and I'm generally a fan of modern CoW filesystems (including APFS).

    But I want to be really clear - these are picking-spare-change-out-of-the-couch savings. Penny wise, pound foolish. The only people who are likely to actually save anything by buying this app probably already know it, and have a large set of real options available. Everyone else is falling into the "download more RAM" trap.
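
Several of the comments above describe the same basic idea: find files with identical content, then make the copies share storage. As a rough illustration only, here is a minimal Python sketch of the hardlink variant mentioned under system-tools. It is not the code of any project listed here; the function names and the `.dedup-tmp` suffix are made up for this example. Note that hardlinks behave differently from the reflink/clonefile clones that fclones, dedup, and duperemove rely on: all hardlinked names share later modifications and metadata, and hardlinks only work within a single filesystem.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group regular files under root by content; return groups of 2+ paths."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for size, paths in by_size.items():
        # Files with a unique size cannot have duplicates; skip empty files too.
        if size == 0 or len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def hardlink_duplicates(groups):
    """Replace each duplicate with a hard link to the first file in its group.

    Returns an approximate count of bytes saved.
    """
    saved = 0
    for keep, *dupes in groups:
        for dupe in dupes:
            if os.path.samefile(keep, dupe):
                continue  # already the same inode, nothing to do
            size = os.path.getsize(dupe)
            tmp = dupe + ".dedup-tmp"  # hypothetical temporary name
            os.link(keep, tmp)         # raises OSError across filesystems
            os.replace(tmp, dupe)      # atomically swap the link into place
            saved += size
    return saved
```

Calling `hardlink_duplicates(find_duplicates(some_directory))` would apply this to a directory tree; trying it on a scratch copy first is a good idea, since the replacement is destructive.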



