Git ls-files is Faster Than Fd and Find

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • cligen

    Nim library to infer/generate command-line interfaces / option / argument parsing; docs at

  • I got run times from the simplest single-threaded directory walk that were only 1.8x slower than git ls-files. The code is in https://github.com/c-blake/cligen/blob/master/cligen/dents.n... (just `dents find` does not require the special kernel batch system call module to be fast; see the sketch below for what such a naive walk looks like).

    I believe that GNU find is slow because it is specifically written to allow arbitrary filesystem depth as opposed to "open file descriptor limit-limited depth".

    Meanwhile, I think the Rust fd is slow because of (probably counterproductive) multi-threading (at least it makes 11,000 calls to futex).
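    This is not the Nim `dents` code linked above, just a minimal Rust sketch of the kind of naive single-threaded walk being compared. Plain recursion keeps one open directory handle per level, so its reachable depth is bounded by the process's open-file-descriptor limit, i.e. the "fd limit-limited depth" trade-off described above.

    ```rust
    use std::fs;
    use std::io;
    use std::path::Path;

    // Naive single-threaded walk: each recursion level holds one open
    // directory handle (one file descriptor), so the reachable depth is
    // bounded by the process fd limit; this is the "fd limit-limited
    // depth" trade-off mentioned in the comment above.
    fn walk(dir: &Path, files: &mut u64) -> io::Result<()> {
        for entry in fs::read_dir(dir)? {
            let entry = entry?;
            if entry.file_type()?.is_dir() {
                walk(&entry.path(), files)?;
            } else {
                *files += 1;
            }
        }
        Ok(())
    }

    fn main() -> io::Result<()> {
        let mut files = 0;
        walk(Path::new("."), &mut files)?;
        println!("{files} files");
        Ok(())
    }
    ```

    Timing this with `cargo run --release` on a large tree gives a rough baseline to put next to `git ls-files | wc -l`.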

  • walk

    Plan 9-style utilities to replace find(1) (by Google)

  • walkdir

    Rust library for walking directories recursively.

  • > I believe that GNU find is slow because it is specifically written to allow arbitrary filesystem depth as opposed to "open file descriptor limit-limited depth".

    I haven't benchmarked find specifically, but I believe the most common Rust library for the purpose, walkdir[1], also allows arbitrary file-system recursion depth and is extremely fast. It was fairly close to some "naive" limited-depth code I wrote in C for the same purpose.

    I'd be curious to see benchmarks of whether this actually makes a difference; a rough sketch of the walkdir side follows below.

    [1] https://github.com/BurntSushi/walkdir
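    As one possible starting point for such a benchmark, here is a minimal walkdir sketch that counts regular files and times the traversal; the depth-limited C side of the comparison is not shown.

    ```rust
    // Cargo.toml: walkdir = "2"
    use std::time::Instant;
    use walkdir::WalkDir;

    fn main() {
        let start = Instant::now();
        let mut files = 0u64;
        // walkdir caps how many directory handles it keeps open at once
        // (see WalkDir::max_open), so recursion depth is not bounded by
        // the process's file-descriptor limit.
        for entry in WalkDir::new(".").into_iter().filter_map(|e| e.ok()) {
            if entry.file_type().is_file() {
                files += 1;
            }
        }
        println!("{files} files in {:?}", start.elapsed());
    }
    ```

    Comparing its wall-clock time against `git ls-files | wc -l` and `find . -type f | wc -l` on the same warm-cache tree would answer the question above.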

  • loggedfs

    LoggedFS - Filesystem monitoring with Fuse

  • I'm absolutely not an expert, but I feel like log-structured filesystems (https://en.wikipedia.org/wiki/Log-structured_file_system) are a natural fit for this kind of thing: an index "just" has to read the most recently written entries.

    But if we're talking about the future, we're probably talking about btrfs and zfs, both of which have the internal machinery to give you a feed of "recently changed files" going back to the beginning of the filesystem (see the btrfs sketch below).

    While writing this answer I stumbled upon https://github.com/rflament/loggedfs which is probably a very nice solution to this problem.
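    On btrfs, one existing hook into that machinery is `btrfs subvolume find-new`, which lists files whose generation is newer than a given transaction id (it typically needs root). Below is a minimal sketch that just shells out to it; the subvolume path and generation number are placeholders.

    ```rust
    use std::process::Command;

    fn main() -> std::io::Result<()> {
        // Placeholder subvolume path and generation number: the command
        // prints files modified since that generation and ends with a
        // "transid marker" line usable as the next starting point.
        let output = Command::new("btrfs")
            .args(["subvolume", "find-new", "/home", "123456"])
            .output()?;
        print!("{}", String::from_utf8_lossy(&output.stdout));
        Ok(())
    }
    ```

    (On ZFS, `zfs diff` between two snapshots gives a similar feed of changed files.)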
