I got run times for the simplest single-threaded directory walk that were only 1.8x slower than git ls-files. The code is at https://github.com/c-blake/cligen/blob/master/cligen/dents.n... (plain `dents find` does not require the special kernel batch system call module to be fast.)
I believe that GNU find is slow because it is specifically written to allow arbitrary filesystem depth as opposed to "open file descriptor limit-limited depth".
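(To make the trade-off concrete: the simplest recursive walk keeps one directory handle open per level of recursion, so its maximum depth is capped by the process's open file descriptor limit. A minimal Rust sketch of that naive approach, not code from either tool:

    use std::fs;
    use std::io;
    use std::path::Path;

    // Naive recursive walk: each level of recursion holds its ReadDir
    // iterator (and the directory fd underneath it) open until the
    // whole subtree below has been visited, so a tree deeper than the
    // fd limit (often 1024) makes read_dir fail with EMFILE.
    fn walk(dir: &Path) -> io::Result<()> {
        for entry in fs::read_dir(dir)? {
            let entry = entry?;
            let path = entry.path();
            if entry.file_type()?.is_dir() {
                walk(&path)?; // parent's fd stays open across this call
            } else {
                println!("{}", path.display());
            }
        }
        Ok(())
    }

    fn main() -> io::Result<()> {
        walk(Path::new("."))
    }

Supporting arbitrary depth means giving up that one-fd-per-level structure, e.g. by closing and reopening directories, which is extra work on every descent.)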
Meanwhile, I think the Rust fd is slow because of (probably counterproductive) multi-threading (at least it does 11,000 calls to futex).
> I believe that GNU find is slow because it is specifically written to allow arbitrary filesystem depth as opposed to "open file descriptor limit-limited depth".
I haven't benchmarked find specifically, but I believe the most common Rust library for this purpose, walkdir[1], also allows arbitrary file system recursion depth, and is extremely fast (a usage sketch follows the footnote). It was fairly close in speed to some "naive" limited-depth code I wrote in C for the same purpose.
I'd be curious to see benchmarks of whether this actually makes a difference.
[1] https://github.com/BurntSushi/walkdir
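For reference, walkdir bounds open file descriptors rather than recursion depth: its max_open option caps how many directory handles stay open at once, and directories beyond that cap are closed and reopened as the walk proceeds. A small usage sketch, assuming walkdir 2.x as a dependency; the path and max_open value are illustrative:

    use walkdir::WalkDir;

    fn main() {
        // Arbitrary depth with a bounded number of open directory
        // handles: max_open caps simultaneously open fds, trading a
        // little speed for immunity to fd-limit exhaustion.
        for entry in WalkDir::new(".").max_open(10) {
            match entry {
                Ok(e) => println!("{}", e.path().display()),
                Err(err) => eprintln!("walkdir error: {}", err),
            }
        }
    }

So the "arbitrary depth" design cost the original comment describes is real, but it need not be paid on every directory, only once the handle budget is exhausted.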
I'm absolutely not an expert, but I feel like log-structured filesystems (https://en.wikipedia.org/wiki/Log-structured_file_system) are a natural fit for this kind of thing: an index "just" has to read the most recently written entries.
But if we're talking about the future, we're probably talking about btrfs and zfs, both of which have the internal machinery to give you a feed of recently changed files going back to the beginning of the filesystem (e.g. btrfs subvolume find-new against a stored generation number, or zfs diff between snapshots).
While writing this answer I stumbled upon https://github.com/rflament/loggedfs, which is probably a very nice solution to this problem.