Hop: 25x faster than unzip and 10x faster than tar at reading individual files

hop

7 348 0.0 Zig
asar

6 2,459 6.9 JavaScript

Simple extensive tar-like archive format with indexing

Since Hop doesn't do compression, the most appropriate comparison would be to asar
https://github.com/electron/asar
It's not hard being faster than zip if you are not compressing/uncompressing.

tarindexer

3 69 10.0 Python

python module for indexing tar files for fast access

There exists a utility called tarindexer [0] that can be used for random access to tar files. An index text file is created (one time) that is used to record the position of the files in the tar archive. Random reads are done by loading the index file and then seeking to the location of the file in question.
For random access to gzip'd files, bgzip [1] can be used. bgzip also uses an index file (one time creation) that is used to record key points for random access.
[0] https://github.com/devsnd/tarindexer
[1] http://www.htslib.org/doc/bgzip.html
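
The index-then-seek idea described above can be sketched with Python's standard `tarfile` module. This is only an illustration of the technique, not tarindexer's actual code or index format: the function names `build_index` and `read_member` are made up for this example, and the index is kept in memory as a dict rather than written to a text file. It also assumes an uncompressed tar, since byte offsets are only meaningful without an outer compression layer.

```python
import tarfile


def build_index(tar_path):
    """Scan the archive once and record where each regular file's data lives.

    Returns a dict mapping member name -> (data_offset, size).
    (Hypothetical helper; tarindexer stores its index in a text file instead.)
    """
    index = {}
    # "r:" forces plain, uncompressed tar so the offsets are real file offsets.
    with tarfile.open(tar_path, "r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_member(tar_path, index, name):
    """Read one file by seeking straight to its data, without scanning the tar."""
    offset, size = index[name]
    with open(tar_path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

Building the index costs one full pass over the archive; after that, each read is a single seek plus a read of exactly the bytes needed, which is where the speedup over a sequential tar scan comes from.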

peechy

1 18 4.4 C++

A fork of the Kiwi Message Format

Came to mention Bun when I saw this hit the front page. I’ve been following Jarred’s Twitter since I heard about Bun and it’s quite impressive (albeit incomplete). To folks wondering why another bundler/what makes Bun special:
- Faster than ESBuild/SWC
- Fast build-time macros written as JSX (likely friendlier to develop than say a Babel plugin/macro). These open up a lot of possibilities that could benefit end users too, by performing more work on build/server and less client side.
- Targeting ecosystem compatibility (e.g. it will probably support the new JSX transform, which ESBuild does not and may not in the future)
- Support for integration with frameworks, e.g. Next.js
- Other cool performance-focused tools like Hop and Peechy[1] (though the latter is a fork of Kiwi, a project by ESBuild's creator)
This focus on performance is good for the JS ecosystem and for the web generally.
1: https://github.com/Jarred-Sumner/peechy

pixz

8 684 4.8 C

Parallel, indexed xz compressor

Also relevant is pixz [1] which can do parallel LZMA/XZ decompression as well as tar file indexing.
[1] https://github.com/vasi/pixz
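
To illustrate why an indexed, block-wise layout makes compressed data seekable, here is a minimal sketch using Python's standard `lzma` module. It is not pixz's on-disk format or implementation: the helper names, the block size, and the in-memory index are all assumptions made for this example. Each block is compressed as an independent xz stream, so any block can be decompressed alone (or in parallel with others).

```python
import lzma

BLOCK_SIZE = 64 * 1024  # hypothetical block size, chosen only for illustration


def compress_blocks(data, block_size=BLOCK_SIZE):
    """Compress data as independent xz streams.

    Returns (blob, index), where index holds one
    (compressed_offset, compressed_length) pair per block.
    """
    blob = bytearray()
    index = []
    for start in range(0, len(data), block_size):
        chunk = lzma.compress(data[start:start + block_size])
        index.append((len(blob), len(chunk)))
        blob += chunk
    return bytes(blob), index


def read_range(blob, index, offset, length, block_size=BLOCK_SIZE):
    """Return data[offset:offset + length], decompressing only overlapping blocks."""
    if length <= 0:
        return b""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    out = bytearray()
    for i in range(first, min(last, len(index) - 1) + 1):
        pos, size = index[i]
        out += lzma.decompress(blob[pos:pos + size])
    # Trim the leading bytes of the first block that precede the requested offset.
    skip = offset - first * block_size
    return bytes(out[skip:skip + length])
```

The trade-off is that smaller blocks give finer-grained random access but a worse compression ratio, because each block starts with an empty dictionary.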

fd

172 31,581 8.8 Rust

A simple, fast and user-friendly alternative to 'find'
ouch

12 1,959 9.3 Rust

Painless compression and decompression in the terminal
ratarmount

10 628 8.6 Python

Access large archives as a filesystem efficiently, e.g., TAR, RAR, ZIP, GZ, BZ2, XZ, ZSTD archives

I've recently been looking into this same issue because I analyse a lot of data like sosreports or other tar/compressed data from customer systems. Currently I untar these onto my ZFS filesystem, which works out OK because it has zstd compression enabled, but I end up decompressing and recompressing, which is quite expensive as the files are often GBs or more compressed.
But I've started using a tool called "ratarmount" (https://github.com/mxmlnkn/ratarmount), which creates an index once (something I could have our upload system generate in advance, though you can also just build it locally) and then lets you FUSE-mount the file. This works pretty well, with the one exception that I can't create scratch files inside the directory layout, which I've wanted to do in the past.
I was surprised how hard it is to get a bundle file format that is both indexable and compressed with a good, fast compression algorithm, which mostly boils down to zstd at this point.
While it works quite well, especially with gzip and bzip2, sadly zstd and xz (and some other compression formats) don't allow decompressing only part of a file by default. The formats make it possible, but the default tools don't do it. The nitty-gritty details are summarised here:

mozilla-central-old

1 32 10.0 C++

Unofficial import of Mozilla's mozilla-central hg repository using hg-git

Yes, it's in python :)
https://github.com/humphd/mozilla-central-old/blob/9d4d9f265...

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.