It uses FUSE, and there are three types of kernel cache you can use with FUSE (although it seems gcsfuse exposes only one):
1. Cache of file attributes in the kernel (controlled by the "stat-cache-ttl" value - https://github.com/GoogleCloudPlatform/gcsfuse/blob/7dc5c7ff...)
If you really expect a file-system experience over GCS, please try JuiceFS [1], which scales to 10 billion files pretty well with TiKV or FoundationDB as the metadata engine.
PS, I'm founder of JuiceFS.
[1] https://github.com/juicedata/juicefs
I don't think it is; instead, each operation makes a request. You can use something like catfs: https://github.com/kahing/catfs
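To illustrate the idea, here's a toy Python sketch of the read-through caching that catfs layers on top of a slow backing store (the class and directory layout are made up for the example; catfs itself works at the FUSE layer and adds eviction, consistency handling, etc.):

```python
import os
import shutil
import tempfile

class ReadThroughCache:
    """Toy read-through cache: serve reads from a local cache directory,
    touching the slow backing store only on a miss."""

    def __init__(self, backing_dir, cache_dir):
        self.backing_dir = backing_dir
        self.cache_dir = cache_dir
        self.misses = 0
        os.makedirs(cache_dir, exist_ok=True)

    def read(self, name):
        cached = os.path.join(self.cache_dir, name)
        if not os.path.exists(cached):
            # Miss: one "remote" request, then keep a local copy.
            shutil.copyfile(os.path.join(self.backing_dir, name), cached)
            self.misses += 1
        with open(cached, "rb") as f:  # Hit: purely local I/O.
            return f.read()

# Demo: the second read of the same file never touches the backing store.
backing = tempfile.mkdtemp()
cache = tempfile.mkdtemp()
with open(os.path.join(backing, "blob.bin"), "wb") as f:
    f.write(b"hello")

c = ReadThroughCache(backing, cache)
first = c.read("blob.bin")
second = c.read("blob.bin")
```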
https://github.com/Azure/azure-storage-fuse
It has some nice features, like streaming with block-level caching for fast read-only access.
Why not rclone? It was discussed here yesterday as a replacement for sshfs - and supports GCS as well as dozens of other backends.
https://rclone.org/
mountpoint-s3 is AWS’ first party solution for mounting s3 buckets as file systems: https://github.com/awslabs/mountpoint-s3
Haven’t used it but it looks cool, if a bit immature.
Hah nice! I developed https://github.com/ahmetb/azurefs back in 2012 when I was about to join Azure. I'm glad Azure actually provides a supported and actively-maintained tool for this.
FUSE does not work well with a large number of small files (due to heavy metadata operations such as inode/dentry lookups).
ExtFUSE (FUSE optimized with eBPF) [1] can offer high performance: it caches metadata in the kernel to avoid lookups in user space.
1. https://github.com/extfuse/extfuse
That is not how you would want to do it for a typical ML workload, where the samples have to be randomly permuted each epoch.
Instead, tar up the files in some random order and put the tar file on a web server or in a bucket, then stream them in during the first epoch while keeping track of their byte offsets in the tar file, which you cache locally (assuming ample local flash storage). Then permute the list of offsets and use those when reading samples for the next epoch.
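A minimal Python sketch of the offset-index idea (the helper names are mine, and a real pipeline would stream the tar over HTTP range requests rather than read a local file):

```python
import io
import os
import random
import tarfile
import tempfile

def build_offset_index(tar_path):
    """First (streaming) pass: record each member's data offset and size."""
    index = {}
    with tarfile.open(tar_path, "r:") as tf:
        for member in tf:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index

def read_sample(tar_path, offset, size):
    """Later epochs: random-access read of one sample via its cached offset."""
    with open(tar_path, "rb") as f:
        f.seek(offset)
        return f.read(size)

# Demo: build a tiny uncompressed tar, index it, then read in permuted order.
samples = {f"sample_{i}.bin": bytes([i]) * 8 for i in range(4)}
fd, path = tempfile.mkstemp(suffix=".tar")
os.close(fd)
with tarfile.open(path, "w") as tf:
    for name, data in samples.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

index = build_offset_index(path)
order = list(index)
random.shuffle(order)  # a fresh permutation each epoch
epoch = {name: read_sample(path, *index[name]) for name in order}
```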
If you only have local HDD then you will need a more advanced data structure like the one provided by https://github.com/jacobgorm/mindcastle.io , which will allow you to write out permuted samples at close to disk sequential write bandwidth. See my talk at USENIX Vault 2019 for a full explanation, linked from https://vertigo.ai/mindcastle/
In case you're interested in scale-to-zero database hosting: a few months ago I paired gcsfuse with Seafowl [0][1], an early-stage open-source database written in Rust. It was a lot of fun balancing tradeoffs that are usually not possible with classical databases, e.g. Postgres. Thank you, gcsfuse contributors.
[0] https://seafowl.io
You may wish to investigate Cloudflare's image API: https://developers.cloudflare.com/images/cloudflare-images/
If the reason you were unable to use a CDN cache was because your access patterns require a lot of varying end serializations (due to things like image manipulation, resizing, cropping, watermarking, etc.), then this API could be a huge money saver for you. It was for me.
OTOH, if the cost was because compute isn't free and the corresponding Cloudflare Worker compute cost is too much, then yeah, that's a tough one... I don't have a packaged answer for you, but I would investigate something like ThumbHash: https://evanw.github.io/thumbhash/ - my intuition is that you can probably serve a highly optimized/interlaced/"hashed" placeholder. The advantage of ThumbHash here is that you can make the access pattern less spendy simply by storing all of your hashes in an optimized way, since they are extremely small - small enough to be included in an index for index-only scans ("covering indexes").
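As a rough sketch of the "small enough to live in a covering index" point, here it is in Python with SQLite (the schema is illustrative, and the blob is a made-up stand-in for a real ThumbHash, which is on the order of tens of bytes):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Illustrative schema: the tiny placeholder hash sits next to the lookup key,
# so an index on (url, thumbhash) can answer placeholder queries without
# ever touching the table itself (an index-only / "covering" scan).
con.execute("CREATE TABLE images (url TEXT, thumbhash BLOB)")
con.execute("CREATE INDEX idx_url_thumb ON images (url, thumbhash)")
con.execute(
    "INSERT INTO images VALUES (?, ?)",
    ("cat.jpg", bytes.fromhex("93c8059a854890878bb7887a77988717")),
)

# SQLite's query plan notes the covering scan in its detail column.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT thumbhash FROM images WHERE url = ?",
    ("cat.jpg",),
).fetchone()[-1]

row = con.execute(
    "SELECT thumbhash FROM images WHERE url = ?", ("cat.jpg",)
).fetchone()
```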