Metric | mindcastle.io | s3fs-fuse
---|---|---
Mentions | 2 | 57
Stars | 21 | 8,122
Growth | - | 1.4%
Activity | 10.0 | 8.8
Last commit | over 1 year ago | 11 days ago
Language | C | C++
License | GNU General Public License v3.0 only | GNU General Public License v3.0 only
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mindcastle.io
-
Prolly Trees
I don’t know who came first, but https://github.com/jacobgorm/mindcastle.io also uses the rsync/LBFS rolling-hash trick to split the tree data into chunks. I presented the idea at USENIX Vault 2019 https://m.youtube.com/watch?v=QgOkDiP0C4c&embeds_referring_e...
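As a rough illustration of the rolling-hash chunking idea mentioned above, here is a minimal sketch. It uses a simple polynomial rolling hash, not rsync's checksum or LBFS's Rabin fingerprints, and it is not taken from mindcastle.io. The names `split_chunks`, `WINDOW`, `MASK`, `BASE` and `MOD`, and their values, are assumptions made for the example.

```python
WINDOW = 16           # bytes covered by the rolling hash (assumed value)
MASK = (1 << 8) - 1   # cut when the low 8 bits are zero: roughly 256-byte chunks
BASE = 257            # polynomial base (assumed value)
MOD = (1 << 31) - 1   # modulus for the hash (assumed value)


def split_chunks(data: bytes) -> list[bytes]:
    """Cut data wherever the hash of the last WINDOW bytes has its low bits zero."""
    chunks = []
    start = 0
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte that leaves the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Slide the window: drop the oldest byte before adding the new one.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        # Only test for a boundary once the window is full.
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks
```

Because each boundary depends only on the preceding `WINDOW` bytes, inserting a byte near the start of the input changes the first chunk or two and leaves the later chunks byte-for-byte identical, which is what makes the chunks reusable across versions.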
-
Gcsfuse: A user-space file system for interacting with Google Cloud Storage
It is not how you would want to do it for a typical ML workload, where the samples have to be randomly permuted each epoch.
Instead, tar up the files in some random order, and put the tar file on a web server or in a bucket, then stream them in during the first epoch, while keeping track of their byte offsets in the tar file, which you cache locally, assuming ample local flash storage. Then permute the list of offsets and use those when reading samples for the next epoch.
If you only have local HDD then you will need a more advanced data structure like the one provided by https://github.com/jacobgorm/mindcastle.io , which will allow you to write out permuted samples at close to disk sequential write bandwidth. See my talk at USENIX Vault 2019 for a full explanation, linked from https://vertigo.ai/mindcastle/
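The offset-permutation scheme described above could be sketched roughly as follows, using Python's standard `tarfile` module. This assumes the uncompressed tar file has already been cached at a local path (the network streaming step is left out), and the helper names `index_tar`, `read_epoch` and `make_demo_tar` are hypothetical, not part of any tool mentioned here.

```python
import io
import random
import tarfile


def index_tar(path):
    """First pass: record (name, data offset, size) for every regular file."""
    index = []
    with tarfile.open(path, "r:") as tar:  # "r:" means no compression
        for member in tar:
            if member.isfile():
                index.append((member.name, member.offset_data, member.size))
    return index


def read_epoch(path, index, seed):
    """Yield (name, bytes) for one epoch, in an order permuted by seed."""
    order = list(index)
    random.Random(seed).shuffle(order)
    with open(path, "rb") as f:
        for name, offset, size in order:
            f.seek(offset)       # jump straight to the sample's payload
            yield name, f.read(size)


def make_demo_tar(path, n=5):
    """Write a small uncompressed tar with n dummy samples (demo only)."""
    with tarfile.open(path, "w") as tar:
        for i in range(n):
            payload = f"sample-{i}".encode()
            info = tarfile.TarInfo(name=f"{i}.bin")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
```

Calling `read_epoch` with a different `seed` per epoch gives a new permutation each time without rewriting the tar file. On flash storage the resulting random reads are cheap; on an HDD they are not, which is the case the comment addresses next.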
s3fs-fuse
-
Is Posix Outdated?
The author needs to ask themselves: in this cloud technology stack, is there POSIX involved somewhere lower down, where I can't access it? The answer is, of course, "yes". The sort of cloud storage systems described all run on top of POSIX APIs. They provide convenience (cost efficiency is more debatable) compared to the POSIX alternative, but that's because they exist at an entirely different conceptual layer (hence the presence of POSIX anyway, just buried).
Your point about a POSIX layer that is actually there but hidden (visible to the low-level Amazon employees building the S3 service, but invisible to S3 end customers) is true, but it isn't the point of the article. The author is saying there are motivations for a POSIX-like API that is also visible to the end user.
So your explanation of the stack looks like 2 layers: POSIX API <-- AWS S3 built on top of that
The author's essay is actually talking about 3 layers: POSIX <-- AWS S3 <-- POSIX
That's why the blog post has the following links to POSIX-on-top-of-S3-objects :
https://github.com/s3fs-fuse/s3fs-fuse
https://github.com/kahing/goofys
https://www.cuno.io/
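To make the third layer concrete, here is a toy sketch of POSIX-like calls layered over a flat key/value object store. It is purely illustrative: `ToyObjectStore`, `listdir` and `pread` are hypothetical names, the store is an in-memory dict rather than S3, and none of this reflects how s3fs-fuse, goofys or cuno are actually implemented.

```python
class ToyObjectStore:
    """In-memory stand-in for an object store with flat string keys."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key, start=0, end=None):
        # Ranged read, loosely analogous to a GET with a byte range.
        return self._objects[key][start:end]

    def list(self, prefix=""):
        # Prefix listing, loosely analogous to a LIST call.
        return sorted(k for k in self._objects if k.startswith(prefix))


def listdir(store, path):
    """Emulate a directory listing by grouping keys under a prefix."""
    prefix = path.strip("/")
    prefix = prefix + "/" if prefix else ""
    names = set()
    for key in store.list(prefix):
        rest = key[len(prefix):]
        names.add(rest.split("/", 1)[0])  # first path component only
    return sorted(names)


def pread(store, path, length, offset):
    """Emulate a positional read with a ranged get on the object."""
    return store.get(path.strip("/"), offset, offset + length)
```

Directories here exist only as shared key prefixes, which hints at why such layers tend to be "POSIX-ish" rather than fully POSIX-compliant.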
- Gcsfuse: A user-space file system for interacting with Google Cloud Storage
-
R2 slow PUT file transfer
sudo apt install build-essential libfuse-dev fuse
git clone https://github.com/s3fs-fuse/s3fs-fuse.git
cd s3fs-fuse
sudo apt install libfuse2
sudo apt install libcurl4-openssl-dev
sudo apt install libxml2-dev
./autogen.sh
./configure
make
- Cloud Backed SQLite
-
Podman and S3 Storage Driver (Audiobookshelf)
Don’t know, actually. Here is the project page.
- Uploading hundreds to thousands of files to S3
- Linux Client for R2
-
s3fs-fuse - allows you to mount your S3/MinIO bucket to a local directory
s3fs-fuse
-
AWS Announces Open Source Mountpoint for Amazon S3
How is this different than these other solutions?
https://github.com/kahing/goofys
https://github.com/s3fs-fuse/s3fs-fuse
-
Introducing Mountpoint for Amazon S3 - A file client that translates local file system API calls to S3 object API calls like GET and LIST.
I don’t get it. Why not just improve https://github.com/s3fs-fuse/s3fs-fuse
What are some alternatives?
seafowl - Analytical database for data-driven Web applications 🪶
goofys - a high-performance, POSIX-ish Amazon S3 file system written in Go
thumbhash - A very compact representation of an image placeholder
rclone - "rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files
extfuse - Extension Framework for FUSE
mountpoint-s3 - A simple, high-throughput file client for mounting an Amazon S3 bucket as a local file system.
azure-storage-fuse-aur - AUR package for Azure Storage Blobfuse
jellyfin-webos - WebOS Client for Jellyfin
azurefs - Mount Microsoft Azure Blob Storage as local filesystem in Linux (inactive)
jellyfin-tizen - Jellyfin Samsung TV Client
gcsfuse - A user-space file system for interacting with Google Cloud Storage
mediacms - MediaCMS is a modern, fully featured open source video and media CMS, written in Python/Django and React, featuring a REST API.