mindcastle.io
mountpoint-s3
Metric | mindcastle.io | mountpoint-s3
---|---|---
Mentions | 2 | 17
Stars | 21 | 4,080
Growth | - | 3.3%
Activity | 10.0 | 9.5
Last commit | over 1 year ago | 5 days ago
Language | C | Rust
License | GNU General Public License v3.0 only | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mindcastle.io
- Prolly Trees
I don’t know who came first, but https://github.com/jacobgorm/mindcastle.io also uses the rsync/LBFS rolling hashes trick to split the tree data into chunks. I presented the idea at Usenix Vault 2019 https://m.youtube.com/watch?v=QgOkDiP0C4c&embeds_referring_e...
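The rolling-hash trick mentioned in this comment is usually called content-defined chunking. The following is a minimal sketch of the general idea only, not mindcastle.io's or LBFS's actual code: the function name `split_chunks`, the polynomial hash, and all parameter values are assumptions chosen for illustration. A hash over the last `window` bytes is updated one byte at a time, and a chunk ends wherever the low bits of that hash are zero, so boundaries depend on local content rather than on absolute position.

```python
def split_chunks(data, window=16, mask=0x3F, min_size=32, max_size=1024):
    """Split `data` (bytes) into chunks at content-defined boundaries.

    Illustrative parameters: a cut is taken where the rolling hash of the
    last `window` bytes has its low bits (selected by `mask`) equal to zero,
    subject to minimum and maximum chunk sizes.
    """
    base, mod = 257, (1 << 61) - 1
    drop = pow(base, window, mod)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Roll the hash forward: add the new byte, remove the oldest one.
        h = (h * base + byte) % mod
        if i >= window:
            h = (h - data[i - window] * drop) % mod
        length = i + 1 - start
        if (length >= min_size and h & mask == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

Because the hash only looks at a small window, inserting a byte near the start of the input typically shifts the boundaries without changing the content of most chunks, which is what makes the chunks useful for deduplication.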
- Gcsfuse: A user-space file system for interacting with Google Cloud Storage
It is not how you would want to do it for a typical ML workload, where the samples have to get randomly permuted each epoch.
Instead, tar up the files in some random order and put the tar file on a web server or in a bucket. Stream them in during the first epoch while keeping track of their byte offsets in the tar file, which you cache locally, assuming ample local flash storage. Then permute the list of offsets and use those when reading samples for the next epoch.
If you only have local HDD then you will need a more advanced data structure like the one provided by https://github.com/jacobgorm/mindcastle.io , which will allow you to write out permuted samples at close to disk sequential write bandwidth. See my talk at USENIX Vault 2019 for a full explanation, linked from https://vertigo.ai/mindcastle/
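The tar-and-offsets approach from this comment can be sketched with Python's standard `tarfile` module. This is a minimal illustration under stated assumptions, not the commenter's implementation: the helper names (`build_shuffled_tar`, `TeeReader`, `first_epoch`, `later_epoch`) are hypothetical, and in-memory `io.BytesIO` objects stand in for the remote bucket and the local flash cache. It does not cover the HDD case that mindcastle.io addresses.

```python
import io
import random
import tarfile


def build_shuffled_tar(samples, rng):
    """Pack {name: bytes} into an uncompressed tar, members in random order."""
    names = list(samples)
    rng.shuffle(names)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = len(samples[name])
            tar.addfile(info, io.BytesIO(samples[name]))
    return buf.getvalue()


class TeeReader:
    """Pass reads through to `source` while copying every byte into `cache`."""

    def __init__(self, source, cache):
        self.source, self.cache = source, cache

    def read(self, n=-1):
        chunk = self.source.read(n)
        self.cache.write(chunk)
        return chunk


def first_epoch(source, cache):
    """Stream the tar once; return [(data_offset, size)] and the samples seen."""
    index, seen = [], []
    # Mode "r|" reads the archive strictly sequentially, as from a network stream.
    with tarfile.open(fileobj=TeeReader(source, cache), mode="r|") as tar:
        for member in tar:
            if member.isfile():
                seen.append(tar.extractfile(member).read())
                index.append((member.offset_data, member.size))
    return index, seen


def later_epoch(cache, index, rng):
    """Read samples from the local cache in a freshly permuted order."""
    order = index[:]
    rng.shuffle(order)
    out = []
    for offset, size in order:
        cache.seek(offset)
        out.append(cache.read(size))
    return out
```

In use, `first_epoch` would be given the HTTP or bucket response stream and a local cache file opened for reading and writing; each later epoch then calls `later_epoch` with the saved index, so random access only ever hits local storage.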
mountpoint-s3
- Row Zero and Viewport Data Streaming
... or does "S3 file system" mean https://github.com/awslabs/mountpoint-s3 - a Rust project by AWS Labs that provides "a simple, high-throughput file client for mounting an Amazon S3 bucket as a local file system"?
- s3m: A CLI for streams of data in S3 buckets
- S3 Express Is All You Need
Looks like support for S3 Express was merged in with version 1.3.0 just a few hours ago https://github.com/awslabs/mountpoint-s3/pull/642
- Gcsfuse: A user-space file system for interacting with Google Cloud Storage
mountpoint-s3 is AWS's first-party solution for mounting S3 buckets as file systems: https://github.com/awslabs/mountpoint-s3
I haven't used it, but it looks cool, if a bit immature.
- Mountpoint for S3
- When would something like this come to ADLS Gen 2?
- Running Amazon S3 Mountpoint Inside a Container
FROM rust:1.68.0 AS build
RUN apt-get update && apt-get install -y \
    clang \
    cmake \
    curl \
    fuse \
    git \
    libfuse-dev \
    pkg-config \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && git clone --recurse-submodules https://github.com/awslabs/mountpoint-s3.git \
    && cd mountpoint-s3 \
    && cargo build --release

FROM debian:bullseye-slim
RUN apt-get update && apt-get install -y \
    ca-certificates \
    libfuse-dev \
    sudo \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
COPY --from=build /mountpoint-s3/target/release/mount-s3 /usr/local/bin/mount-s3
RUN chmod 777 /usr/local/bin/mount-s3
RUN useradd -ms /bin/bash mount-s3-user \
    && echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers \
    && adduser mount-s3-user sudo
USER mount-s3-user
- GitHub - awslabs/mountpoint-s3: A simple, high-throughput file client for mounting an Amazon S3 bucket as a local file system.
- The inside story on Mountpoint for Amazon S3, a high-performance open source file client
This might be useful with a MinIO server, although that is not directly supported.
What are some alternatives?
seafowl - Analytical database for data-driven Web applications 🪶
s3fs-fuse - FUSE-based file system backed by Amazon S3
thumbhash - A very compact representation of an image placeholder
PosixSyncFS - PosixSyncFS is a set of Bash scripts that allow users to create a real POSIX filesystem and sync it to a remote storage bucket for backup and recovery purposes.
extfuse - Extension Framework for FUSE
goofys - a high-performance, POSIX-ish Amazon S3 file system written in Go
azure-storage-fuse-aur - AUR package for Azure Storage Blobfuse
aws-eks-iam-auth-controller - Kubernetes operator which consolidates custom resources into `aws-auth` ConfigMap.
azurefs - Mount Microsoft Azure Blob Storage as local filesystem in Linux (inactive)
usbd - User-Space Block Device (USBD) Framework (written in Go)
gcsfuse - A user-space file system for interacting with Google Cloud Storage
rclone - "rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files