napkin-math
mountpoint-s3
| | napkin-math | mountpoint-s3 |
|---|---|---|
| Mentions | 13 | 17 |
| Stars | 3,093 | 4,091 |
| Growth | - | 3.5% |
| Activity | 6.3 | 9.5 |
| Last commit | 12 days ago | 2 days ago |
| Language | Rust | Rust |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
napkin-math
- capacity planning in system design interviews
- Napkin Math
- S3 Express Is All You Need
Most production storage systems/databases built on top of S3 spend a significant amount of effort building an SSD/memory caching tier to make them performant enough for production (e.g. on top of RocksDB). But it's not easy to keep it in sync with blob...
Even with the cache, the cold query latency lower-bound to S3 is subject to ~50ms roundtrips [0]. To build a performant system, you have to tightly control roundtrips. S3 Express changes that equation dramatically, as S3 Express approaches HDD random read speeds (single-digit ms), so we can build production systems that don't need an SSD cache—just the zero-copy, deserialized in-memory cache.
Many systems will probably continue to have an SSD cache (~100 us random reads), but now MVPs can be built without it, and cold query latency goes down dramatically. That's a big deal.
We're currently building a vector database on top of object storage, so this is extremely timely for us... I hope GCS ships this ASAP. [1]
[0]: https://github.com/sirupsen/napkin-math
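The latency argument in this comment can be sanity-checked with a few lines of arithmetic. What follows is only a rough sketch: the ~50 ms S3 roundtrip and ~100 us SSD read are the figures quoted in the comment, while the 5 ms value standing in for "single-digit ms" and the count of four dependent roundtrips per cold query are assumptions made up for illustration.

```python
# Back-of-envelope cold-query latency: dependent roundtrips x per-roundtrip latency.
# 50 ms (S3) and 0.1 ms (SSD) are the figures quoted in the comment; 5 ms for
# S3 Express is an assumed stand-in for "single-digit ms", and ROUNDTRIPS is
# a hypothetical value chosen only for illustration.

LATENCY_MS = {
    "S3 Standard": 50.0,
    "S3 Express (assumed 5 ms)": 5.0,
    "SSD cache": 0.1,
}
ROUNDTRIPS = 4  # hypothetical number of dependent reads per cold query


def cold_query_ms(per_roundtrip_ms: float, roundtrips: int = ROUNDTRIPS) -> float:
    """Latency lower bound when each read has to wait for the previous one."""
    return per_roundtrip_ms * roundtrips


for tier, latency in LATENCY_MS.items():
    print(f"{tier:28s} {cold_query_ms(latency):8.1f} ms")
```

Because the total scales linearly with the number of dependent roundtrips, the per-roundtrip latency dominates the result, which is why the comment stresses controlling roundtrips tightly.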
- Random Read or Sequential Read
Trying to estimate performance using some napkin math based on this: https://github.com/sirupsen/napkin-math
- A CVE has been issued for hyper. Denial of Service possible
So napkin maths time. Typical cross-world bog-standard network speeds for a single TCP channel of ~25MiBps. A single HEADERS+RST pair is likely < 128 bytes (40 for the HEADERS + whatever payload, and 32 for the RST). So 8 pairs per K, 8K pairs per MiB, 200K pairs per 25MiB...
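The arithmetic in this estimate is easy to reproduce. The short sketch below uses only the figures from the comment (a pair of at most 128 bytes, a ~25 MiB/s channel); the variable and function names are illustrative, and reading the last figure as "pairs per second" assumes the channel is fully saturated with such pairs.

```python
# How many HEADERS+RST pairs fit into a given number of bytes, assuming each
# pair takes at most 128 bytes (the upper bound stated in the comment).

KIB = 1024
MIB = 1024 * KIB

PAIR_BYTES = 128      # upper bound for one HEADERS+RST pair
LINK_MIB_PER_S = 25   # single TCP channel throughput from the comment


def pairs_in(num_bytes: int, pair_bytes: int = PAIR_BYTES) -> int:
    """Number of whole pairs that fit into num_bytes."""
    return num_bytes // pair_bytes


per_kib = pairs_in(KIB)                      # the "8 pairs per K"
per_mib = pairs_in(MIB)                      # the "8K pairs per MiB"
per_second = pairs_in(LINK_MIB_PER_S * MIB)  # the "200K pairs per 25MiB"

print(per_kib, per_mib, per_second)
```

Since 128 bytes is an upper bound on the pair size, these counts are lower bounds on how many pairs the channel can carry.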
- Index Merges vs Composite Indexes in Postgres and MySQL
- I/O is no longer the bottleneck
Yes, sequential I/O bandwidth is closing the gap to memory. [1] The I/O pattern to watch out for, and the biggest reason why e.g. databases do careful caching to memory, is that _random_ I/O is still dreadfully slow. I/O bandwidth is brilliant, but latency is still disappointing compared to memory.
[1]: https://github.com/sirupsen/napkin-math
- Monthly cost to host server for 1M DAUs?
- Napkin-math: Techniques and numbers for estimating system's performance
- System Design prep?
https://github.com/sirupsen/napkin-math (memorize these)
mountpoint-s3
- Row Zero and Viewport Data Streaming
... or does "S3 file system" mean https://github.com/awslabs/mountpoint-s3 - a Rust project by AWS Labs that provides "a simple, high-throughput file client for mounting an Amazon S3 bucket as a local file system" ?
- s3m: A CLI for streams of data in S3 buckets
- S3 Express Is All You Need
Looks like support for S3 Express was merged in with version 1.3.0 just a few hours ago: https://github.com/awslabs/mountpoint-s3/pull/642
- Gcsfuse: A user-space file system for interacting with Google Cloud Storage
mountpoint-s3 is AWS’ first-party solution for mounting S3 buckets as file systems: https://github.com/awslabs/mountpoint-s3
Haven’t used it but it looks cool, if a bit immature.
- Mountpoint for S3
- When would something like this come to ADLS Gen 2?
- Running Amazon S3 Mountpoint Inside a Container
    # Build stage: compile mount-s3 from source
    FROM rust:1.68.0 as build
    RUN apt-get update && apt-get install -y \
        clang \
        cmake \
        curl \
        fuse \
        git \
        libfuse-dev \
        pkg-config \
        && apt-get clean \
        && rm -rf /var/lib/apt/lists/* \
        && git clone --recurse-submodules https://github.com/awslabs/mountpoint-s3.git \
        && cd mountpoint-s3 \
        && cargo build --release

    # Runtime stage: copy only the compiled binary
    FROM debian:bullseye-slim
    RUN apt-get update && apt-get install -y \
        ca-certificates \
        libfuse-dev \
        sudo \
        && apt-get clean \
        && rm -rf /var/lib/apt/lists/*
    COPY --from=build /mountpoint-s3/target/release/mount-s3 /usr/local/bin/mount-s3
    RUN chmod 777 /usr/local/bin/mount-s3
    # Create a non-root user with passwordless sudo
    RUN useradd -ms /bin/bash mount-s3-user \
        && echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers \
        && adduser mount-s3-user sudo
    USER mount-s3-user
- GitHub - awslabs/mountpoint-s3: A simple, high-throughput file client for mounting an Amazon S3 bucket as a local file system.
- The inside story on Mountpoint for Amazon S3, a high-performance open source file client
This might be useful with a MinIO server, although that is not directly supported.
What are some alternatives?
huniq - Filter out duplicates on the command line. Replacement for `sort | uniq` optimized for speed (10x faster) when sorting is not needed.
s3fs-fuse - FUSE-based file system backed by Amazon S3
advisory-database - Security vulnerability database inclusive of CVEs and GitHub originated security advisories from the world of open source software.
PosixSyncFS - PosixSyncFS is a set of Bash scripts that allow users to create a real POSIX filesystem and sync it to a remote storage bucket for backup and recovery purposes.
adix - An Adaptive Index Library for Nim
goofys - a high-performance, POSIX-ish Amazon S3 file system written in Go
h2 - HTTP 2.0 client & server implementation for Rust.
aws-eks-iam-auth-controller - Kubernetes operator which consolidates custom resources into `aws-auth` ConfigMap.
RAMCloud - **No Longer Maintained** Official RAMCloud repo
usbd - User-Space Block Device (USBD) Framework (written in Go)
simdjson - Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
rclone - "rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files