Seaweed File System
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lakes, for billions of files! The blob store has O(1) disk seek and cloud tiering. The filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, and Erasure Coding. [Moved to: https://github.com/seaweedfs/seaweedfs] (by chrislusf)
For computers, batch IO operations are much faster than random IO and can easily saturate the network.
This benchmark uses a large batch size, 64 MB, to test. There is nothing new here; most common file systems can easily do the same.
The difficult task is reading and writing lots of small files. There is a term for it: LOSF (lots of small files). I work on SeaweedFS, https://github.com/chrislusf/seaweedfs , which is designed to handle LOSF. And of course, it has no problem with large files at all.
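As a rough illustration of the batch-vs-LOSF gap described above (a sketch, not the benchmark in question; the paths and counts are arbitrary), compare one large sequential write against creating many tiny files:

```shell
#!/bin/sh
# Sketch: one large batched write vs. many tiny files (LOSF).
# /tmp/losf_demo and the counts are arbitrary; adjust for your disk.
DEMO=/tmp/losf_demo
mkdir -p "$DEMO/small"

# One 64 MB file: a single large sequential write.
time dd if=/dev/zero of="$DEMO/big" bs=1M count=64 2>/dev/null

# 1000 tiny files: the cost is dominated by per-file metadata overhead.
time sh -c '
  i=0
  while [ "$i" -lt 1000 ]; do
    printf "x" > "'"$DEMO"'/small/f$i"
    i=$((i+1))
  done
'
```

On most systems the second loop takes far longer per byte written, which is exactly why copying millions of few-byte files is so much slower than the headline batch numbers suggest.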
My experience (I don't know if this is comparable, and I have not made any notes, so this is from memory): I tried MinIO in December and switched to SeaweedFS a few weeks ago. My use case was a transition from local file storage to a DFS, and also enabling our developers to move from the local filesystem to S3. Since my resources are limited (vSphere VMs) with 3 hosts and different disks, I first tried to set up a 3-VM cluster with MinIO. After researching different systems (Ceph, longhorn.io, ...), I wanted an easy-to-set-up system that supports S3. I relied a lot on what other people had measured and chose MinIO first because it supported mounting via S3. Then I tried to copy over about 34 million files (mostly a few bytes each, but some up to 1 GB), about 4.2 TB in total. I tried different methods (rsync, cp, cp with parallelism, ...), and at best it took me about 3 days to copy 300 GB of data. I also found that listing files was impossible: we have a single folder with over 300k project directories (GUIDs) beneath it, and growing. After that I gave SeaweedFS a shot. The reason I did not use it in the first place was that its documentation was a bit confusing and did not give me the answers I needed as quickly as MinIO's did.
Now, my SeaweedFS setup is a 3-VM cluster with 3 disks (1 TB each) per VM. I configured a WireGuard mesh (https://github.com/k4yt3x/wg-meshconf) between the VMs and configured the master and volume servers to talk to each other securely via the WireGuard IPs. I also configured ufw to only allow communication on the HTTP/gRPC ports. I also configured a filer (using leveldb3) to use the WireGuard IPs (master and volumes) and allowed it to communicate with some specific servers on the outside (via ufw).
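For reference, a sketch of the kind of ufw rules involved, assuming SeaweedFS's default ports (master 9333 HTTP / 19333 gRPC, volume 8080 / 18080, filer 8888 / 18888) and an assumed WireGuard subnet of 10.0.0.0/24; the actual setup's ports and subnet may differ:

```shell
# Sketch only: restrict cluster traffic to the WireGuard mesh.
# 10.0.0.0/24 is an assumed WireGuard subnet; ports are SeaweedFS defaults.
ufw allow from 10.0.0.0/24 to any port 9333 proto tcp   # master HTTP
ufw allow from 10.0.0.0/24 to any port 19333 proto tcp  # master gRPC
ufw allow from 10.0.0.0/24 to any port 8080 proto tcp   # volume HTTP
ufw allow from 10.0.0.0/24 to any port 18080 proto tcp  # volume gRPC
ufw allow from 10.0.0.0/24 to any port 8888 proto tcp   # filer HTTP
ufw allow from 10.0.0.0/24 to any port 18888 proto tcp  # filer gRPC
# Plus explicit allow rules for the specific outside servers that need the filer.
```

The point of the mesh is that the servers only ever see each other's WireGuard addresses, so a default-deny policy plus these allows is enough.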
After that I mounted the filer via weed mount on that specific server and tried to copy over the same files and folders. After 2 days I had copied about 1.5 TB of the data via rsync. There was also no problem with listing files or accessing the filer from different machines while uploading. But there is overhead when reading and creating lots of small files. File listing is even faster than local btrfs file listing.
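The mount-and-copy step can look roughly like this (a sketch; the filer address, mount point, and rsync flags are assumptions, not the exact commands used):

```shell
# Sketch: FUSE-mount the SeaweedFS filer, then copy data into it.
# 10.0.0.1:8888 is an assumed filer address on the WireGuard mesh.
mkdir -p /mnt/seaweedfs
weed mount -filer=10.0.0.1:8888 -dir=/mnt/seaweedfs &

# Copy the tree over; -a preserves metadata, --info=progress2 shows totals.
rsync -a --info=progress2 /data/projects/ /mnt/seaweedfs/projects/
```

Once mounted, the filer path behaves like a local directory, so ordinary tools (rsync, cp, ls) work against it from any machine that has the mount.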
Chris is also very nice and quick at fixing bugs.