minio
Seaweed File System (DISCONTINUED)

|  | minio | Seaweed File System |
|---|---|---|
| Mentions | 99 | 49 |
| Stars | 43,629 | 14,960 |
| Growth | 2.1% | - |
| Activity | 9.9 | 9.9 |
| Latest commit | 7 days ago | over 1 year ago |
| Language | Go | Go |
| License | GNU Affero General Public License v3.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
minio
-
A Distributed File System in Go Cut Average Metadata Memory Usage to 100 Bytes
Looks like minio added this in 2022:
-
Ask HN: I have 10 yrs of Exp. Failed 4 takehome projects. What am I doing wrong?
>Again, here you seem to be arguing against a strawman that doesn't know that blocking the IO loop is bad. Try arguing against one that knows ways to work around that. This is why I'm saying this rule isn't true. Extensive computation on single-threaded "scripting" languages is possible (and even if it wasn't, punt it off to a remote pool of workers, which could also be NodeJS!).
Very rare to find a rule that's absolutely true. I clearly stated exceptions to the rule (which you repeated), but the generality is still true.
Threading in Node.js is new and didn't exist the last time I touched it. It looks like it's not the standard use case, as Google searches still turn up sites everywhere saying Node is single-threaded. The only way I can see this being done is multiple processes (each with its own copy of V8) using OS shared memory for IPC, and they're just calling it threads. It would take a shitload of work to make V8 actually multi-threaded.
Processes are expensive, so you can't really follow this model per request. And we stopped doing a thread per request over a decade ago.
Again, these are exceptions to the rule. From what I'm reading, Node.js is normally still single-threaded, with a fixed number of worker processes that are called "threads". Under this, my general rule is still generally true: backend engineering does not typically involve writing non-blocking code and offloading compute to other sources. Again, there are exceptions, but as I stated before, these exceptions are rare.
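For what it's worth, the "punt compute off to a pool of workers" exception the thread keeps circling is easy to sketch. The discussion is about NodeJS, but the pattern is identical in Python's asyncio; the CPU-bound `fib` below is a hypothetical stand-in for the expensive work:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def fib(n: int) -> int:
    # Deliberately slow, CPU-bound work that would block an event loop
    # if run inline in a request handler.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

async def handler(n: int) -> int:
    # Punt the computation to a worker process so the IO loop stays free
    # to serve other requests while the result is computed.
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=1) as pool:
        return await loop.run_in_executor(pool, fib, n)

if __name__ == "__main__":
    print(asyncio.run(handler(20)))  # -> 6765
```

In a real service the pool would be created once and shared, not built per call; this just shows the event loop staying unblocked.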
>Here's what I mean -- you can actually solve the ordering problem in O(N) + O(M) time by keeping track of the max you've seen and building a sparse array and running through every single index from max to zero. It's overkill, but it's generally referred to as a counting sort:
Oh come on. We both know these sorts won't work. The large numbers blow up memory. Imagine 3 routes: one route gets 352 hits, another gets 400 hits, and another gets 600,000 hits. What's the big O for memory and for the sort?
It's O(600,000) for both memory and runtime; N=3 doesn't even matter here. These types of sorts are almost never used for this reason: they only work for values with small ranges. It's especially not useful for this project. It's as if the project was designed so "counting sort" fails big time.
Also, we don't need to talk about the O(N) read and write. That's a given; it's always there.
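To make the memory point concrete, here is a minimal counting sort over those hypothetical hit counts. The tally array is sized by the maximum value, so memory and runtime are O(max), not O(N):

```python
def counting_sort_desc(counts):
    # Sparse tally array sized by the maximum value seen:
    # memory is O(max(counts)), regardless of len(counts).
    m = max(counts)
    tally = [0] * (m + 1)
    for c in counts:
        tally[c] += 1
    # Walk every index from max down to zero to emit descending order.
    out = []
    for value in range(m, -1, -1):
        out.extend([value] * tally[value])
    return out

hits = [352, 400, 600_000]  # 3 routes, but the tally array holds 600,001 slots
print(counting_sort_desc(hits))  # -> [600000, 400, 352]
```

With N=3 and max=600,000, the loop over the tally array dominates everything, which is exactly the objection above.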
>I don't think these statements make sense -- having docker installed and having redis installed are basically equivalent work. At the end of the day, the outcome is the same -- the developer is capable of running redis locally. Having redis installed on your local machine is absolutely within range for a backend developer.
Unfortunately these statements do make sense, and your characterization seems completely dishonest to me. People like to keep their local environments clean and segregated from daemons that belong on a server. I'm sure in your universe you are claiming web developers install redis, postgresql, and kafka all locally, but that just sounds absurd to me. We can agree to disagree, but from my perspective I don't think you're being realistic here.
>Also, remote development is not practiced by many companies -- the only companies I've seen doing thin clients are the large ones.
It's practiced by a large number of companies, including basically every company I've worked at for the past 5 years. Every company has to at least partially do remote dev in order to fully test E2E stuff or integrations.
>I see it as just spinning up docker, not compose -- you already have access to the app (ex. if it was buildable via a function) so you could spawn redis in a subprocess (or container) on a random port, and then spawn the app.
Sure. The point is it's hacky to do this without an existing framework. I'll check out that library you linked.
>I agree that integration testing is harder -- I think there's more value there.
Of course there's more value. You get more value at higher cost. That's been my entire point.
>Also, for replicating S3, minio (https://github.com/minio/minio) is a good stand-in. For replicating lambda, localstack (https://docs.localstack.cloud/user-guide/aws/lambda/) is probably reasonable; there are also frameworks with some consideration for this (https://www.serverless.com/framework/docs/providers/aws/guid...) built in.
Good finds. But what about SNS, IOT, Big Query and Redshift? Again my problem isn't about specific services, it's about infra in general.
>Ah, this is true -- but I think this is what people are testing in interviews. There is a predominant culture/shared values, and the test is literally whether someone can fit into those values.
No. I think what's going on is people aren't putting much thought into what they're actually interviewing for. They just have some made up bar in their mind whether it's a leetcode algorithm or whether the guy wrote a unit test for the one available pure function for testing.
>Whether they should or should not be, that's at least partially what interviews are -- does the new team member feel the same way about technical culture currently shared by the team.
The answer is no. There's always developers who disagree with things and just don't reveal it. Think about the places you worked at. Were you in total agreement? I doubt it. A huge amount of devs are opinionated and think company policies or practices are BS. People adapt.
>Now in the case of this interview your solution was just fine, even excellent (because you went out of your way to do async io, use newer/easier packaging methodologies, etc), but it's clearly not just that.
The testing is just a game. I can play the game and suddenly I pass all the interviews. I think this is the flaw in your methodology, as I just need to write tests to get in. Google, for example, in spirit attempted another method, which involves testing IQ via algorithms. It's a much higher bar.
The problem with Google is that their methodology can also be gamed, though it's much harder to game, and often the bar is too high for the actual job the engineer is expected to do.
I think both methodologies are flawed, but hiring via ignoring raw ability and picking people based off of weirdly specific cultural preferences is the worse of the two hiring methodologies.
Put it this way. If a company has a strong testing culture, then engineers who don't typically test things will adapt. It's not hard to do, and testing isn't so annoying that they won't do it.
> Docker is not the problem. Docker or virtual machines make this problem more amenable to a solution, but even using docker here with testing is overkill and hacky. A take home should not expect a user to build excessive infrastructure locally just to run tests.
I don't think these statements make sense -- having docker installed and having redis installed are basically equivalent work. At the end of the day, the outcome is the same -- the developer is capable of running redis locally. Having redis installed on your local machine is absolutely within range for a backend developer.
Also, remote development is not practiced by many companies -- the only companies I've seen doing thin clients are the large ones.
> Maybe for this take home project you could be right. I could do some integration tests by spinning up docker-compose from within python. Hacky but doable. But in general this solution is not production scalable as production involves more things then what can be placed inside a docker-compose.
I see it as just spinning up docker, not compose -- you already have access to the app (ex. if it was buildable via a function) so you could spawn redis in a subprocess (or container) on a random port, and then spawn the app.
I agree that it is not trivial, but the value is high (in my mind).
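As a sketch of that approach: assuming Docker is installed and using the `redis:7-alpine` image, the glue is small (the helper names here are hypothetical):

```python
import socket
import subprocess

def free_port() -> int:
    # Ask the OS for any unused TCP port by binding to port 0.
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

def start_redis(port: int) -> str:
    # Launch a throwaway redis container mapped to the chosen port.
    # Returns the container id so the test suite can clean it up.
    return subprocess.check_output(
        ["docker", "run", "-d", "--rm", "-p", f"{port}:6379", "redis:7-alpine"],
        text=True,
    ).strip()

def stop_redis(container_id: str) -> None:
    # --rm on the run above means stopping also removes the container.
    subprocess.run(["docker", "stop", container_id],
                   check=True, capture_output=True)
```

A pytest fixture can wrap start/stop and hand the chosen port to the app under test, which is the "spawn redis, then spawn the app" flow described above.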
> Yeah it's 2023, you tell me how integration testing should be done as easily as unit testing on a takehome. I had one takehome project involving S3 and aws lambdas. They expected me to get an AWS account and literally set up infrastructure because there's no way around even testing what I wrote without actual infrastructure. That entire project was just integration test nightmare. Much rather run asserts locally.
I agree that integration testing is harder -- I think there's more value there.
Also, for replicating S3, minio (https://github.com/minio/minio) is a good stand-in. For replicating lambda, localstack (https://docs.localstack.cloud/user-guide/aws/lambda/) is probably reasonable; there are also frameworks with some consideration for this (https://www.serverless.com/framework/docs/providers/aws/guid...) built in.
That said I do think that's a weakness of the platform compute stuff -- it is inconvenient to test lambda outside of lambda.
> Well I mean 99% of hires get into a context where they aren't in full control. So why is it logical to test what a developer will do if he/she had full control? Isn't it better to test the developer's ability to code and to adapt to different contexts? Your methodology sort of just tests the programmer's personal philosophy and whether it matches your own. That was my point.
Ah, this is true -- but I think this is what people are testing in interviews. There is a predominant culture/shared values, and the test is literally whether someone can fit into those values.
Whether they should or should not be, that's at least partially what interviews are -- does the new team member feel the same way about technical culture currently shared by the team.
Now in the case of this interview your solution was just fine, even excellent (because you went out of your way to do async io, use newer/easier packaging methodologies, etc), but it's clearly not just that.
-
What's the best AWS S3 protocol alternative?
Maybe Minio: https://github.com/minio/minio / https://min.io
I've only used it as a fairly straight forward object store though, so not sure about privileges/permissions (etc).
You say protocol alternative, but assuming you're more concerned with AWS as the host than S3 as the protocol you might try https://github.com/minio/minio
If you do feel an aversion to the protocol then the rclone backend list would be a good starting point
-
Reason to use other Build Tool than Make?
You could refer to big OSS project Makefiles to take a look, what could be there, for example: https://github.com/minio/minio/blob/master/Makefile
-
Looking for a Backblaze B2 compatible cloud backup application for Linux that uses standard file level (not block level) ZIP encryption (and with GUI would be nice).
Backblaze's B2 is compatible with the AWS S3 API, which is also implemented by self-hosted minio
-
Looking for a recommendation: a cloud drive where files can be uploaded and updated via a script or an API
-
Selfhosted file share requiring authorised URL to upload
https://github.com/minio/minio is what comes to my mind
Seaweed File System
- An open-source distributed object storage service
-
Moving to github.com/seaweedfs/seaweedfs
FYI: Planning to move from github.com/chrislusf/seaweedfs to github.com/seaweedfs/seaweedfs in the coming days. It may cause some problems for package references, builds, documentation, and links. Sorry for the change!
-
S3 Isn't Getting Cheaper
Besides storage itself, S3 API access costs can be high if data is frequently accessed. And latency is unpredictable.
You can use the SeaweedFS Remote Object Store Gateway to cache S3 (or any S3-API-compatible vendor) on local servers, access the data at local network speed, and asynchronously sync changes back to S3.
https://github.com/chrislusf/seaweedfs/wiki/Gateway-to-Remot...
SeaweedFS: https://github.com/chrislusf/seaweedfs
-
Question: does anyone know Storage Provider with S3 as persistence layer?
I don't know if it fits all of your requests, but you can take a look at seaweedfs, which is pretty good
-
Introducing Garage, our self-hosted distributed object storage solution
Seaweedfs deserves a mention here for comparison as well.
-
Garage, our self-hosted distributed object storage solution
If you're still talking about SeaweedFS, the answer seems to simply be that it's not a "raft-based object store" as the parent described. That 'proxy' node you mention is a volume server itself, and replicates its whole volume on another server. Upon replication failures, the data becomes read-only [1]. Raft is not used for the writes.
-
Updated MinIO NVMe Benchmarks: 2.6Tbps on Get and 1.6 on Put
For computers, batched IO operations are much faster than random IO and can easily saturate the network.
This benchmark uses a large batch size, 64MB, to test. There is nothing new here; most common file systems can easily do the same.
The difficult task is reading and writing lots of small files. There is a term for it: LOSF (lots of small files). I work on SeaweedFS, https://github.com/chrislusf/seaweedfs , which is designed to handle LOSF. And of course, it has no problem with large files at all.
This is a fair complaint. :)
For filer metadata, you should just pick the one you are most familiar with.
There is a wiki page for production setup. https://github.com/chrislusf/seaweedfs/wiki/Production-Setup
What are some alternatives?
Nextcloud - ☁️ Nextcloud server, a safe home for all your data
GlusterFS - Gluster Filesystem : Build your distributed storage in minutes
Ceph - Ceph is a distributed object, block, and file storage platform
Samba - Official GitLab mirror of https://git.samba.org/samba.git -- merge requests should be made on GitLab (not on GitHub)
seaweedfs - SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
Swift - OpenStack Storage (Swift). Mirror of code maintained at opendev.org.
GlusterFS - Web Content for gluster.org -- Deprecated as of September 2017
SFTPGo - Fully featured and highly configurable SFTP server with optional HTTP/S, FTP/S and WebDAV support - S3, Google Cloud Storage, Azure Blob
etcd - Distributed reliable key-value store for the most critical data of a distributed system
Go IPFS - IPFS implementation in Go [Moved to: https://github.com/ipfs/kubo]
Seafile - High performance file syncing and sharing, with also Markdown WYSIWYG editing, Wiki, file label and other knowledge management features.
Monsta FTP - Open source PHP/Ajax cloudware that puts FTP file management right in your browser, anywhere, any time.