|about 18 hours ago||about 18 hours ago|
|GNU General Public License v3.0 or later||GNU General Public License v3.0 or later|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Pokemon vs Programming
1 project | reddit.com/r/WorkReform | 12 Apr 2022
Big Data Processing, EMR with Spark and Hadoop | Python, PySpark
2 projects | dev.to | 27 Mar 2022
Apache Hadoop is an open source framework that is used to efficiently store and process large datasets ranging in size from gigabytes to petabytes of data. Want to dig deeper?
Unknown Python.exe process taking 2% CPU
1 project | reddit.com/r/WindowsHelp | 2 Feb 2022
A few related projects are listed on the side of the page here that might be familiar: https://hadoop.apache.org/
How do I make multiple computers run as one?
1 project | reddit.com/r/techsupport | 6 Jan 2022
The computers that you have appear to use an x86 architecture. Therefore, you could most likely install a Linux distro on each one. Then, you could use something like Apache Hadoop to execute some sort of distributed process across each computer.
Spark for beginners - and you
3 projects | dev.to | 22 Dec 2021
Hadoop is an ecosystem of tools for big data storage and data analysis. It is older than Spark and writes intermediate results to disk, whereas Spark tries to keep data in memory whenever possible, so Spark is faster in many use cases.
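The disk-versus-memory difference described above can be illustrated with a small sketch. It uses neither Hadoop nor Spark: it is a plain-Python word count, and the function names (map_words, reduce_counts, word_count_via_disk, word_count_in_memory) and the intermediate.jsonl file name are hypothetical, chosen only for this illustration. One variant writes the mapped pairs to a file before reducing, the other keeps them in a list.

```python
import json
import os
import tempfile
from collections import Counter


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(pairs):
    """Reduce step: sum the counts for each word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count_via_disk(lines, workdir):
    """Disk-style pipeline: persist mapped pairs to a file, then read them back."""
    path = os.path.join(workdir, "intermediate.jsonl")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            for pair in map_words(line):
                f.write(json.dumps(pair) + "\n")
    with open(path, encoding="utf-8") as f:
        pairs = [tuple(json.loads(row)) for row in f]
    return reduce_counts(pairs)


def word_count_in_memory(lines):
    """Memory-style pipeline: mapped pairs never leave the process."""
    pairs = [pair for line in lines for pair in map_words(line)]
    return reduce_counts(pairs)


lines = ["spark keeps data in memory", "hadoop writes data to disk"]
with tempfile.TemporaryDirectory() as workdir:
    disk_result = word_count_via_disk(lines, workdir)
memory_result = word_count_in_memory(lines)

# Both pipelines produce the same counts; only the handling of the
# intermediate pairs differs.
print(disk_result == memory_result)
print(memory_result["data"])
```

Both variants return the same counts. The extra write and read in the disk variant is the kind of overhead the quoted post refers to, although real clusters also involve network shuffles and fault-tolerance trade-offs that this sketch ignores.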
Dreaming and Breaking Molds – Establishing Best Practices with Scott Haines
3 projects | dev.to | 8 Dec 2021
So Yahoo bought that. I think it was 2013 or 2014. Timelines are hard. But I wanted to go join the Games team and start things back up. But that was also my first kind of experience in actually building recommendation engines or working with lots of data. And I think for me, like that was, I guess...at the time, we were using something called Apache Storm. We had Hadoop, which had been around for a while. And it was like one of the biggest user groups was out of the Yahoo campus. It was called the HUG group, like the Hadoop Users Group. So they met for basically pizza and stories on Wednesdays once a month, which was really fun.
Setting up a single-node Hadoop cluster
1 project | dev.to | 14 Nov 2021
Spark is lit once again
6 projects | dev.to | 29 Oct 2021
Here at Exacaster Spark applications have been used extensively for years. We started using them on our Hadoop clusters with YARN as an application manager. However, with our recent product, we started moving towards a Cloud-based solution and decided to use Kubernetes for our infrastructure needs.
The Data Engineer Roadmap 🗺
11 projects | dev.to | 19 Oct 2021
Apache Hadoop and HDFS
Whatever java can do there is a better alternative in job market?
2 projects | reddit.com/r/learnjava | 16 Jul 2021
Multiple DS units acting as one?
1 project | reddit.com/r/synology | 23 Mar 2022
What you are looking for is a clustered file system, like https://www.gluster.org/. As long as all units are close by with low latency, there are a couple of solutions that allow you to create distributed storage of various kinds: key-value stores aplenty, clustered file systems that pretend to be one file system, etc. If you have geographically distributed units with high latencies, it becomes harder. Most open source systems don't work really well in this scenario. There were a couple of attempts like Hydrabase, but they didn't go very far. It is normally solved by running two clusters and then replicating between them.
Upload pdf file to mongodb atlas
1 project | reddit.com/r/mongodb | 21 Mar 2022
I'd imagine most managed service providers are going to require a credit card, though most of them have a free tier. If you want to take an unmanaged approach, maybe look into Gluster. I've used it before and never had an issue with it, but I also had an infrastructure team that set it up, so I'm not familiar with the challenges that way: https://www.gluster.org/
Gluster 10 repo
1 project | reddit.com/r/gluster | 3 Feb 2022
Gluster is developed on GitHub at https://github.com/gluster/glusterfs/
Blocky DNS & synchronizing two instances (primary & secondary DNS)
2 projects | reddit.com/r/selfhosted | 17 Jan 2022
I'm running three Blocky instances in Docker (and CoreDNS for internal zone resolving) by placing YAML files on a GlusterFS share, so I can update configs on one VM, and then just restart Blocky containers via SSH.
Looking for cool home projects
1 project | reddit.com/r/sysadmin | 15 Jan 2022
If you've got a few similar computers available (like Pis, though I'm not sure how well it runs on Arm), you could set up a little Gluster and show off how node failure and recovery affect a distributed file system.
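The failure-and-recovery demonstration suggested above can be rehearsed on one machine with a toy model. This sketch is not GlusterFS and does not reflect how Gluster places or heals data. The ToyReplicatedStore class, its methods, and the byte-sum placement rule are all hypothetical, chosen only to show that a replicated file stays readable after one node is lost and is copied back when the node returns.

```python
class ToyReplicatedStore:
    """Toy in-memory model of a replicated store (not GlusterFS itself)."""

    def __init__(self, node_count, replicas):
        self.nodes = [dict() for _ in range(node_count)]
        self.online = [True] * node_count
        self.replicas = replicas

    def _placement(self, name):
        # Hypothetical placement rule: the byte sum of the name picks the
        # first node, and the remaining replicas go to the following nodes.
        start = sum(name.encode("utf-8")) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def write(self, name, data):
        for idx in self._placement(name):
            if self.online[idx]:
                self.nodes[idx][name] = data

    def read(self, name):
        # Return the first copy found on an online replica.
        for idx in self._placement(name):
            if self.online[idx] and name in self.nodes[idx]:
                return self.nodes[idx][name]
        raise KeyError(name)

    def fail(self, idx):
        # Simulate a node going down and losing its disk contents.
        self.online[idx] = False
        self.nodes[idx].clear()

    def recover(self, idx):
        # Bring the node back and copy over every file it should hold
        # from any surviving replica.
        self.online[idx] = True
        for other_idx, node in enumerate(self.nodes):
            if other_idx == idx or not self.online[other_idx]:
                continue
            for name, data in list(node.items()):
                if idx in self._placement(name):
                    self.nodes[idx][name] = data


store = ToyReplicatedStore(node_count=3, replicas=2)
store.write("a.txt", "hello")

primary = store._placement("a.txt")[0]
store.fail(primary)
print(store.read("a.txt"))              # still served by the second replica

store.recover(primary)
print("a.txt" in store.nodes[primary])  # copy restored on the recovered node
```

On real hardware the same experiment would mean powering off one machine, confirming files are still readable, and watching the data resynchronize when it comes back.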
Why are you not using kubernetes?
3 projects | reddit.com/r/selfhosted | 31 Dec 2021
Longhorn and storage in general are the hardest part of any HA setup, but Longhorn is also not the only choice. At the most basic level, something like GlusterFS is easy to get running and usable in k8s as NFS volumes. It does not, however, have all the extra features of Longhorn.
Did I screw up? HASS, Smartthings, Hue & HomeKit.. seeing duplicated devices :/
1 project | reddit.com/r/homeautomation | 29 Dec 2021
GlusterFS (distributed storage)
1 project | reddit.com/r/SolusProject | 19 Sep 2021
No, this package isn't available in Solus Repository. If you are going to compile it, the source code is here: https://github.com/gluster/glusterfs
HPC design choices
2 projects | reddit.com/r/HPC | 20 Apr 2021
Do you mean https://www.gluster.org/ ?
Info request - Building a NAS/SAN
1 project | reddit.com/r/linux4noobs | 17 Apr 2021
What are some alternatives?
minio - Multi-Cloud Object Storage
Go IPFS - IPFS implementation in Go
Ceph - Ceph is a distributed object, block, and file storage platform
Seaweed File System - SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
lizardfs - LizardFS is an Open Source Distributed File System licensed under GPLv3.
Tahoe-LAFS - The Tahoe-LAFS decentralized secure filesystem.
MooseFS - MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
GlusterFS - Web Content for gluster.org -- Deprecated as of September 2017
btrfs - Haskell bindings to the btrfs API
Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows