|about 18 hours ago||about 10 hours ago|
|GNU General Public License v3.0 or later||GNU General Public License v3.0 or later|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Pokemon vs Programming
1 project | reddit.com/r/WorkReform | 12 Apr 2022
Big Data Processing, EMR with Spark and Hadoop | Python, PySpark
2 projects | dev.to | 27 Mar 2022
Apache Hadoop is an open source framework that is used to efficiently store and process large datasets ranging in size from gigabytes to petabytes of data. Want to dig deeper?
Unknown Python.exe process taking 2% CPU
1 project | reddit.com/r/WindowsHelp | 2 Feb 2022
There are a few related projects listed on the side of the page here that might be familiar: https://hadoop.apache.org/
How do I make multiple computers run as one?
1 project | reddit.com/r/techsupport | 6 Jan 2022
The computers that you have appear to use an x86 architecture. Therefore, you could most likely install a Linux distro on each one. Then, you could use something like Apache Hadoop to execute some sort of distributed process across each computer.
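As a rough illustration of the kind of distributed process mentioned above, here is a minimal single-machine sketch of the MapReduce pattern that Hadoop is known for. It is plain Python and does not call any Hadoop API. The function names (map_phase, shuffle, reduce_phase, word_count) and the idea of treating each list element as the data held by one computer are assumptions made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as happens between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks: List[str]) -> Dict[str, int]:
    # Each chunk stands in for the data held by one computer (hypothetical).
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data on big clusters", "data on disk"]))
```

In a real cluster the map and reduce steps would run on different machines and the shuffle would move data over the network; here everything runs in one process so the three phases are easy to see.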
Spark for beginners - and you
3 projects | dev.to | 22 Dec 2021
Hadoop is an ecosystem of tools for big data storage and data analysis. It is older than Spark and writes intermediate results to disk, whereas Spark tries to keep data in memory whenever possible, which makes Spark faster in many use cases.
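The disk-versus-memory contrast described above can be sketched with a toy two-stage pipeline. This is not how either Hadoop or Spark is implemented; it only shows the difference between persisting an intermediate result to a file before the next stage reads it and handing the result over directly. All function names and the JSON file format are assumptions chosen for the example.

```python
import json
import os
import tempfile
from typing import List


def square_all(values: List[int]) -> List[int]:
    """Stage one: square every value."""
    return [v * v for v in values]


def add_one(values: List[int]) -> List[int]:
    """Stage two: add one to every value."""
    return [v + 1 for v in values]


def run_via_disk(values: List[int]) -> List[int]:
    """Write the stage-one result to a file, then read it back for stage two."""
    intermediate = square_all(values)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stage1.json")
        with open(path, "w") as fh:
            json.dump(intermediate, fh)
        with open(path) as fh:
            reloaded = json.load(fh)
    return add_one(reloaded)


def run_in_memory(values: List[int]) -> List[int]:
    """Pass the stage-one result straight to stage two without touching disk."""
    return add_one(square_all(values))


if __name__ == "__main__":
    data = [1, 2, 3]
    print(run_via_disk(data))
    print(run_in_memory(data))
```

Both functions return the same result; the disk variant simply pays for an extra write and read between stages, which is the cost the excerpt refers to.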
Dreaming and Breaking Molds – Establishing Best Practices with Scott Haines
3 projects | dev.to | 8 Dec 2021
So Yahoo bought that. I think it was 2013 or 2014. Timelines are hard. But I wanted to go join the Games team and start things back up. But that was also my first kind of experience in actually building recommendation engines or working with lots of data. And I think for me, like that was, I guess...at the time, we were using something called Apache Storm. We had Hadoop, which had been around for a while. And it was like one of the biggest user groups was out of the Yahoo campus. It was called the HUG group, like the Hadoop Users Group. So they met for basically pizza and stories on Wednesdays once a month, which was really fun.
Setting up a single-node Hadoop cluster
1 project | dev.to | 14 Nov 2021
Spark is lit once again
6 projects | dev.to | 29 Oct 2021
Here at Exacaster Spark applications have been used extensively for years. We started using them on our Hadoop clusters with YARN as an application manager. However, with our recent product, we started moving towards a Cloud-based solution and decided to use Kubernetes for our infrastructure needs.
The Data Engineer Roadmap 🗺
11 projects | dev.to | 19 Oct 2021
Apache Hadoop and HDFS
Whatever java can do there is a better alternative in job market?
2 projects | reddit.com/r/learnjava | 16 Jul 2021
How to figure out what is using up all of the storage?
1 project | reddit.com/r/ceph | 10 May 2022
Grafana in ceph dashboard
1 project | reddit.com/r/ceph | 3 May 2022
Trouble mounting a 2nd ceph file system on linux
1 project | reddit.com/r/ceph | 15 Apr 2022
Perhaps there was something wrong in the documentation? Looking at code lines 193 through 196 here, could it possibly be expecting :/ instead of =/ in the mount path? Even though the documentation states to use an equal sign?
How do cloud provider run their user frontend (and backend)?
1 project | reddit.com/r/sysadmin | 5 Mar 2022
You also need a storage system. Things like Ceph.
Long term Ceph experiences?
1 project | reddit.com/r/Veeam | 26 Feb 2022
Yes, it is fixed in the latest release, and I know the backport has at least been submitted. https://github.com/ceph/ceph/pull/44697
cephadm: update fewer OSDs at a time?
1 project | reddit.com/r/ceph | 14 Feb 2022
There are a couple of things going on that perhaps I can shed some light on: one bug was introduced in a recent update WRT read_leases. I believe this is the PR that fixes it, but I'm not 100% sure: https://github.com/ceph/ceph/pull/44015
Recommended open source distributed AirGapped storage solution (Object storage, Block device)
1 project | reddit.com/r/AirGapped | 1 Jan 2022
We run Ceph in our production environment for both large capacity object storage workloads and for performant high IOP database block storage.
Recovery options for OMAP conversion bug
1 project | reddit.com/r/ceph | 10 Dec 2021
I think it's related to one issue linked to mon ( https://github.com/ceph/ceph/pull/44131#pullrequestreview-827958267 ) and nfs ( https://github.com/ceph/ceph/pull/44252 ). Nothing serious, but it is better to wait a few days and keep an eye on the ML and the changelog PR ( https://github.com/ceph/ceph/pull/44131 ) before upgrading a cluster.
Ceph's MON minimum requirements
1 project | reddit.com/r/ceph | 4 Oct 2021
Pacific on Ubuntu 20.04
1 project | reddit.com/r/ceph | 4 Jun 2021
But that's what I did. curl --silent --remote-name --location https://github.com/ceph/ceph/raw/pacific/src/cephadm/cephadm
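For readers who prefer not to shell out to curl, the download above could be approximated in Python with the standard library alone. This is a sketch, not an official installation method: the helper names remote_name and download are hypothetical, the URL is copied from the command above, and whether that URL still resolves has not been checked here.

```python
import os
import urllib.request
from urllib.parse import urlparse

# URL taken verbatim from the curl command quoted above.
CEPHADM_URL = "https://github.com/ceph/ceph/raw/pacific/src/cephadm/cephadm"


def remote_name(url: str) -> str:
    """Return the last path segment, mirroring curl's --remote-name."""
    return os.path.basename(urlparse(url).path)


def download(url: str, dest_dir: str = ".") -> str:
    """Fetch url into dest_dir and return the path of the saved file.

    urllib follows HTTP redirects by default, similar to curl's --location.
    """
    target = os.path.join(dest_dir, remote_name(url))
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        out.write(response.read())
    return target
```

Calling download(CEPHADM_URL) would save a file named cephadm in the current directory, assuming the request succeeds.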
What are some alternatives?
Seaweed File System - SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
MooseFS - MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Go IPFS - IPFS implementation in Go
LeoFS - The LeoFS Storage System
lizardfs - LizardFS is an Open Source Distributed File System licensed under GPLv3.
OpenAFS - Fork of OpenAFS from git.openafs.org for visualization
XtreemFS - Distributed Fault-Tolerant File System
GlusterFS - Web Content for gluster.org -- Deprecated as of September 2017