Go Bigdata

Open-source Go projects categorized as Bigdata

Top 4 Go Bigdata Projects

  • GitHub repo volcano

    A Cloud Native Batch System (Project under CNCF)

    Project mention: My Journey With Spark On Kubernetes... In Python (1/3) | dev.to | 2021-04-12

    For our experiments, we will use Volcano, a batch scheduler for Kubernetes that is well suited to scheduling Spark application pods more efficiently than the default kube-scheduler. The main reason is that Volcano allows "group scheduling" or "gang scheduling": while the default Kubernetes scheduler schedules containers one by one, Volcano ensures that a gang of related containers (here, the Spark driver and its executors) can be scheduled at the same time. If for any reason it is not possible to deploy all the containers in a gang, Volcano will not schedule that gang. This article explains in more detail the reasons for using Volcano.
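
The all-or-nothing behaviour described above can be illustrated with a small Go sketch. This is not Volcano's actual algorithm or API: the `Gang` type, the `scheduleGang` function, and the first-fit placement by CPU cores are all assumptions made for illustration. The only point it shows is that either every pod in a gang is placed, or none is.

```go
package main

import "fmt"

// Gang is a hypothetical group of pods that must start together
// (for example, a Spark driver and its executors).
type Gang struct {
	Name string
	Pods []int // CPU request of each pod, in cores (illustrative unit)
}

// scheduleGang tries to place every pod of the gang using first-fit.
// free holds the free CPU per node. On success it commits the placement
// to free and returns the node index chosen for each pod. On failure it
// returns (nil, false) and leaves free untouched: no partial gang is placed.
func scheduleGang(free []int, g Gang) ([]int, bool) {
	trial := append([]int(nil), free...) // work on a copy until the whole gang fits
	assign := make([]int, len(g.Pods))
	for i, req := range g.Pods {
		placed := false
		for n := range trial {
			if trial[n] >= req {
				trial[n] -= req
				assign[i] = n
				placed = true
				break
			}
		}
		if !placed {
			return nil, false // one pod does not fit, so reject the whole gang
		}
	}
	copy(free, trial) // every pod fits: commit the placement
	return assign, true
}

func main() {
	free := []int{4, 4} // two nodes with 4 free cores each
	gangs := []Gang{
		{Name: "spark-job", Pods: []int{2, 2, 2}},
		{Name: "second-job", Pods: []int{2, 2}},
	}
	for _, g := range gangs {
		if assign, ok := scheduleGang(free, g); ok {
			fmt.Printf("%s scheduled on nodes %v, free CPU now %v\n", g.Name, assign, free)
		} else {
			fmt.Printf("%s not scheduled, free CPU still %v\n", g.Name, free)
		}
	}
}
```

In this sketch the second gang is rejected even though one of its two pods would fit, which is the difference from a scheduler that places pods one at a time.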

  • GitHub repo kube-batch

    A batch scheduler of kubernetes for high performance workload, e.g. AI/ML, BigData, HPC

    Project mention: Scaling Kubernetes to 7,500 Nodes | news.ycombinator.com | 2021-01-25

    > That said, strain on the kube-scheduler is spiky. A new job may consist of many hundreds of pods all being created at once, then return to a relatively low rate of churn.

    Last I checked, the default scheduler places Pods one at a time. It might be advantageous to use a gang/batch scheduler like kube-batch[0], Poseidon[1] or DCM[2].

    [0] https://github.com/kubernetes-sigs/kube-batch

    [1] https://github.com/kubernetes-sigs/poseidon

    [2] https://github.com/vmware/declarative-cluster-management

  • GitHub repo cds

    Data syncing in golang for ClickHouse. (by zeromicro)

    Project mention: ClickHouse, Inc | news.ycombinator.com | 2021-09-20

  • GitHub repo sidekick

    High Performance HTTP Sidecar Load Balancer (by minio)

    Project mention: Node failure in a distributed minio cluster | reddit.com/r/minio | 2021-03-11

    Sidekick would be the recommended route (https://github.com/minio/sidekick). Other load balancers will work but can struggle with high I/O workloads, and Sidekick is also very simple to set up, which is a plus. Instead of a load balancer, some people opt for round-robin DNS (RRDNS), but that doesn't take failed nodes out of the available pool, so it isn't used as often.
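
The difference between a health-aware load balancer and plain round-robin DNS, as described in the Sidekick mention above, can be sketched in Go. This is not Sidekick's implementation: the `Backend` and `Pool` types and the host names are hypothetical, and the `Healthy` flag is set by hand here, whereas a real balancer would update it from periodic health checks.

```go
package main

import "fmt"

// Backend is a hypothetical storage node behind the balancer.
type Backend struct {
	Addr    string
	Healthy bool // assumed to be maintained by a health checker (not shown)
}

// Pool hands out backends in round-robin order.
type Pool struct {
	backends []Backend
	next     int
}

// Next returns the next healthy backend, skipping failed ones.
// ok is false when no backend is currently healthy.
func (p *Pool) Next() (addr string, ok bool) {
	for i := 0; i < len(p.backends); i++ {
		b := p.backends[p.next]
		p.next = (p.next + 1) % len(p.backends)
		if b.Healthy {
			return b.Addr, true
		}
	}
	return "", false
}

func main() {
	p := &Pool{backends: []Backend{
		{Addr: "minio1:9000", Healthy: true},
		{Addr: "minio2:9000", Healthy: false}, // simulated node failure
		{Addr: "minio3:9000", Healthy: true},
	}}
	for i := 0; i < 4; i++ {
		if addr, ok := p.Next(); ok {
			fmt.Println("send request to", addr)
		}
	}
}
```

Plain round-robin DNS corresponds to returning every address in turn regardless of `Healthy`, so roughly one request in three would still be sent to the failed node in this example.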

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-09-20.

Index

What are some of the best open-source Bigdata projects in Go? This list will help you:

#  Project     Stars
1  volcano     2,088
2  kube-batch    894
3  cds           707
4  sidekick      406