Go Big Data

Open-source Go projects categorized as Big Data

Top 4 Go Big Data Projects

  • pachyderm

    Data-Centric Pipelines and Data Versioning

    Project mention: Show HN: We scaled Git to support 1 TB repos | news.ycombinator.com | 2022-12-13

    There are a couple of other contenders in this space. DVC (https://dvc.org/) seems most similar.

    If you're interested in something you can self-host... I work on Pachyderm (https://github.com/pachyderm/pachyderm), which doesn't have a Git-like interface, but also implements data versioning. Our approach de-duplicates between files (even very small files), and our storage algorithm doesn't create objects proportional to O(n) directory nesting depth as Xet appears to. (Xet is very much like Git in that respect.)

    The data versioning system enables us to run pipelines based on changes to your data; the pipelines declare what files they read, and that allows us to schedule processing jobs that only reprocess new or changed data, while still giving you a full view of what "would" have happened if all the data had been reprocessed. This, to me, is the key advantage of data versioning; you can save hundreds of thousands of dollars on compute. Being able to undo an oopsie is just icing on the cake.

    Xet's system for mounting a remote repo as a filesystem is a good idea. We do that too :)

  • hazelcast-go-client

    Hazelcast Go Client

  • lake

    DevLake: the open-source dev data platform & dashboard for your DevOps tools. *Note*: The project has moved to the Apache Software Foundation: https://github.com/apache/incubator-devlake

  • rtdl

    rtdl makes it easy to build and maintain a real-time data lake (by realtimedatalake)
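
The Pachyderm mention above describes pipelines that reprocess only new or changed data between versions. A minimal sketch of that idea follows, assuming a content-hash comparison between two snapshots. The names `Snapshot`, `TakeSnapshot`, and `ChangedPaths` are hypothetical and are not Pachyderm's API or storage algorithm; deleted files are ignored for brevity.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Snapshot maps a file path to the hex SHA-256 digest of its content.
// Hypothetical type for illustration only; not Pachyderm's data model.
type Snapshot map[string]string

// TakeSnapshot hashes every file in files (path -> content).
func TakeSnapshot(files map[string][]byte) Snapshot {
	snap := make(Snapshot, len(files))
	for path, content := range files {
		sum := sha256.Sum256(content)
		snap[path] = hex.EncodeToString(sum[:])
	}
	return snap
}

// ChangedPaths returns, in sorted order, the paths in cur that are new
// or whose content hash differs from the one recorded in prev.
func ChangedPaths(prev, cur Snapshot) []string {
	var changed []string
	for path, hash := range cur {
		if old, ok := prev[path]; !ok || old != hash {
			changed = append(changed, path)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	v1 := TakeSnapshot(map[string][]byte{
		"a.csv": []byte("1,2,3"),
		"b.csv": []byte("4,5,6"),
	})
	v2 := TakeSnapshot(map[string][]byte{
		"a.csv": []byte("1,2,3"), // unchanged
		"b.csv": []byte("4,5,7"), // modified
		"c.csv": []byte("7,8,9"), // new
	})
	// Only the modified and new files are scheduled for processing.
	for _, path := range ChangedPaths(v1, v2) {
		fmt.Println("reprocess:", path)
	}
}
```

Here only `b.csv` and `c.csv` are reported, so a job driven by this list would skip `a.csv` entirely. A real system would also need to track deletions and which pipeline reads which files.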

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-12-13.


What are some of the best open-source Big Data projects in Go? This list will help you:

#  Project              Stars
1  pachyderm            5,919
2  hazelcast-go-client  176
3  lake                 96
4  rtdl                 41