Go Data Science

Open-source Go projects categorized as Data Science

Top 23 Go Data Science Projects

  • excelize

    Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets

  • InfluxDB

    Purpose built for real-time analytics at any scale. InfluxDB Platform is powered by columnar analytics, optimized for cost-efficient storage, and built with open data standards.

  • gop

    The Go+ programming language is designed for engineering, STEM education, and data science. Our vision is to enable everyone to become a builder of the future.

    Project mention: Go Enums Suck | news.ycombinator.com | 2024-03-01

    https://github.com/goplus/gop, but they go slightly too overboard imo.

  • pachyderm

    Data-Centric Pipelines and Data Versioning

    Project mention: Open Source Advent Fun Wraps Up! | dev.to | 2024-01-05

    20. Pachyderm | Github | tutorial

  • flyte

    Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

    Project mention: Ask HN: What's the right tool for this job? | news.ycombinator.com | 2024-07-20

    My $0.02: https://flyte.org/ - you write the Python functions, they take an s3 (or similar) path to the images, and Flyte handles the orchestration for you, also allowing you to control how much compute is thrown at the problem, which essentially gives you your queue.

    If cost of operations starts to be an issue you can start moving elements to your own infrastructure.

  • gophernotes

    The Go kernel for Jupyter notebooks and nteract.

    Project mention: Go: What We Got Right, What We Got Wrong | news.ycombinator.com | 2024-01-04

    https://github.com/gopherdata/gophernotes

    I've had this bookmarked for some time and just haven't gotten around to it.

  • determined

    Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.

    Project mention: Open Source Advent Fun Wraps Up! | dev.to | 2024-01-05

    17. Determined AI | Github | tutorial

  • lgo

    Interactive Go programming with Jupyter

  • SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  • dataframe-go

    DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

  • FlowMeter

    ⭐ ⭐ Use ML to classify flows and packets as benign or malicious. ⭐ ⭐

  • reflow

    A language and runtime for distributed, incremental data processing in the cloud

  • bacalhau

    Compute over Data framework for public, transparent, and optionally verifiable computation

    Project mention: Deno Cron | news.ycombinator.com | 2023-11-29

    This is really interesting - we’ve tried really hard to solve some of these with Bacalhau[1] - a much simpler distributed compute platform. Would love your feedback!

    [1] https://github.com/bacalhau-project/bacalhau

    Disclosure: I co-founded Bacalhau

  • gonb

    GoNB, a Go Notebook Kernel for Jupyter

    Project mention: Go, Python, Rust, and production AI applications | news.ycombinator.com | 2024-03-12

    I've had these strong feelings and the OP describes it really well. Despite being a polyglot programmer, I really struggle with Python, both in expression and performance (unless it's just config for GPUs).

    Some of this frustration was recently an "Unpopular Opinion" on the Go Time Podcast regarding Python being great for "data exploration" but not for "data engineering": https://changelog.com/gotime/304#t=3196

    I've been yearning for better interactive tooling and ML-related libraries to bridge this gap, and started using some even in just the last week:

    * GoNB (Golang-support for Jupyter notebooks, also from a Googler) https://github.com/janpfeifer/gonb

    * That uses Go-Plotly for graphs/UI: https://github.com/MetalBlueberry/go-plotly

    * GoMLX (GoNB author is also on that project, many thanks Jan!) https://github.com/gomlx/gomlx

    * Hidden at the end of OP is LangChainGo for LLMs, which I haven't used yet: https://github.com/tmc/langchaingo

    Pick those up and let's make the Go community stronger together!

  • aqueduct

    Aqueduct is no longer being maintained. Aqueduct allows you to run LLM and ML workloads on any cloud infrastructure. (by RunLLM)

  • decimal

    A high-performance, arbitrary-precision, floating-point decimal library. (by ericlagergren)

  • qframe

    Immutable data frame for Go

  • goro

    A High-level Machine Learning Library for Go

  • webpalm

    🕸️ Crawl in the web network

    Project mention: Modern automated data miner (scrapper) | news.ycombinator.com | 2024-02-08
  • terraform-provider-iterative

    ☁️ Terraform plugin for machine learning workloads: spot instance recovery & auto-termination | AWS, GCP, Azure, Kubernetes

  • Dataplane

    Dataplane is a data platform that makes it easy to construct a data mesh with automated data pipelines and workflows.

  • dud

    A lightweight CLI tool for versioning data alongside source code and building data pipelines.

    Project mention: Ask HN: How do your ML teams version datasets and models? | news.ycombinator.com | 2023-09-28

    I've used DVC in the past and generally liked its approach. That said, I wholeheartedly agree that it's clunky. It does a lot of things implicitly, which can make it hard to reason about. It was also extremely slow for medium-sized datasets (low 10s of GBs).

    In response, I created a command-line tool that addresses these issues[0]. To reduce the comparison to an analogy: Dud : DVC :: Flask : Django.

    [0]: https://github.com/kevin-hanselman/dud

  • wallet-tracker

    Detect real scammers with Wallet-Tracker CLI from anywhere.

  • igop

    The Go/Go+ Interpreter

  • fasttrackml

    Experiment tracking server focused on speed and scalability

    Project mention: Experiment tracking server focused on speed and scalability | news.ycombinator.com | 2024-08-30
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Go Data Science related posts

  • Frawk: An efficient Awk-like programming language. (2021)

    4 projects | news.ycombinator.com | 21 Apr 2024
  • Go Enums Suck

    1 project | news.ycombinator.com | 1 Mar 2024
  • Fix: Hong Kong is not in China

    1 project | news.ycombinator.com | 23 Jan 2024
  • Why bad scientific code beats code following "best practices"

    3 projects | news.ycombinator.com | 6 Jan 2024
  • Jupyter Lab Extension to run your GPU-heavy stuff (for free for now) on somebody's else server without blocking yours

    2 projects | /r/datascience | 22 Sep 2023
  • Fix: Hong Kong locale does not always mean China

    1 project | news.ycombinator.com | 21 Jul 2023
  • packages similar to Pandas

    2 projects | /r/golang | 10 May 2023

Index

What are some of the best open-source Data Science projects in Go? This list will help you:

Rank  Project                       Stars
1     excelize                      17,940
2     gop                           8,915
3     pachyderm                     6,138
4     flyte                         5,385
5     gophernotes                   3,819
6     determined                    2,982
7     lgo                           2,404
8     dataframe-go                  1,144
9     FlowMeter                     1,105
10    reflow                        965
11    bacalhau                      662
12    gonb                          590
13    aqueduct                      521
14    decimal                       516
15    qframe                        390
16    goro                          371
17    webpalm                       348
18    terraform-provider-iterative  288
19    Dataplane                     210
20    dud                           181
21    wallet-tracker                121
22    igop                          110
23    fasttrackml                   97
