Go Data Science

Open-source Go projects categorized as Data Science

Top 23 Go Data Science Projects

  • excelize

    Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets

    Project mention: Excelize 2.7.0 Released – Go language API for spreadsheet (Excel) document | reddit.com/r/golang | 2023-01-08

    The multilingual documentation website (Arabic, German, Spanish, English, French, Russian, Chinese, Japanese, and Korean) has been updated.

  • gop

    The Go+ programming language is designed for engineering, STEM education, and data science.

    Project mention: Rob Pike's simple C regex matcher in Go | news.ycombinator.com | 2022-08-12

    > That said, I hope someday Go adds the "?" return-operator

    Same here. I think this is my biggest code-reading pain point as a go developer. I'm toying with the idea of playing more with Go+

    https://github.com/goplus/gop/blob/main/doc/docs.md#error-ha...

  • pachyderm

    Data-Centric Pipelines and Data Versioning

    Project mention: Show HN: We scaled Git to support 1 TB repos | news.ycombinator.com | 2022-12-13

    There are a couple of other contenders in this space. DVC (https://dvc.org/) seems most similar.

    If you're interested in something you can self-host... I work on Pachyderm (https://github.com/pachyderm/pachyderm), which doesn't have a Git-like interface, but also implements data versioning. Our approach de-duplicates between files (even very small files), and our storage algorithm doesn't create objects proportional to O(n) directory nesting depth as Xet appears to. (Xet is very much like Git in that respect.)

    The data versioning system enables us to run pipelines based on changes to your data; the pipelines declare what files they read, and that allows us to schedule processing jobs that only reprocess new or changed data, while still giving you a full view of what "would" have happened if all the data had been reprocessed. This, to me, is the key advantage of data versioning; you can save hundreds of thousands of dollars on compute. Being able to undo an oopsie is just icing on the cake.

    Xet's system for mounting a remote repo as a filesystem is a good idea. We do that too :)
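
    The idea in the comment above of reprocessing only new or changed data can be illustrated with content hashing. The following is a minimal sketch using only the Go standard library; it is not Pachyderm's implementation or API, and the helper name `changedFiles` and the in-memory file map are assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// changedFiles is a hypothetical helper (not part of Pachyderm). It hashes
// every file and compares the digest with the one recorded on the previous
// run. It returns the sorted names of new or changed files, plus the updated
// digest table to keep for the next run.
func changedFiles(prev map[string]string, files map[string][]byte) ([]string, map[string]string) {
	next := make(map[string]string, len(files))
	var changed []string
	for name, data := range files {
		sum := sha256.Sum256(data)
		digest := hex.EncodeToString(sum[:])
		next[name] = digest
		// A missing entry in prev yields "", so new files count as changed.
		if prev[name] != digest {
			changed = append(changed, name)
		}
	}
	sort.Strings(changed)
	return changed, next
}

func main() {
	// Illustrative in-memory "repository"; real data would come from storage.
	files := map[string][]byte{
		"a.csv": []byte("1,2,3"),
		"b.csv": []byte("4,5,6"),
	}

	changed, seen := changedFiles(nil, files)
	fmt.Println("first run, needs processing:", changed)

	files["b.csv"] = []byte("4,5,7") // modify one file
	changed, _ = changedFiles(seen, files)
	fmt.Println("second run, needs processing:", changed)
}
```

    On the second run only the modified file is reported, so a scheduler built on this idea could skip work for everything else.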

  • gophernotes

    The Go kernel for Jupyter notebooks and nteract.

    Project mention: GoNB, a new Jupyter Notebook Kernel for Go | reddit.com/r/golang | 2023-02-09

    I started this because gophernotes was not working for another project I'm slowly working on -- it is interpreted, and not up-to-date (generics, etc).

  • lgo

    Interactive Go programming with Jupyter

  • envd

    🏕️ Reproducible development environment for AI/ML

    Project mention: Kubernetes for Data Science with Kubeflow | reddit.com/r/programming | 2022-12-22

  • featureform

    The Virtual Feature Store. Turn your existing data infrastructure into a feature store.

    Project mention: What’s your process for deploying a data pipeline from a notebook, running it, and managing it in production? | reddit.com/r/dataengineering | 2022-10-13

    Feature store: new hot one: https://www.featureform.com/

  • dataframe-go

    DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

    Project mention: Machine Learning | reddit.com/r/golang | 2023-02-06

  • reflow

    A language and runtime for distributed, incremental data processing in the cloud

  • decimal

    A high-performance, arbitrary-precision, floating-point decimal library. (by ericlagergren)

  • aqueduct

    The easiest way to run ML on any cloud infrastructure (by aqueducthq)

    Project mention: Aqueduct: Take Data Science to Production | news.ycombinator.com | 2022-10-19

    Hi everyone!

    We've been working on making data teams more productive with Aqueduct for over a year, and we're really excited to share what we've been building.

    There's a large (and growing!) number of programmers in the world who understand data and can solve business problems but don't want to spend their time wrangling low-level cloud infrastructure to get their work into the cloud. The existing MLOps tools that claim to solve this problem have been built by & for software teams, and they're incredibly complicated.

    With Aqueduct, we've built a tool that's designed for data teams and abstracts away the underlying infrastructure. Aqueduct has a simple Python API that allows you to define a workflow as a composition of Python functions. Those workflows can be easily connected to data sources and can be run anywhere from your laptop to a Kubernetes cluster in the cloud. Once a workflow's running, Aqueduct has lightweight hooks to compute metrics and run tests over your pipelines to ensure they're producing high-quality results.

    To learn more about what we're building, check out our GitHub repo or join our community Slack:

    https://github.com/aqueducthq/aqueduct

  • goro

    A High-level Machine Learning Library for Go

    Project mention: Goro: A High-level Machine Learning Library for Go | reddit.com/r/golang | 2022-07-31

  • qframe

    Immutable data frame for Go

  • terraform-provider-iterative

    ☁️ Terraform plugin for machine learning workloads: spot instance recovery & auto-termination | AWS, GCP, Azure, Kubernetes

    Project mention: Using Ansible to create Deep Learning VM | reddit.com/r/mlops | 2023-01-22

    You should look at Terraform, and at the provider from Iterative, the guys behind DVC: https://registry.terraform.io/providers/iterative/iterative/latest/docs

  • bacalhau

    Compute over Data framework for public, transparent, and optionally verifiable computation

    Project mention: GitHub | reddit.com/r/BacalhauProject | 2023-03-27

  • dud

    A lightweight CLI tool for versioning data alongside source code and building data pipelines.

    Project mention: 🐂 🌾 Oxen.ai - Blazing Fast Unstructured Data Version Control, built in Rust | reddit.com/r/rust | 2023-02-16

    There is also https://github.com/kevin-hanselman/dud

  • Dataplane

    Dataplane is a data platform that makes it easy to construct a data mesh with automated data pipelines and workflows.

    Project mention: Airflow VS dataplane - a user suggested alternative | libhunt.com/r/airflow | 2022-05-03

    Dataplane is an Airflow inspired data platform to automate, schedule and design data pipelines and workflows written in Golang.

  • beneath

    Beneath is a serverless real-time data platform ⚡️

  • igop

    The Go/Go+ Interpreter

    Project mention: Can Go run statements in cmd like Python? | reddit.com/r/golang | 2023-03-16

    If it’s a REPL you want, goplus has one: https://github.com/goplus/igop

  • rtdl

    rtdl makes it easy to build and maintain a real-time data lake (by realtimedatalake)

  • mab

    Library for multi-armed bandit selection strategies, including efficient deterministic implementations of Thompson sampling and epsilon-greedy.

    Project mention: Machine Learning | reddit.com/r/golang | 2023-02-06

    Here's an example of multi-armed bandits done in Go: https://github.com/stitchfix/mab
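
    For readers unfamiliar with the strategies named above, here is a minimal sketch of textbook epsilon-greedy arm selection. It uses only the Go standard library and does not use or reproduce the mab library's API; the `bandit` type, its methods, and the reward rates are all hypothetical. This is the plain randomized variant, not the deterministic implementation the project description refers to.

```go
package main

import (
	"fmt"
	"math/rand"
)

// bandit is a hypothetical epsilon-greedy selector (not the mab API).
type bandit struct {
	epsilon float64    // probability of exploring a random arm
	counts  []int      // number of pulls per arm
	means   []float64  // running mean reward per arm
	rng     *rand.Rand // seeded source so runs are repeatable
}

func newBandit(arms int, epsilon float64, seed int64) *bandit {
	return &bandit{
		epsilon: epsilon,
		counts:  make([]int, arms),
		means:   make([]float64, arms),
		rng:     rand.New(rand.NewSource(seed)),
	}
}

// selectArm explores with probability epsilon, otherwise exploits the arm
// with the highest observed mean reward (lowest index wins ties).
func (b *bandit) selectArm() int {
	if b.rng.Float64() < b.epsilon {
		return b.rng.Intn(len(b.means))
	}
	best := 0
	for i, m := range b.means {
		if m > b.means[best] {
			best = i
		}
	}
	return best
}

// update folds one observed reward into the arm's running mean.
func (b *bandit) update(arm int, reward float64) {
	b.counts[arm]++
	b.means[arm] += (reward - b.means[arm]) / float64(b.counts[arm])
}

func main() {
	b := newBandit(3, 0.1, 42)
	trueRates := []float64{0.2, 0.5, 0.8} // made-up success rates for the demo
	for i := 0; i < 1000; i++ {
		arm := b.selectArm()
		reward := 0.0
		if b.rng.Float64() < trueRates[arm] {
			reward = 1.0
		}
		b.update(arm, reward)
	}
	fmt.Println("pulls per arm:", b.counts)
}
```

    With a small epsilon, most pulls should concentrate on the arm with the best observed mean, while occasional random pulls keep the estimates for the other arms from going stale.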

  • go-notebook

    Go-Notebook is inspired by the Jupyter Project and is intended for documenting Golang code.

  • ordered-concurrently

    Ordered-concurrently is a library for concurrent processing with ordered output in Go. It processes work concurrently and returns the output on a channel in the order of the input. It is useful for concurrently processing items from a queue while getting the output in the order the queue provided them.
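
    The general technique of concurrent processing with ordered output can be sketched with the standard library alone. The code below is not ordered-concurrently's API; `processOrdered` is a hypothetical helper, and unlike the library it collects results into a slice instead of streaming them on a channel.

```go
package main

import (
	"fmt"
	"sync"
)

// processOrdered is a hypothetical helper (not the ordered-concurrently API).
// It applies fn to every input using a pool of workers and returns the
// results in the same order as the inputs.
func processOrdered(inputs []int, workers int, fn func(int) int) []int {
	if workers < 1 {
		workers = 1 // avoid blocking forever with no consumers
	}
	results := make([]int, len(inputs))
	jobs := make(chan int) // carries indexes into inputs

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handled by exactly one goroutine, so
				// writing to results[i] needs no extra locking.
				results[i] = fn(inputs[i])
			}
		}()
	}

	for i := range inputs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	squares := processOrdered([]int{1, 2, 3, 4, 5}, 3, func(n int) int {
		return n * n
	})
	fmt.Println(squares)
}
```

    Order is preserved because each result is stored at the index of its input, regardless of which worker finishes first.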

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-03-27.

Index

What are some of the best open-source Data Science projects in Go? This list will help you:

 #  Project                        Stars
 1  excelize                      14,635
 2  gop                            8,419
 3  pachyderm                      5,867
 4  gophernotes                    3,516
 5  lgo                            2,284
 6  envd                           1,548
 7  featureform                    1,241
 8  dataframe-go                     962
 9  reflow                           923
10  decimal                          454
11  aqueduct                         409
12  goro                             353
13  qframe                           353
14  terraform-provider-iterative     275
15  bacalhau                         256
16  dud                              129
17  Dataplane                        124
18  beneath                           76
19  igop                              74
20  rtdl                              41
21  mab                               33
22  go-notebook                       33
23  ordered-concurrently              22