Top 23 Go Data Science Projects
Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets
Project mention: Excelize 2.7.0 Released – Go language API for spreadsheet (Excel) document | reddit.com/r/golang | 2023-01-08
The documentation website has been updated and is available in Arabic, German, Spanish, English, French, Russian, Chinese, Japanese, and Korean.
The Go+ programming language is designed for engineering, STEM education, and data science.
Project mention: Rob Pike's simple C regex matcher in Go | news.ycombinator.com | 2022-08-12
> That said, I hope someday Go adds the "?" return-operator
Same here. I think this is my biggest code-reading pain point as a go developer. I'm toying with the idea of playing more with Go+
Data-Centric Pipelines and Data Versioning
Project mention: Show HN: We scaled Git to support 1 TB repos | news.ycombinator.com | 2022-12-13
There are a couple of other contenders in this space. DVC (https://dvc.org/) seems most similar.
If you're interested in something you can self-host... I work on Pachyderm (https://github.com/pachyderm/pachyderm), which doesn't have a Git-like interface, but also implements data versioning. Our approach de-duplicates between files (even very small files), and our storage algorithm doesn't create objects proportional to O(n) directory nesting depth as Xet appears to. (Xet is very much like Git in that respect.)
The data versioning system enables us to run pipelines based on changes to your data; the pipelines declare what files they read, and that allows us to schedule processing jobs that only reprocess new or changed data, while still giving you a full view of what "would" have happened if all the data had been reprocessed. This, to me, is the key advantage of data versioning; you can save hundreds of thousands of dollars on compute. Being able to undo an oopsie is just icing on the cake.
Xet's system for mounting a remote repo as a filesystem is a good idea. We do that too :)
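The idea of reprocessing only new or changed data can be illustrated with a small Go sketch. This is not Pachyderm's implementation or API; `Snapshot`, `TakeSnapshot`, and `Changed` are hypothetical names, and files are held in memory for simplicity. Each version of the data is reduced to a map of content hashes, and a job is scheduled only for paths whose hash is new or different.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Snapshot maps a file path to the hex SHA-256 digest of its content.
// (Hypothetical type for illustration; not part of any real tool.)
type Snapshot map[string]string

// Digest returns the hex-encoded SHA-256 hash of data.
func Digest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// TakeSnapshot hashes every file of an in-memory "commit".
func TakeSnapshot(files map[string][]byte) Snapshot {
	snap := make(Snapshot, len(files))
	for path, data := range files {
		snap[path] = Digest(data)
	}
	return snap
}

// Changed returns, in sorted order, the paths in next that are either
// absent from prev or have different content.
func Changed(prev, next Snapshot) []string {
	var out []string
	for path, digest := range next {
		if old, ok := prev[path]; !ok || old != digest {
			out = append(out, path)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	v1 := TakeSnapshot(map[string][]byte{
		"a.csv": []byte("1,2,3"),
		"b.csv": []byte("4,5,6"),
	})
	v2 := TakeSnapshot(map[string][]byte{
		"a.csv": []byte("1,2,3"), // unchanged, so it is skipped
		"b.csv": []byte("4,5,7"), // modified
		"c.csv": []byte("8,9"),   // new
	})
	for _, path := range Changed(v1, v2) {
		fmt.Println("reprocess:", path)
	}
	// Prints:
	// reprocess: b.csv
	// reprocess: c.csv
}
```

A real system would also have to handle deleted files and pipelines whose outputs depend on several inputs; the sketch leaves both out.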
The Go kernel for Jupyter notebooks and nteract.
Project mention: GoNB, a new Jupyter Notebook Kernel for Go | reddit.com/r/golang | 2023-02-09
I started this because gophernotes was not working for another project I'm slowly working on -- it is interpreted, and not up-to-date (generics, etc).
Interactive Go programming with Jupyter
🏕️ Reproducible development environment for AI/ML
Project mention: Kubernetes for Data Science with Kubeflow | reddit.com/r/programming | 2022-12-22
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
Project mention: What’s your process for deploying a data pipeline from a notebook, running it, and managing it in production? | reddit.com/r/dataengineering | 2022-10-13
Feature store: new hot one: https://www.featureform.com/
DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration
A language and runtime for distributed, incremental data processing in the cloud
A high-performance, arbitrary-precision, floating-point decimal library. (by ericlagergren)
The easiest way to run ML on any cloud infrastructure (by aqueducthq)
Project mention: Aqueduct: Take Data Science to Production | news.ycombinator.com | 2022-10-19
We've been working on making data teams more productive with Aqueduct for over a year, and we're really excited to share what we've been building.
There's a large (and growing!) number of programmers in the world who understand data and can solve business problems but don't want to spend their time wrangling low-level cloud infrastructure to get their work into the cloud. The existing MLOps tools that claim to solve this problem have been built by & for software teams, and they're incredibly complicated.
With Aqueduct, we've built a tool that's designed for data teams and abstracts away the underlying infrastructure. Aqueduct has a simple Python API that allows you to define a workflow as a composition of Python functions. Those workflows can be easily connected to data sources and can be run anywhere from your laptop to a Kubernetes cluster in the cloud. Once a workflow's running, Aqueduct has lightweight hooks to compute metrics and run tests over your pipelines to ensure they're producing high-quality results.
To learn more about what we're building, check out our GitHub repo or join our community Slack:
A High-level Machine Learning Library for Go
Project mention: Goro: A High-level Machine Learning Library for Go | reddit.com/r/golang | 2022-07-31
Immutable data frame for Go
☁️ Terraform plugin for machine learning workloads: spot instance recovery & auto-termination | AWS, GCP, Azure, Kubernetes
Project mention: Using Ansible to create Deep Learning VM | reddit.com/r/mlops | 2023-01-22
You should look at terraform, and at the provider from iterative, the guys behind DVC: https://registry.terraform.io/providers/iterative/iterative/latest/docs
Compute over Data framework for public, transparent, and optionally verifiable computation
Project mention: GitHub | reddit.com/r/BacalhauProject | 2023-03-27
A lightweight CLI tool for versioning data alongside source code and building data pipelines.
Project mention: 🐂 🌾 Oxen.ai - Blazing Fast Unstructured Data Version Control, built in Rust | reddit.com/r/rust | 2023-02-16
There is also https://github.com/kevin-hanselman/dud
Dataplane is a data platform that makes it easy to construct a data mesh with automated data pipelines and workflows.
Project mention: Airflow VS dataplane - a user suggested alternative | libhunt.com/r/airflow | 2022-05-03
Dataplane is an Airflow inspired data platform to automate, schedule and design data pipelines and workflows written in Golang.
Beneath is a serverless real-time data platform ⚡️
The Go/Go+ Interpreter
Project mention: Can Go run statements in cmd like Python? | reddit.com/r/golang | 2023-03-16
If it’s REPL goplus had one https://github.com/goplus/igop
rtdl makes it easy to build and maintain a real-time data lake (by realtimedatalake)
Library for multi-armed bandit selection strategies, including efficient deterministic implementations of Thompson sampling and epsilon-greedy.
Here's an example of multi-armed bandits done in Go: https://github.com/stitchfix/mab
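As a rough illustration of the epsilon-greedy strategy mentioned above, here is a minimal Go sketch. It does not use or reproduce the stitchfix/mab API, and it is the plain randomized variant rather than a deterministic implementation; the `EpsilonGreedy` type, its methods, and the payout probabilities in `main` are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"math/rand"
)

// EpsilonGreedy tracks the running mean reward of each arm.
// (Hypothetical type for illustration only.)
type EpsilonGreedy struct {
	Epsilon float64 // probability of exploring a random arm
	counts  []int
	means   []float64
	rng     *rand.Rand
}

// NewEpsilonGreedy creates a bandit with the given number of arms.
func NewEpsilonGreedy(arms int, epsilon float64, seed int64) *EpsilonGreedy {
	return &EpsilonGreedy{
		Epsilon: epsilon,
		counts:  make([]int, arms),
		means:   make([]float64, arms),
		rng:     rand.New(rand.NewSource(seed)),
	}
}

// Select explores a uniformly random arm with probability Epsilon and
// otherwise exploits the arm with the highest mean reward so far
// (lowest index wins ties).
func (b *EpsilonGreedy) Select() int {
	if b.rng.Float64() < b.Epsilon {
		return b.rng.Intn(len(b.means))
	}
	best := 0
	for i, m := range b.means {
		if m > b.means[best] {
			best = i
		}
	}
	return best
}

// Update folds an observed reward into the arm's running mean.
func (b *EpsilonGreedy) Update(arm int, reward float64) {
	b.counts[arm]++
	b.means[arm] += (reward - b.means[arm]) / float64(b.counts[arm])
}

func main() {
	bandit := NewEpsilonGreedy(3, 0.1, 42)
	payout := []float64{0.2, 0.5, 0.8} // made-up success probabilities per arm
	for i := 0; i < 1000; i++ {
		arm := bandit.Select()
		reward := 0.0
		if bandit.rng.Float64() < payout[arm] {
			reward = 1.0
		}
		bandit.Update(arm, reward)
	}
	fmt.Println("pulls per arm:", bandit.counts)
}
```

With a small epsilon, most pulls should concentrate on whichever arm has looked best so far, while the occasional random pull keeps the estimates for the other arms from going stale.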
Go-Notebook is a tool for documenting Golang code, inspired by the Jupyter Project.
Ordered-concurrently is a library for concurrent processing with ordered output in Go. It processes work concurrently and returns output on a channel in the order of input. It is useful for concurrently processing items in a queue and getting output in the order provided by the queue.
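The ordered-output idea can be sketched with plain channels. This is not the library's API; `ProcessOrdered` is a hypothetical helper using one common technique: each input gets its own one-element result channel, and those channels are queued in input order so the collector reads results in that order no matter which worker finishes first.

```go
package main

import "fmt"

// ProcessOrdered applies fn to every value received from in, using the given
// number of worker goroutines, and emits the results in input order.
// (Hypothetical helper for illustration only.)
func ProcessOrdered(in <-chan int, workers int, fn func(int) int) <-chan int {
	type job struct {
		value  int
		result chan int
	}
	jobs := make(chan job)
	pending := make(chan chan int, workers) // result slots, queued in input order
	out := make(chan int)

	// Workers: compute results in any order and write each to its own slot.
	for i := 0; i < workers; i++ {
		go func() {
			for j := range jobs {
				j.result <- fn(j.value)
			}
		}()
	}

	// Dispatcher: reserve a slot for each input, then hand the job to a worker.
	go func() {
		for v := range in {
			slot := make(chan int, 1) // buffered so workers never block on it
			pending <- slot
			jobs <- job{value: v, result: slot}
		}
		close(jobs)
		close(pending)
	}()

	// Collector: drain the slots in the order they were reserved.
	go func() {
		for slot := range pending {
			out <- <-slot
		}
		close(out)
	}()

	return out
}

func main() {
	in := make(chan int)
	go func() {
		for i := 1; i <= 5; i++ {
			in <- i
		}
		close(in)
	}()
	for r := range ProcessOrdered(in, 3, func(n int) int { return n * n }) {
		fmt.Println(r)
	}
	// Prints 1, 4, 9, 16, 25, one per line.
}
```

The bounded `pending` queue also caps how far the dispatcher can run ahead of the collector, so a slow consumer applies backpressure instead of letting results pile up.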
Go Data Science related posts
Oxen.ai: Fast Unstructured Data Version Control
6 projects | news.ycombinator.com | 16 Feb 2023
🐂 🌾 Oxen.ai - Blazing Fast Unstructured Data Version Control, built in Rust
5 projects | reddit.com/r/rust | 16 Feb 2023
Deploying ML models straight from Jupyter Notebooks
2 projects | dev.to | 26 Jan 2023
Using Ansible to create Deep Learning VM
2 projects | reddit.com/r/mlops | 22 Jan 2023
Just made my first ever (useful) Python program. Would it be better/easier to make it as a web app, or a GUI?
2 projects | reddit.com/r/learnpython | 18 Dec 2022
Show HN: Reproducible development environments using starlark, without Nix
1 project | news.ycombinator.com | 24 Nov 2022
Kubeflow, Jupyter notebook online. Question to community [D]
1 project | reddit.com/r/MachineLearning | 25 Oct 2022
What are some of the best open-source Data Science projects in Go? This list will help you: