Top 23 Go Data Science Projects
-
excelize
Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets
-
gop
The Go+ programming language is designed for engineering, STEM education, and data science. Our vision is to enable everyone to become a builder of the future.
https://github.com/goplus/gop, but they go slightly too overboard imo.
-
20. Pachyderm | Github | tutorial
-
flyte
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
My $0.02: https://flyte.org/ - you write the Python functions, they take an S3 (or similar) path to the images, and Flyte handles the orchestration for you, also allowing you to control how much compute is thrown at the problem, which essentially gives you your queue.
If cost of operations starts to be an issue you can start moving elements to your own infrastructure.
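The "control how much compute is thrown at the problem, which essentially gives you your queue" idea in the comment above can be illustrated with a bounded worker pool. This is a minimal sketch in plain Go using only the standard library; it is not Flyte's API, and `processAll`, `maxWorkers`, and the `s3://bucket/...` paths are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll runs work on every path using at most maxWorkers goroutines.
// Results are returned in the same order as the input paths.
// (Hypothetical helper for illustration; not part of Flyte.)
func processAll(paths []string, maxWorkers int, work func(string) string) []string {
	if maxWorkers < 1 {
		maxWorkers = 1
	}
	results := make([]string, len(paths))
	jobs := make(chan int) // indexes into paths; this channel acts as the queue

	var wg sync.WaitGroup
	for w := 0; w < maxWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker pulls the next index until the queue is closed.
			for i := range jobs {
				results[i] = work(paths[i])
			}
		}()
	}
	for i := range paths {
		jobs <- i // blocks while all workers are busy
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// Hypothetical object-store paths, for illustration only.
	paths := []string{"s3://bucket/a.png", "s3://bucket/b.png", "s3://bucket/c.png"}
	out := processAll(paths, 2, func(p string) string {
		return "processed " + p
	})
	for _, line := range out {
		fmt.Println(line)
	}
}
```

Capping `maxWorkers` bounds how much work runs at once, and pending items simply wait on the channel. An orchestrator would add retries, scheduling, and resource requests on top of this basic shape.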
-
https://github.com/gopherdata/gophernotes
I've had this bookmarked for some time and just haven't gotten around to it.
-
determined
Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
17. Determined AI | Github | tutorial
-
This is really interesting - we’ve tried really hard to solve some of these with Bacalhau[1] - a much simpler distributed compute platform. Would love your feedback!
[1] https://github.com/bacalhau-project/bacalhau
Disclosure: I co-founded Bacalhau
-
Project mention: Go, Python, Rust, and production AI applications | news.ycombinator.com | 2024-03-12
I've had these strong feelings and the OP describes it really well. Despite being a polyglot programmer, I really struggle with Python, both in expression and performance (unless it's just config for GPUs).
Some of this frustration was recently an "Unpopular Opinion" on the Go Time Podcast regarding Python being great for "data exploration" but not for "data engineering": https://changelog.com/gotime/304#t=3196
I've been yearning for better interactive tooling and ML-related libraries to bridge this gap, and started using some even in just the last week:
* GoNB (Golang-support for Jupyter notebooks, also from a Googler) https://github.com/janpfeifer/gonb
* That uses Go-Plotly for graphs/UI: https://github.com/MetalBlueberry/go-plotly
* GoMLX (GoNB author is also on that project, many thanks Jan!) https://github.com/gomlx/gomlx
* Hidden at the end of OP is LangChainGo for LLMs, which I haven't used yet: https://github.com/tmc/langchaingo
Pick those up and let's make the Go community stronger together!
-
aqueduct
Aqueduct is no longer being maintained. Aqueduct allows you to run LLM and ML workloads on any cloud infrastructure. (by RunLLM)
-
terraform-provider-iterative
☁️ Terraform plugin for machine learning workloads: spot instance recovery & auto-termination | AWS, GCP, Azure, Kubernetes
-
Dataplane
Dataplane is a data platform that makes it easy to construct a data mesh with automated data pipelines and workflows.
-
Project mention: Ask HN: How do your ML teams version datasets and models? | news.ycombinator.com | 2023-09-28
I've used DVC in the past and generally liked its approach. That said, I wholeheartedly agree that it's clunky. It does a lot of things implicitly, which can make it hard to reason about. It was also extremely slow for medium-sized datasets (low 10s of GBs).
In response, I created a command-line tool that addresses these issues[0]. To reduce the comparison to an analogy: Dud : DVC :: Flask : Django.
[0]: https://github.com/kevin-hanselman/dud
-
Project mention: Experiment tracking server focused on speed and scalability | news.ycombinator.com | 2024-08-30
Go Data Science discussion
Go Data Science related posts
-
Frawk: An efficient Awk-like programming language. (2021)
-
Go Enums Suck
-
Fix: Hong Kong is not in China
-
Why bad scientific code beats code following "best practices"
-
Jupyter Lab Extension to run your GPU-heavy stuff (for free for now) on somebody's else server without blocking yours
-
Fix: Hong Kong locale does not always mean China
-
packages similar to Pandas
-
A note from our sponsor - InfluxDB
www.influxdata.com | 7 Sep 2024
Index
What are some of the best open-source Data Science projects in Go? This list will help you:
# | Project | Stars |
---|---|---|
1 | excelize | 17,940 |
2 | gop | 8,915 |
3 | pachyderm | 6,138 |
4 | flyte | 5,385 |
5 | gophernotes | 3,819 |
6 | determined | 2,982 |
7 | lgo | 2,404 |
8 | dataframe-go | 1,144 |
9 | FlowMeter | 1,105 |
10 | reflow | 965 |
11 | bacalhau | 662 |
12 | gonb | 590 |
13 | aqueduct | 521 |
14 | decimal | 516 |
15 | qframe | 390 |
16 | goro | 371 |
17 | webpalm | 348 |
18 | terraform-provider-iterative | 288 |
19 | Dataplane | 210 |
20 | dud | 181 |
21 | wallet-tracker | 121 |
22 | igop | 110 |
23 | fasttrackml | 97 |