Top 23 Go Data Science Projects
-
excelize
Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets
Project mention: Excelize 2.7.0 Released – Go language API for spreadsheet (Excel) document | reddit.com/r/golang | 2023-01-08
The documentation website is available in multiple languages (Arabic, German, Spanish, English, French, Russian, Chinese, Japanese, and Korean) and has been updated.
-
> That said, I hope someday Go adds the "?" return-operator
Same here. I think this is my biggest code-reading pain point as a Go developer. I'm toying with the idea of playing more with Go+
https://github.com/goplus/gop/blob/main/doc/docs.md#error-ha...
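For readers unfamiliar with the complaint above, here is a minimal sketch of the explicit error-checking pattern in standard Go that a "?"-style operator would shorten. The function name `parsePair` is made up for illustration; it is not taken from Go+ or any project on this list.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePair (hypothetical helper) converts two strings to ints. Every fallible
// call needs its own explicit check, which is the repetition the comment
// above is describing.
func parsePair(a, b string) (int, int, error) {
	x, err := strconv.Atoi(a)
	if err != nil {
		return 0, 0, fmt.Errorf("parsing first value: %w", err)
	}
	y, err := strconv.Atoi(b)
	if err != nil {
		return 0, 0, fmt.Errorf("parsing second value: %w", err)
	}
	return x, y, nil
}

func main() {
	x, y, err := parsePair("3", "4")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(x + y) // prints 7
}
```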
-
There are a couple of other contenders in this space. DVC (https://dvc.org/) seems most similar.
If you're interested in something you can self-host... I work on Pachyderm (https://github.com/pachyderm/pachyderm), which doesn't have a Git-like interface, but also implements data versioning. Our approach de-duplicates between files (even very small files), and our storage algorithm doesn't create objects proportional to O(n) directory nesting depth as Xet appears to. (Xet is very much like Git in that respect.)
The data versioning system enables us to run pipelines based on changes to your data; the pipelines declare what files they read, and that allows us to schedule processing jobs that only reprocess new or changed data, while still giving you a full view of what "would" have happened if all the data had been reprocessed. This, to me, is the key advantage of data versioning; you can save hundreds of thousands of dollars on compute. Being able to undo an oopsie is just icing on the cake.
Xet's system for mounting a remote repo as a filesystem is a good idea. We do that too :)
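The idea of reprocessing only new or changed data can be sketched with content hashes. This is a rough illustration under simple assumptions (files held in memory, SHA-256 per whole file), not Pachyderm's actual storage or scheduling algorithm; the names `contentHash` and `changedFiles` are invented for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// contentHash returns the hex-encoded SHA-256 digest of data.
func contentHash(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// changedFiles compares the current files against the hashes recorded for the
// previous version. It returns the sorted names of new or modified files,
// plus the hash map to record for the next run.
func changedFiles(prev map[string]string, files map[string][]byte) ([]string, map[string]string) {
	next := make(map[string]string, len(files))
	var changed []string
	for name, data := range files {
		h := contentHash(data)
		next[name] = h
		if prev[name] != h { // a missing entry reads as "", so new files count as changed
			changed = append(changed, name)
		}
	}
	sort.Strings(changed)
	return changed, next
}

func main() {
	v1 := map[string][]byte{"a.csv": []byte("1,2"), "b.csv": []byte("3,4")}
	changed, hashes := changedFiles(nil, v1)
	fmt.Println(changed) // first run, every file is new: [a.csv b.csv]

	v2 := map[string][]byte{"a.csv": []byte("1,2"), "b.csv": []byte("3,5"), "c.csv": []byte("6")}
	changed, _ = changedFiles(hashes, v2)
	fmt.Println(changed) // only these need reprocessing: [b.csv c.csv]
}
```

A pipeline built on this would run its processing step only for the returned names and reuse earlier outputs for everything else, which is where the compute savings come from.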
-
I started this because gophernotes was not working for another project I'm slowly working on -- it is interpreted, and not up-to-date (generics, etc).
-
Project mention: What’s your process for deploying a data pipeline from a notebook, running it, and managing it in production? | reddit.com/r/dataengineering | 2022-10-13
Feature store: new hot one: https://www.featureform.com/
-
Hi everyone!
We've been working on making data teams more productive with Aqueduct for over a year, and we're really excited to share what we've been building.
There's a large (and growing!) number of programmers in the world who understand data and can solve business problems but don't want to spend their time wrangling low-level cloud infrastructure to get their work into the cloud. The existing MLOps tools that claim to solve this problem have been built by & for software teams, and they're incredibly complicated.
With Aqueduct, we've built a tool that's designed for data teams and abstracts away the underlying infrastructure. Aqueduct has a simple Python API that allows you to define a workflow as a composition of Python functions. Those workflows can be easily connected to data sources and can be run anywhere from your laptop to a Kubernetes cluster in the cloud. Once a workflow's running, Aqueduct has lightweight hooks to compute metrics and run tests over your pipelines to ensure they're producing high-quality results.
To learn more about what we're building, check out our GitHub repo or join our community Slack:
-
Project mention: Goro: A High-level Machine Learning Library for Go | reddit.com/r/golang | 2022-07-31
-
terraform-provider-iterative
☁️ Terraform plugin for machine learning workloads: spot instance recovery & auto-termination | AWS, GCP, Azure, Kubernetes
You should look at Terraform, and at the provider from Iterative, the guys behind DVC: https://registry.terraform.io/providers/iterative/iterative/latest/docs
-
Project mention: 🐂 🌾 Oxen.ai - Blazing Fast Unstructured Data Version Control, built in Rust | reddit.com/r/rust | 2023-02-16
There is also https://github.com/kevin-hanselman/dud
-
Dataplane
Dataplane is a data platform that makes it easy to construct a data mesh with automated data pipelines and workflows.
Project mention: Airflow VS dataplane - a user suggested alternative | libhunt.com/r/airflow | 2022-05-03
Dataplane is an Airflow-inspired data platform, written in Golang, to automate, schedule, and design data pipelines and workflows.
-
If it's a REPL you're after, goplus has one: https://github.com/goplus/igop
-
mab
Library for multi-armed bandit selection strategies, including efficient deterministic implementations of Thompson sampling and epsilon-greedy.
Here's an example of multi-armed bandits done in Go: https://github.com/stitchfix/mab
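To give a feel for what an epsilon-greedy selection strategy does, here is a small textbook-style sketch using only the standard library. It is the usual randomized version with a seeded generator, not the deterministic implementation the mab library describes, and none of the type or function names below come from that library.

```go
package main

import (
	"fmt"
	"math/rand"
)

// EpsilonGreedy (illustrative type) tracks the running mean reward per arm.
type EpsilonGreedy struct {
	Epsilon float64
	counts  []int
	means   []float64
	rng     *rand.Rand
}

// NewEpsilonGreedy creates a bandit with the given number of arms.
func NewEpsilonGreedy(arms int, epsilon float64, seed int64) *EpsilonGreedy {
	return &EpsilonGreedy{
		Epsilon: epsilon,
		counts:  make([]int, arms),
		means:   make([]float64, arms),
		rng:     rand.New(rand.NewSource(seed)),
	}
}

// Select explores a uniformly random arm with probability Epsilon and
// otherwise exploits the arm with the highest observed mean reward.
func (e *EpsilonGreedy) Select() int {
	if e.rng.Float64() < e.Epsilon {
		return e.rng.Intn(len(e.means))
	}
	best := 0
	for i, m := range e.means {
		if m > e.means[best] {
			best = i
		}
	}
	return best
}

// Update folds an observed reward into the arm's running mean.
func (e *EpsilonGreedy) Update(arm int, reward float64) {
	e.counts[arm]++
	e.means[arm] += (reward - e.means[arm]) / float64(e.counts[arm])
}

func main() {
	// Hypothetical payout probabilities, made up for this simulation.
	payout := []float64{0.2, 0.5, 0.8}
	bandit := NewEpsilonGreedy(len(payout), 0.1, 42)
	env := rand.New(rand.NewSource(7))
	for i := 0; i < 5000; i++ {
		arm := bandit.Select()
		reward := 0.0
		if env.Float64() < payout[arm] {
			reward = 1.0
		}
		bandit.Update(arm, reward)
	}
	fmt.Println("pulls per arm:", bandit.counts)
}
```

With a small epsilon, most pulls should end up on whichever arm has the best observed mean, while the occasional random pull keeps the estimates for the other arms from going stale.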
-
ordered-concurrently
Ordered-concurrently is a library for concurrent processing with ordered output in Go. It processes work concurrently and returns output on a channel in the order of input. It is useful for concurrently processing items from a queue and getting the output in the order the queue provided.
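The general technique of concurrent processing with ordered output can be sketched with plain channels. This is one possible approach (a queue of per-item result channels), not the ordered-concurrently library's API or implementation; `processInOrder` is a name invented for this example.

```go
package main

import "fmt"

// processInOrder (illustrative helper) applies fn to each input using up to
// `workers` goroutines and returns a channel that yields the results in
// input order, even when later items finish first.
func processInOrder(inputs []int, workers int, fn func(int) int) <-chan int {
	if workers < 1 {
		workers = 1
	}
	type job struct {
		value  int
		result chan int
	}
	jobs := make(chan job)
	pending := make(chan chan int, workers) // result slots, queued in input order
	out := make(chan int)

	// Workers: compute and deliver into the job's own slot.
	for w := 0; w < workers; w++ {
		go func() {
			for j := range jobs {
				j.result <- fn(j.value) // slot is buffered, so this never blocks
			}
		}()
	}

	// Producer: reserve a slot for each input before handing it to a worker.
	go func() {
		for _, v := range inputs {
			slot := make(chan int, 1)
			pending <- slot
			jobs <- job{value: v, result: slot}
		}
		close(jobs)
		close(pending)
	}()

	// Collector: drain the slots in the order they were reserved.
	go func() {
		for slot := range pending {
			out <- <-slot
		}
		close(out)
	}()
	return out
}

func main() {
	square := func(n int) int { return n * n }
	for r := range processInOrder([]int{5, 1, 4, 2, 3}, 3, square) {
		fmt.Println(r) // prints 25, 1, 16, 4, 9 on separate lines
	}
}
```

The buffered `pending` channel also bounds how far the producer can run ahead of the consumer, so memory use stays proportional to the number of workers rather than the input size.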
Go Data Science related posts
- Oxen.ai: Fast Unstructured Data Version Control
- 🐂 🌾 Oxen.ai - Blazing Fast Unstructured Data Version Control, built in Rust
- Deploying ML models straight from Jupyter Notebooks
- Using Ansible to create Deep Learning VM
- Just made my first ever (useful) Python program. Would it be better/easier to make it as a web app, or a GUI?
- Show HN: Reproducible development environments using starlark, without Nix
- Kubeflow, Jupyter notebook online. Question to community [D]
A note from our sponsor - SonarQube
www.sonarqube.org | 29 Mar 2023
Index
What are some of the best open-source Data Science projects in Go? This list will help you:
# | Project | Stars |
---|---|---|
1 | excelize | 14,635 |
2 | gop | 8,419 |
3 | pachyderm | 5,867 |
4 | gophernotes | 3,516 |
5 | lgo | 2,284 |
6 | envd | 1,548 |
7 | featureform | 1,241 |
8 | dataframe-go | 962 |
9 | reflow | 923 |
10 | decimal | 454 |
11 | aqueduct | 409 |
12 | goro | 353 |
13 | qframe | 353 |
14 | terraform-provider-iterative | 275 |
15 | bacalhau | 256 |
16 | dud | 129 |
17 | Dataplane | 124 |
18 | beneath | 76 |
19 | igop | 74 |
20 | rtdl | 41 |
21 | mab | 33 |
22 | go-notebook | 33 |
23 | ordered-concurrently | 22 |