That said, I personally use Kubeflow hosted on a local bare-metal Kubernetes cluster (8 nodes, 4 GPUs), but a lot of it is a bit of a bear to get installed correctly in a multi-machine environment (specifically, this issue is still open, and exposing the built-in dashboards outside of the cluster is a problem). Also, because it's a Google product, it's very clearly intended to run in the cloud, with self-hosting being very much an afterthought.
-
If you're not concerned about self-hosting, WandB is one of the more fully featured training monitoring tools. I've used it in the past without any issues, but the lack of data and training privacy and the lack of self-hosting options make it a hard no for anything that isn't academic. Polyaxon is an alternative, but you have to rewrite all your variable logging to conform to its requirements, which makes it very difficult to switch to in the middle of a project, so you have to commit to it from the get-go.
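One way to soften that kind of lock-in, as a rough sketch rather than anything WandB or Polyaxon prescribes: route all metric logging through a small wrapper of your own, so switching tools means rewriting one adapter class instead of every logging call. Every name below (`MetricBackend`, `InMemoryBackend`, `Tracker`) is hypothetical, and the in-memory backend is only a placeholder for a real adapter that you would write against whichever tracker you pick.

```python
from typing import Dict, List, Protocol, Tuple


class MetricBackend(Protocol):
    """Hypothetical interface: anything that can record one scalar at one step."""

    def log_scalar(self, name: str, value: float, step: int) -> None: ...


class InMemoryBackend:
    """Placeholder backend that just keeps values in a dict (illustration only)."""

    def __init__(self) -> None:
        self.history: Dict[str, List[Tuple[int, float]]] = {}

    def log_scalar(self, name: str, value: float, step: int) -> None:
        self.history.setdefault(name, []).append((step, value))


class Tracker:
    """Training code calls this; only the backends know about a specific tool."""

    def __init__(self, *backends: MetricBackend) -> None:
        self._backends = backends

    def log(self, step: int, **metrics: float) -> None:
        # Fan each metric out to every configured backend.
        for name, value in metrics.items():
            for backend in self._backends:
                backend.log_scalar(name, float(value), step)


backend = InMemoryBackend()
tracker = Tracker(backend)
for step, loss in enumerate([0.9, 0.5, 0.3]):
    tracker.log(step, loss=loss)
```

The training loop only ever sees `tracker.log(...)`, so moving to a different tool mid-project would come down to adding one more backend class, assuming the new tool can accept scalar-per-step logging.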
-
I have an old labmate who uses a similar setup with MLflow and can endorse it.
-
aim
Aim 💫 — An easy-to-use & supercharged open-source AI metadata tracker (experiment tracking, AI agents tracing)
Check out Aim: https://github.com/aimhubio/aim