Show HN: SpotML – Managed ML Training on Cheap AWS/GCP Spot Instances

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • nimbo

    Discontinued. Run compute jobs on AWS as if you were running them locally.

  • Seems like Nimbo (https://nimbo.sh) has a Business Source License (https://github.com/nimbo-sh/nimbo/blob/master/LICENSE), so you might want to check with them regarding licensing terms for a startup that is using their code and/or docs in "production"?

    Otherwise, this idea is interesting and probably generalizable to other applications. Maybe it's not crystal clear to me, but what are the advantages of your service over existing solutions such as Nimbo and Spotty? FWIW it might be worthwhile adding this to your website.

    Good luck!

  • Ray

    Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

  • Neat. Congratulations on the launch!

    Apart from the fact that it could deploy to both GCP and AWS, what does it do differently than AWS Batch [0]?

    When we had a similar problem, we ran jobs on spots with AWS Batch and it worked nicely enough.

    Some suggestions (for a later date):

    1. Add built-in support for Ray [1] (you'd then essentially be competing with Anyscale, which is a VC-funded startup, just to contrast it with another comment on this thread) and dbt [2].

    2. Support deploying coin miners (might be good to widen the product's reach, and stand it up against the likes of ConsenSys).

    3. Get in front of many cost optimisation consultants out there, like the Duckbill Group.

    If I may, where are you building this product from? And how many are on the team?

    Thanks.

    [0] https://aws.amazon.com/batch/use-cases/

    [1] https://ray.io/

    [2] https://getdbt.com/

  • dbt-spark

    dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks

  • criu-image-streamer

    Enables streaming of images to and from CRIU during checkpoint/restore with low overhead

  • Cool, yeah, that makes total sense for ML, where you just need to run over epochs; it's less clear for other workloads.

    After looking around, I'm thinking more about CRIU/docker suspend. The Google stars aligned and I found this: https://github.com/checkpoint-restore/criu-image-streamer + https://linuxplumbersconf.org/event/7/contributions/641/atta... which actually seems perfect. I wonder how fast it is.

    (Or, hacking on a checkpoint idea: have a daemon periodically 'checkpoint' other programs, so that even if a checkpoint is too slow to finish within 60 seconds, you can revert to the last checkpoint. Or even an rsync-like application where you only send the changes.)
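
    The "checkpoint periodically, revert to the last checkpoint" idea can be sketched at the application level for an epoch-based job. This is a minimal illustration only: it does not use CRIU or criu-image-streamer, the function names (save_checkpoint, load_checkpoint, run_epochs) and the JSON state layout are hypothetical, and the "work" per epoch is a stand-in for real training.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so an interruption never leaves a torn file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic replacement of the old checkpoint
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    if not os.path.exists(path):
        return {"epoch": 0, "total": 0}
    with open(path) as f:
        return json.load(f)


def run_epochs(path, num_epochs, stop_after=None):
    """Run epochs from the last checkpoint; stop_after simulates an interruption."""
    state = load_checkpoint(path)
    done_this_run = 0
    while state["epoch"] < num_epochs:
        if stop_after is not None and done_this_run >= stop_after:
            break  # pretend the spot instance was reclaimed here
        state["total"] += state["epoch"]  # stand-in for one epoch of real work
        state["epoch"] += 1
        save_checkpoint(path, state)  # checkpoint after every epoch
        done_this_run += 1
    return state


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    print(run_epochs(ckpt, 5, stop_after=2))  # interrupted run
    print(run_epochs(ckpt, 5))                # resumed run finishes the job
```

    Checkpointing at epoch boundaries keeps the saved state small and consistent, which is why this works well for ML and is less obvious for arbitrary workloads, where a process-level tool such as CRIU would be needed instead.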

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.