Recommendations for specializing in Spark (Scala)

This page summarizes the projects mentioned and recommended in the original post on /r/scala.

  • frameless

    Expressive types for Spark.

  • I recommend using Frameless, which includes a Cats module. In general, I would encourage you to master “purely” functional programming first, because it’s foundational. Spark is a very specific technology, and probably not even the best in that class today—I would be very careful about trying to build a career around it.

  • Lantern

  • Yeah. The point here is that the machine learning algorithms in libraries like TensorFlow and PyTorch ultimately rely on differentiating functions. The idea behind "differentiable programming" is to make the central mathematical aspects of machine learning more first- or at least second-class citizens. So at the library level you find "autodifferentiation" in Haskell, an implementation as part of Rainier in Scala, autodiff in Rust, etc. More ambitiously, you have the Lantern system providing autodifferentiation-as-metaprogramming in Scala (but generating C++), another metaprogramming approach in Scala, and autodifferentiation as a language feature in Swift.

  • ZparkIO

    Boilerplate framework to use Spark and ZIO together.
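
The comment above on differentiable programming says that machine learning libraries ultimately rely on differentiating functions. As a rough illustration of what library-level autodifferentiation means, here is a minimal forward-mode sketch in plain Scala using dual numbers. It is not the approach taken by Lantern, Rainier, or any other project named above. The names `Dual`, `AutoDiff`, and `Example` are hypothetical and exist only for this example.

```scala
// Minimal forward-mode automatic differentiation with dual numbers.
// All names here (Dual, AutoDiff, Example) are hypothetical, for illustration only.

// A Dual carries a value together with the derivative of that value
// with respect to the single input variable.
final case class Dual(value: Double, deriv: Double) {
  def +(that: Dual): Dual = Dual(value + that.value, deriv + that.deriv)
  def -(that: Dual): Dual = Dual(value - that.value, deriv - that.deriv)
  // Product rule: (f * g)' = f' * g + f * g'
  def *(that: Dual): Dual =
    Dual(value * that.value, deriv * that.value + value * that.deriv)
}

object Dual {
  // A constant has derivative 0; the input variable has derivative 1.
  def const(c: Double): Dual = Dual(c, 0.0)
  def variable(x: Double): Dual = Dual(x, 1.0)
  // Chain rule: sin(f)' = cos(f) * f'
  def sin(x: Dual): Dual = Dual(math.sin(x.value), math.cos(x.value) * x.deriv)
}

object AutoDiff {
  // Evaluate f at x while tracking the derivative, then read it off.
  def derivative(f: Dual => Dual)(x: Double): Double =
    f(Dual.variable(x)).deriv
}

object Example {
  // f(x) = x^2 + 3x, so f'(x) = 2x + 3
  val f: Dual => Dual = x => x * x + Dual.const(3.0) * x
}
```

Calling `AutoDiff.derivative(Example.f)(2.0)` evaluates `f` once and returns the derivative at that point without symbolic manipulation or finite differences. Metaprogramming systems such as Lantern go further by generating the differentiation code ahead of time, which this sketch does not attempt.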

