Data science in Scala

This page summarizes the projects mentioned and recommended in the original post on /r/scala.

  • Breeze

    Breeze is a numerical processing library for Scala.

  • You can use https://github.com/scalanlp/breeze, a Scala library that is roughly a NumPy-plus-plotting equivalent. Unlike Spark, which covers more use cases than the classic data science workflow, Breeze is built specifically for "Data Science in Scala". The drawback is a familiar one in Scala land: major libraries sometimes get abandoned abruptly. Breeze's commits seem to have slowed down significantly, and the website linked from its GitHub page, www.scalanlp.org, is broken.

  • spark-nlp

    State of the Art Natural Language Processing

  • I am not aware of common open frameworks for Scala comparable to TensorFlow, PyTorch or scikit-learn. But specifically for natural language processing, there's Spark NLP from John Snow Labs.

  • SynapseML

    Simple and Distributed Machine Learning

  • There are libraries around, e.g. Microsoft SynapseML and LinkedIn Photon ML.

  • photon-ml

    A scalable machine learning library on Apache Spark

  • There are libraries around, e.g. Microsoft SynapseML and LinkedIn Photon ML.

  • saddle

    SADDLE: Scala Data Library (by pityka)

  • You might be interested in the saddle library, a dataframe manipulation library similar to Python's pandas.

NOTE: The number of mentions on this list counts mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
