Data science in Scala

This page summarizes the projects mentioned and recommended in the original post on /r/scala

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • Onboard AI - ChatGPT with full context of any GitHub repo.
  • WorkOS - The modern API for authentication & user identity.
  • Breeze

    Breeze is a numerical processing library for Scala.

    You can use https://github.com/scalanlp/breeze. A Scala library that's sorta a numpy/plotting equivalent. Unlike Spark which covers more use cases than just the classic Data Science workflow, Breeze is built specifically for "Data Science in Scala". The drawback is a classic one in Scala land where some major libraries abruptly get abandoned. Breeze's commits seem to have slowed down significantly and their website on their github page www.scalanlp.org is broken.

  • spark-nlp

    State of the Art Natural Language Processing

    I am not aware of common open frameworks like Tensorflow, PyTorch or Scikit-learn for Scala. But specifically for natural language processing, there's SparkNLP from John Snow Labs.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

  • SynapseML

    Simple and Distributed Machine Learning

    b) There are libraries around e.g. Microsoft SynapseML, LinkedIn Photon ML

  • photon-ml

    A scalable machine learning library on Apache Spark

    b) There are libraries around e.g. Microsoft SynapseML, LinkedIn Photon ML

  • saddle

    SADDLE: Scala Data Library (by pityka)

    You might be interested in the saddle library which is a dataframe manipulation library similar to python pandas.

  • Onboard AI

    ChatGPT with full context of any GitHub repo. Onboard AI learns any GitHub repo in minutes and lets you chat with it to locate functionality, understand different parts, and generate new code. Use it for free at app.getonboardai.com.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts