Data science in Scala

This page summarizes the projects mentioned and recommended in the original post on /r/scala

CodeRabbit: AI Code Reviews for Developers
Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.
coderabbit.ai
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  1. Breeze

    Breeze is/was a numerical processing library for Scala.

    You can use https://github.com/scalanlp/breeze. A Scala library that's sorta a numpy/plotting equivalent. Unlike Spark which covers more use cases than just the classic Data Science workflow, Breeze is built specifically for "Data Science in Scala". The drawback is a classic one in Scala land where some major libraries abruptly get abandoned. Breeze's commits seem to have slowed down significantly and their website on their github page www.scalanlp.org is broken.

  2. CodeRabbit

    CodeRabbit: AI Code Reviews for Developers. Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.

    CodeRabbit logo
  3. spark-nlp

    State of the Art Natural Language Processing

    I am not aware of common open frameworks like Tensorflow, PyTorch or Scikit-learn for Scala. But specifically for natural language processing, there's SparkNLP from John Snow Labs.

  4. SynapseML

    Simple and Distributed Machine Learning

    b) There are libraries around e.g. Microsoft SynapseML, LinkedIn Photon ML

  5. photon-ml

    A scalable machine learning library on Apache Spark

    b) There are libraries around e.g. Microsoft SynapseML, LinkedIn Photon ML

  6. saddle

    SADDLE: Scala Data Library (by pityka)

    You might be interested in the saddle library which is a dataframe manipulation library similar to python pandas.

  7. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

    SaaSHub logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts

  • Spark NLP 5.1.0: Introducing state-of-the-art OpenAI Whisper speech-to-text, OpenAI Embeddings and Completion transformers, MPNet text embeddings, ONNX support for E5 text embeddings, new multi-lingual BART Zero-Shot text classification, and much more!

    1 project | /r/Python | 6 Sep 2023
  • PySpark for NLP Workshop - Materials and Jupyter Notebooks

    2 projects | /r/dataengineering | 14 May 2023
  • Spark-NLP 4.4.0: New BART for Text Translation & Summarization, new ConvNeXT Transformer for Image Classification, new Zero-Shot Text Classification by BERT, more than 4000+ state-of-the-art models, and many more! · JohnSnowLabs/spark-nlp

    1 project | /r/apachespark | 11 Apr 2023
  • Spark-NLP 4.4.0: New BART for Text Translation & Summarization, new ConvNeXT Transformer for Image Classification, new Zero-Shot Text Classification by BERT, more than 4000+ state-of-the-art models, and many more! · JohnSnowLabs/spark-nlp

    1 project | /r/java | 11 Apr 2023
  • Spark-NLP 4.4.0: New BART for Text Translation & Summarization, new ConvNeXT Transformer for Image Classification, new Zero-Shot Text Classification by BERT, more than 4000+ state-of-the-art models, and many more! · JohnSnowLabs/spark-nlp

    1 project | /r/Python | 11 Apr 2023

Did you know that Scala is
the 38th most popular programming language
based on number of references?