Data Science toolset summary from 2021

This page summarizes the projects mentioned and recommended in the original post on dev.to

  • Prophet

    Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

  • Prophet - A time-series forecasting library built by Facebook. It forecasts time series data with an additive model in which non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well. Link - https://github.com/facebook/prophet
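
To make the additive-model idea concrete, here is a minimal sketch in plain Python. It does not use Prophet's API or its fitting procedure. It only shows a forecast built as the sum of a trend, a weekly seasonal term, and a holiday effect. All function names and numeric values below are illustrative assumptions.

```python
import math

def trend(t, slope=0.5, intercept=10.0):
    """Linear growth component (hypothetical parameters)."""
    return intercept + slope * t

def weekly_seasonality(t, amplitude=2.0):
    """Periodic component that repeats every 7 days."""
    return amplitude * math.sin(2 * math.pi * t / 7)

def holiday_effect(t, holidays=frozenset({25}), bump=5.0):
    """One-off bump applied on the listed day indices (assumed holidays)."""
    return bump if t in holidays else 0.0

def additive_forecast(t):
    """Additive model: y(t) = trend(t) + seasonality(t) + holidays(t)."""
    return trend(t) + weekly_seasonality(t) + holiday_effect(t)

# Four weeks of daily values from the toy model.
forecast = [additive_forecast(t) for t in range(28)]
```

Because the components are simply summed, each one can be inspected or replaced on its own, which is what makes this kind of model easy to interpret.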

  • examples

    TensorFlow examples (by tensorflow)

  • TensorFlow - Mainly used for training ML models based on neural networks and deep learning. TensorFlow was developed by the Google Brain team for internal Google use. It can be used from a variety of programming languages, most notably Python, as well as JavaScript, C++, and Java. This flexibility lends itself to a range of applications in many different sectors. Link - https://www.tensorflow.org/

  • PostgreSQL

    Mirror of the official PostgreSQL GIT repository. Note that this is just a *mirror* - we don't work with pull requests on github. To contribute, please see https://wiki.postgresql.org/wiki/Submitting_a_Patch

  • MySQL

    MySQL Server, the world's most popular open source database, and MySQL Cluster, a real-time, open source transactional database.

  • MySQL - https://www.mysql.com/

  • MongoDB

    The MongoDB Database

  • MongoDB - https://www.mongodb.com/

  • scikit-learn

    scikit-learn: machine learning in Python

  • Scikit-learn - One of the most widely used frameworks for Python-based data science tasks. It features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. Link - https://scikit-learn.org/

  • PyTorch

    Tensors and Dynamic neural networks in Python with strong GPU acceleration

  • PyTorch - An open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, and primarily developed by Facebook's AI Research lab. It is free software released under the Modified BSD license. Link - https://pytorch.org/

  • MLflow

    Open source platform for the machine learning lifecycle

  • MLflow - https://mlflow.org/

  • Keras

    Deep Learning for humans

  • Keras - Keras is an open-source software library that provides a Python interface for artificial neural networks. Keras acts as an interface for the TensorFlow library. Link - https://keras.io/

  • huggingface_hub

    The official Python client for the Huggingface Hub.

  • Hugging Face - Provides open-source libraries for building transformer-based language models, used in the field of Natural Language Processing. Language models such as BERT and GPT are implemented in its Transformers library. Link - https://huggingface.co/

  • guildai

    Experiment tracking, ML developer tools

  • Guild.ai - https://guild.ai/

  • nodejs-bigquery

    Node.js client for Google Cloud BigQuery: A fast, economical and fully-managed enterprise data warehouse for large-scale data analytics.

  • Google Cloud BigQuery - https://cloud.google.com/bigquery

  • catboost

    A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

  • CatBoost - CatBoost is an open-source software library developed by Yandex. It provides a gradient boosting framework that handles categorical features using a permutation-driven alternative to the classical algorithm. Link - https://catboost.ai/
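
The permutation-driven handling of categorical features mentioned in the CatBoost entry can be illustrated with a simplified sketch of ordered target statistics. This is plain Python written for illustration, not CatBoost's implementation or API. The function name, the `prior` and `weight` parameters, and the sample data are assumptions. The idea is that each row's category is encoded using only the targets of rows that come before it in a random permutation, so a row never sees its own label.

```python
import random

def ordered_target_stats(categories, targets, prior=0.5, weight=1.0, seed=0):
    """Encode each categorical value from the targets of earlier rows only.

    Rows are visited in a random permutation; a row's encoding is a
    smoothed mean of the targets seen so far for its category.
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)

    sums = {}    # running target sum per category
    counts = {}  # running row count per category
    encoded = [0.0] * len(categories)

    for i in order:
        cat = categories[i]
        s = sums.get(cat, 0.0)
        n = counts.get(cat, 0)
        # Smoothed mean over preceding rows; equals `prior` when none exist.
        encoded[i] = (s + weight * prior) / (n + weight)
        # Only now does this row's own target enter the running statistics.
        sums[cat] = s + targets[i]
        counts[cat] = n + 1
    return encoded

# Hypothetical toy data: a color feature and a binary click label.
colors = ["red", "blue", "red", "red", "blue"]
clicked = [1, 0, 1, 0, 1]
encoded = ordered_target_stats(colors, clicked)
```

Compared with a plain per-category mean of the target, this ordering avoids leaking a row's own label into its feature value, at the cost of noisier encodings for rows that appear early in the permutation.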

