Machine learning with Julia - Solve Titanic competition on Kaggle and deploy trained AI model as a web service

This page summarizes the projects mentioned and recommended in the original post on dev.to

  • ScikitLearn.jl

    Julia implementation of the scikit-learn API https://cstjean.github.io/ScikitLearn.jl/dev/

  • For machine learning, we will use the ScikitLearn.jl library, which replicates the scikit-learn library for Python. It provides an interface for commonly used machine learning models such as Logistic Regression, Decision Tree, or Random Forest. ScikitLearn.jl is not a single package but a rich ecosystem of packages, and you need to select which of them to install and import. You can find a list of supported models in the ScikitLearn.jl documentation. Some of them are built-in Julia models; others are imported from Python. ScikitLearn.jl also has many tools to tune the learning process and evaluate results.
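
As a minimal sketch of the fit/predict workflow described above, the example below trains a Random Forest from DecisionTree.jl, one of the pure-Julia packages that implement the ScikitLearn.jl interface. The toy data, the feature columns, and the hyperparameter values are assumptions made for illustration and are not taken from the original post. Models imported from Python would instead be loaded with the `@sk_import` macro from ScikitLearn.jl.

```julia
using DecisionTree  # pure-Julia models exposing fit!/predict

# Made-up toy data: each row is a passenger, columns are
# [Pclass, Sex (0 = male, 1 = female), Age].
X = [3.0 0.0 22.0;
     1.0 1.0 38.0;
     3.0 1.0 26.0;
     1.0 1.0 35.0;
     3.0 0.0 35.0;
     1.0 0.0 54.0;
     3.0 0.0  2.0;
     2.0 1.0 14.0]
y = [0, 1, 1, 1, 0, 0, 0, 1]  # made-up labels: 1 = survived, 0 = did not

# Hyperparameters are illustrative, not tuned.
model = RandomForestClassifier(n_trees=20, max_depth=3)
fit!(model, X, y)

# Predict on the training rows just to show the call shape.
predictions = predict(model, X)
```

On real data you would predict on a held-out set rather than on the training rows.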

  • JLD2.jl

    HDF5-compatible file format in pure Julia

  • First, you need to save the model from the notebook to a file. For this, you can use the JLD2.jl module. It serializes a Julia object to an HDF5-compatible format (which is well known to Python data scientists) and saves it to a file.
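
A minimal sketch of saving and restoring an object with JLD2.jl is shown below. The `model` value is a hypothetical stand-in for a trained model, and the file name is an assumption. `save_object` and `load_object` store and read back a single object. Models written in pure Julia are the natural fit here, and whether a model wrapped from Python serializes cleanly is not covered by this sketch.

```julia
using JLD2

# Hypothetical stand-in for a trained model; any serializable Julia object works.
model = Dict("n_trees" => 20, "features" => ["Pclass", "Sex", "Age"])

# Write the object to a JLD2 file in a temporary directory.
path = joinpath(mktempdir(), "titanic_model.jld2")
save_object(path, model)

# Later, for example inside the web service, read it back.
restored = load_object(path)
```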

  • HTTP.jl

    HTTP for Julia

  • The req.url field contains the URL of the received request, the req.method field contains the request method, such as GET or POST, and the req.body field contains the POST body of the request in binary format. The HTTP request object contains much more information, all of which is described in the HTTP.jl documentation. Our web application only checks the request method. If the received request is a POST request, it parses req.body into a JSON object, passes the data from this object to the isSurvived function to make a prediction, and returns the result to the client browser. For all other request types, it just returns the content of the index.html file to display the web interface. The full source of the titanic.jl web service is listed in the original post.
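
The request handling described above can be sketched roughly as follows. This is not the original titanic.jl source. The `isSurvived` body is a dummy placeholder, its argument shape, the JSON response format, and the use of the JSON.jl package are all assumptions, and `INDEX_PATH` is a hypothetical location for the page.

```julia
using HTTP, JSON

# Dummy placeholder for the article's isSurvived function. The real one would
# pass the passenger data to the trained model; this rule only keeps the
# sketch runnable.
isSurvived(data::AbstractDict) = get(data, "sex", "") == "female"

# Assumed location of the web interface page.
const INDEX_PATH = "index.html"

function handle(req::HTTP.Request)
    if req.method == "POST"
        # req.body holds raw bytes: turn them into a String and parse as JSON.
        data = JSON.parse(String(req.body))
        result = Dict("survived" => isSurvived(data))
        return HTTP.Response(200, JSON.json(result))
    end
    # Any other request method: return the web interface.
    return HTTP.Response(200, read(INDEX_PATH, String))
end

# To start the server (blocks until interrupted):
# HTTP.serve(handle, "127.0.0.1", 8080)
```

The handler can be exercised without starting a server by constructing a request directly:

```julia
req = HTTP.Request("POST", "/", [], Vector{UInt8}("{\"sex\": \"female\"}"))
resp = handle(req)
reply = JSON.parse(String(resp.body))
```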

  • DataScience

    Data Science in Julia course for JuliaAcademy.com, taught by Huda Nassar (by JuliaAcademy)

  • For all topics that are explained only briefly, I have provided links to more thorough documentation. In addition, I highly recommend reading the Julia Data Science online book and studying the great set of machine learning examples in the Julia Academy Data Science GitHub repository.

  • julia_titanic_model

    Titanic machine learning model and web service

  • seaborn

    Statistical data visualization in Python

  • Using Plots.jl, you can create many different graphs to analyze your data, similar to Matplotlib or Seaborn in Python. To use it, you have to install the Plots package in your notebook and import it.
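
The install-and-import step might look like the sketch below. The `ages` values are made up for illustration, and the choice of a histogram is an assumption rather than the plot used in the original post.

```julia
import Pkg
Pkg.add("Plots")  # one-time installation into the active environment

using Plots

# Made-up sample of passenger ages, for illustration only.
ages = [22, 38, 26, 35, 35, 54, 2, 27, 14, 4]

# Draw a histogram of the ages with labeled axes.
p = histogram(ages; bins=5, xlabel="Age", ylabel="Passengers", legend=false)
```

In a notebook, the plot object returned by the last expression of a cell is rendered inline.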

  • scikit-learn

    scikit-learn: machine learning in Python

  • This is an article, not a book. It cannot cover everything, and it assumes that you already have some basic knowledge to get the most out of reading it. It is essential that you are familiar with machine learning in Python and understand how to train machine learning models using the NumPy, Pandas, scikit-learn, and Matplotlib Python libraries. I also assume that you are familiar with machine learning theory: types of machine learning problems such as regression and classification, the concept and process of supervised machine learning (fit/predict and evaluating quality using metrics), and the common models used for it, including the Random Forest Classifier and its implementation in the scikit-learn Python library. Additionally, it would be great if you have previously participated in Kaggle competitions, because to understand and run all the code in this article you need an account on https://kaggle.com.

  • Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

  • NumPy

    The fundamental package for scientific computing with Python.

  • cheatsheets

    Official Matplotlib cheat sheets (by matplotlib)

  • julia

    The Julia Programming Language

  • Julia is a general-purpose programming language well suited for numerical analysis and computational science. It is sometimes described as the future of machine learning and the most natural replacement for Python in this field.

  • PlotDocs.jl

    Documentation for Plots.jl

  • DataFrames.jl

    In-memory tabular data in Julia

  • These were just a small fraction of the manipulations that you can perform on data with the DataFrames.jl library. Read more about it in the documentation.
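
For a flavor of such manipulations, here is a small sketch using made-up passenger rows. The column names mirror the Titanic dataset, but the values and the specific operations shown are assumptions, not the ones from the original post.

```julia
using DataFrames, Statistics

# Made-up rows, for illustration only.
df = DataFrame(
    Sex      = ["male", "female", "female", "male", "female"],
    Age      = [22, 38, 26, 35, 54],
    Survived = [0, 1, 1, 0, 0],
)

# Keep only passengers older than 30.
older = filter(:Age => a -> a > 30, df)

# Select a subset of columns.
subset = select(df, :Sex, :Survived)

# Survival rate per group.
rates = combine(groupby(df, :Sex), :Survived => mean => :SurvivalRate)
```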

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
