Machine learning with Julia - Solve Titanic competition on Kaggle and deploy trained AI model as a web service


ScikitLearn.jl

Mentions: 4 | Stars: 537 | Activity: 3.9 | Language: Julia

Julia implementation of the scikit-learn API https://cstjean.github.io/ScikitLearn.jl/dev/

For machine learning, we will use the ScikitLearn.jl library, which replicates the scikit-learn library for Python. It provides an interface to commonly used machine learning models such as Logistic Regression, Decision Tree and Random Forest. ScikitLearn.jl is not a single package but a rich ecosystem of packages, and you need to choose which of them to install and import. You can find a list of supported models in the ScikitLearn.jl documentation. Some of them are native Julia models, others are imported from Python. ScikitLearn.jl also offers many tools to tune the learning process and evaluate results.
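As a rough illustration of the fit!/predict workflow, here is a minimal sketch using DecisionTree.jl, one of the pure-Julia packages that implement the ScikitLearn.jl model API. The package choice, the toy data and the hyperparameter values are assumptions for illustration, not the article's Titanic code.

```julia
# Minimal sketch: assumes DecisionTree.jl is installed (Pkg.add("DecisionTree")).
# DecisionTree.jl implements the ScikitLearn.jl model API (fit!/predict).
using DecisionTree
using Random

Random.seed!(42)

# Hypothetical toy data, not the Titanic dataset:
# 200 rows with 2 random features; the label is 1 when the features sum past 1.
X = rand(200, 2)
y = Int.(X[:, 1] .+ X[:, 2] .> 1.0)

# Same workflow as scikit-learn in Python: create the model, fit it, predict.
model = RandomForestClassifier(n_trees = 50, max_depth = 5)
fit!(model, X, y)

predictions = predict(model, X)
accuracy = sum(predictions .== y) / length(y)
println("Training accuracy: ", accuracy)
```

Models that come from Python are loaded differently, through ScikitLearn.jl's @sk_import macro (for example `@sk_import linear_model: LogisticRegression`), which needs a working Python scikit-learn installation.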

JLD2.jl

Mentions: 2 | Stars: 521 | Activity: 8.1 | Language: Julia

HDF5-compatible file format in pure Julia

First, you need to save the model from the notebook to a file. For this, you can use the JLD2.jl module, which serializes Julia objects to an HDF5-compatible format (well known to Python data scientists) and saves them to a file.
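A minimal sketch of that save/load round trip, assuming the save_object/load_object functions available in recent JLD2.jl versions. The file name and the stand-in object are hypothetical; in the article the saved object would be the trained model.

```julia
# Minimal sketch: assumes JLD2.jl is installed (Pkg.add("JLD2")).
using JLD2

# Hypothetical stand-in for the trained model; any Julia object can be saved.
model = Dict("weights" => [0.5, -1.2, 3.0], "threshold" => 0.5)

# Hypothetical file name, written to the system temp directory.
path = joinpath(tempdir(), "titanic_model.jld2")

# Serialize the object to an HDF5-compatible .jld2 file ...
save_object(path, model)

# ... and load it back later, for example inside the web service.
restored = load_object(path)
println(restored == model)   # true: the round trip preserves the contents
```

When the saved object is a model type from a package such as DecisionTree.jl, load that package in the web service before calling load_object, so JLD2 can reconstruct the original type.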

HTTP.jl

Mentions: 7 | Stars: 623 | Activity: 7.7 | Language: Julia

HTTP for Julia

The req.url field contains the URL of the received request, the req.method field contains the request method, such as GET or POST, and the req.body field contains the POST body of the request in binary format. The HTTP request object contains much more information, all of which is described in the HTTP.jl documentation. Our web application only checks the request method. If the received request is a POST request, it parses req.body as a JSON object, passes the data from this object to the isSurvived function to make a prediction, and returns the prediction to the client browser. For all other request types, it just returns the content of the index.html file to display the web interface. This is what the whole source of the titanic.jl web service looks like:
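The complete titanic.jl listing is not reproduced on this page. The sketch below only illustrates the routing logic described above, assuming HTTP.jl for the server and JSON.jl for parsing; is_survived is a hypothetical placeholder for the article's isSurvived function and applies a toy rule instead of the trained model.

```julia
# Minimal sketch: assumes HTTP.jl and JSON.jl are installed.
using HTTP
using JSON

# Hypothetical placeholder for isSurvived: a toy rule, not the trained model.
is_survived(data) = get(data, "Sex", "") == "female" ? 1 : 0

function handle(req::HTTP.Request)
    if req.method == "POST"
        # req.body holds the raw POST payload as bytes; decode and parse the JSON.
        data = JSON.parse(String(req.body))
        return HTTP.Response(200, string(is_survived(data)))
    end
    # Any other method (GET, ...) receives the web interface.
    return HTTP.Response(200, read("index.html", String))
end

# Start the service (blocks the current task); uncomment to run:
# HTTP.serve(handle, "127.0.0.1", 8080)
```

Keeping the handler a plain function of the request makes it easy to exercise without starting a server, by passing it a hand-built HTTP.Request.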

DataScience

Mentions: 9 | Stars: 478 | Activity: 0.0 | Language: Jupyter Notebook

Data Science in Julia course for JuliaAcademy.com, taught by Huda Nassar (by JuliaAcademy)

For all topics that are explained only briefly, I provided links to more thorough documentation. In addition, I highly recommend reading the Julia Data Science online book and working through the great set of machine learning examples in the Julia Academy Data Science GitHub repository.

julia_titanic_model

Mentions: 1 | Stars: 3 | Activity: 0.0 | Language: Jupyter Notebook

Titanic machine learning model and web service
seaborn

Mentions: 76 | Stars: 11,946 | Activity: 8.5 | Language: Python

Statistical data visualization in Python

Using Plots.jl, you can create many different kinds of graphs to analyze your data, similar to Matplotlib or Seaborn in Python. To use it, you have to install the Plots package in your notebook environment and import it:
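The install-and-import step can look like the sketch below, run once in a notebook cell or the Julia REPL. The histogram call and its random placeholder data are only an assumed example of a first plot, not taken from the article.

```julia
# Install Plots.jl into the active environment (needed only once).
using Pkg
Pkg.add("Plots")

# Import it in every session that draws plots.
using Plots

# Random placeholder data, not the Titanic dataset.
ages = rand(1:80, 100)

# Keyword arguments set the axis labels and hide the legend.
p = histogram(ages, bins = 10, xlabel = "Age", ylabel = "Count", legend = false)

# In a notebook, evaluating `p` as the last expression of a cell renders the plot.
p
```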

scikit-learn

Mentions: 81 | Stars: 58,046 | Activity: 9.9 | Language: Python

scikit-learn: machine learning in Python

This is not a book, only an article. That is why it can't cover everything and assumes that you already have some basic knowledge to get the most out of it. You should be familiar with machine learning in Python and understand how to train machine learning models using the NumPy, Pandas, scikit-learn and Matplotlib libraries. I also assume that you are familiar with machine learning theory: types of machine learning problems such as regression and classification, the concept and process of supervised machine learning (fit/predict and evaluating quality with metrics), and the common models used for it, including the Random Forest Classifier and its implementation in the scikit-learn Python library. Additionally, it would help if you have previously participated in Kaggle competitions, because you need an account on https://kaggle.com to follow along and run all the code in this article.

Pandas

Mentions: 393 | Stars: 41,923 | Activity: 10.0 | Language: Python

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

NumPy

Mentions: 272 | Stars: 26,290 | Activity: 10.0 | Language: Python

The fundamental package for scientific computing with Python.

cheatsheets

Mentions: 126 | Stars: 7,235 | Activity: 7.1 | Language: Python

Official Matplotlib cheat sheets (by matplotlib)

julia

Mentions: 350 | Stars: 44,469 | Activity: 10.0 | Language: Julia

The Julia Programming Language

Julia is a general-purpose programming language well suited for numerical analysis and computational science. It is sometimes described as the future of machine learning and the most natural replacement for Python in this field.

PlotDocs.jl

Mentions: 3 | Stars: 92 | Activity: 2.4

Documentation for Plots.jl

DataFrames.jl

Mentions: 9 | Stars: 1,690 | Activity: 7.0 | Language: Julia

In-memory tabular data in Julia

These were just a few of the many possible manipulations you can do with data using the DataFrames.jl library. Read more about it in the documentation.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.


This page summarizes the projects mentioned and recommended in the original post on dev.to
Topics: Julia, Python, Data Science, Machine Learning, Science and Data analysis
Post date: 17 Feb 2023
