scikit-learn VS polars

Compare scikit-learn vs polars and see what their differences are.

scikit-learn

scikit-learn: machine learning in Python (by scikit-learn)

polars

Fast multi-threaded DataFrame library in Rust and Python (by ritchie46)
Metric | scikit-learn | polars
Mentions | 24 | 33
Stars | 48,081 | 3,122
Growth | 1.6% | 18.5%
Activity | 9.9 | 9.9
Last commit | 2 days ago | 3 days ago
Language | Python | Rust
License | BSD 3-clause "New" or "Revised" License | MIT License
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
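
As a rough illustration of what the growth figures mean in absolute terms, the sketch below converts a month-over-month percentage into an estimated number of stars gained. It assumes growth is computed as (current - previous) / previous, which the page does not state explicitly; the helper name `stars_gained` is hypothetical.

```python
def stars_gained(current_stars: int, growth_percent: float) -> float:
    """Estimate stars added in the last month from the current count and growth rate."""
    # Assumption: growth = (current - previous) / previous, expressed in percent.
    previous = current_stars / (1 + growth_percent / 100)
    return current_stars - previous


# Star counts and growth rates taken from the comparison figures on this page.
for name, stars, growth in [("scikit-learn", 48_081, 1.6), ("polars", 3_122, 18.5)]:
    print(f"{name}: about {stars_gained(stars, growth):.0f} stars gained last month")
```

Under that assumption, scikit-learn still gained more stars in absolute terms (roughly 757 versus 487), even though polars has the much higher growth rate.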

scikit-learn

Posts with mentions or reviews of scikit-learn. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-11-13.

polars

Posts with mentions or reviews of polars. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-11-24.
  • Rust and what it needs to gain space in computation-oriented applications
    7 projects | reddit.com/r/rust | 24 Nov 2021
    You should check out polars, datafusion, influxdb iox and databend, all written in native Rust and powered by the Apache Arrow format. Polars in particular is pretty damn fast and has bindings for Python.
  • How to pass dataframes between Rust and Python?
    4 projects | reddit.com/r/rust | 20 Nov 2021
    A solution for either Polars or Datafusion (or something else?) would be fine. For both libraries, Python packages exist that contain the Python bindings: https://github.com/pola-rs/polars/tree/master/py-polars https://github.com/apache/arrow-datafusion/tree/master/python
    4 projects | reddit.com/r/rust | 20 Nov 2021
    It's working for me now. :-) To get it working, I used the code from the suggested directory as well as the class PyPolarsError, which I copied into a local module: https://github.com/pola-rs/polars/tree/master/py-polars/src/arrow_interop https://github.com/pola-rs/polars/blob/629f5012bcefaa3c9a9c1a236e64dc057e8d472c/py-polars/src/error.rs Besides the polars dependency, also these are needed:
  • Introducing tidypolars - a Python data frame package with syntax familiar to R tidyverse users
    4 projects | reddit.com/r/datascience | 10 Nov 2021
    The biggest difference with this one is that it's built on top of the polars package, which is probably the fastest data frame manipulation library out there. All of the other dplyr-to-python packages are built on top of pandas (which is very slow in comparison).
  • Introducing tidypolars - a Python data frame package for R tidyverse users
    9 projects | reddit.com/r/rstats | 10 Nov 2021
    tidypolars uses the polars package as a backend, which might be the fastest data frame manipulation library out there. (Faster even than R's data.table, which has been the king of speed for many years.)
  • Python and ETL
    3 projects | reddit.com/r/dataengineering | 5 Nov 2021
    Shameless plug. But I genuinely believe polars is the best tool for the job if performance, schema validity and RAM usage are important to you. Depending on your machine, its performance is 2x-70x that of pandas. It uses Arrow memory and thus has proper null handling, has query optimization, a lot of parallelization, an insanely fast CSV parser, and uses much less RAM than pandas.
  • Show HN: Dataframes in Elixir Backed by Rust
    6 projects | news.ycombinator.com | 4 Nov 2021
  • Discussion: Integrating polars and plotters
    2 projects | reddit.com/r/rust | 26 Oct 2021
    For those who do not know about them, polars is a data frame crate for Rust and Python. It is also the fastest data frame library, according to benchmarks. Plotters is a crate for data visualisation. Both are the equivalents of pandas and matplotlib from the Python ecosystem. However, the integration with matplotlib in pandas has no equivalent. I would like to propose an effort to integrate polars with plotters, either by modifying the existing codebases, or creating a new bridge crate. I would love to hear opinions about this from the wider community.

What are some alternatives?

When comparing scikit-learn and polars you can also consider the following projects:

Keras - Deep Learning for humans

Surprise - A Python scikit for building and analyzing recommender systems

Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

tensorflow - An Open Source Machine Learning Framework for Everyone

gensim - Topic Modelling for Humans

PyBrain

TFLearn - Deep learning library featuring a higher-level API for TensorFlow.

MLflow - Open source platform for the machine learning lifecycle

seqeval - A Python framework for sequence labeling evaluation (named-entity recognition, POS tagging, etc.)

H2O - H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

xgboost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing