Scikit-Learn Version 1.0

polars

Mentions: 144 | Stars: 26,378 | Activity: 10.0 | Language: Rust

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

I don't mean to disparage pandas, which is a library that does a lot of things fairly well. But as an API for data manipulation I find it very verbose, and it doesn't mesh with a "functional" way of thinking about applying transformations.
Generally, I've even preferred Spark to pandas, though Spark is hardly less verbose. Coming from R, I find pandas much slower than data.table and nowhere near as slick and discoverable as dplyr. Its system of indices is a pain that I'd rather not deal with at all (and, indeed, I can't think of another data frame library that relies on them).
It handles time series really well, though.
Recently I've been using polars (https://github.com/pola-rs/polars). As an API I much, much prefer it to pandas, and it's a lot faster. That comes at the cost of not using numpy under the hood, so you can't just toss a polars data frame into a sklearn model.

skorch

Mentions: 3 | Stars: 5,639 | Activity: 6.9 | Language: Jupyter Notebook

A scikit-learn compatible neural network library that wraps PyTorch

There are scikit-learn (sklearn) API-compatible wrappers for e.g. PyTorch and TensorFlow.
Skorch: https://github.com/skorch-dev/skorch
tf.keras.wrappers.scikit_learn: https://www.tensorflow.org/api_docs/python/tf/keras/wrappers...

faiss

Mentions: 71 | Stars: 28,308 | Activity: 9.4 | Language: C++

A library for efficient similarity search and clustering of dense vectors.

scikit-learn

Mentions: 82 | Stars: 58,200 | Activity: 9.9 | Language: Python

scikit-learn: machine learning in Python

Just to clarify, scikit-learn 1.0 has not been released yet. The latest tag in the GitHub repo is 1.0.rc2:
https://github.com/scikit-learn/scikit-learn/releases/tag/1....

DataFrame

Mentions: 109 | Stars: 2,280 | Activity: 9.4 | Language: C++

C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage

Of possible interest, a C++ replacement for Pandas:
https://github.com/hosseinmoein/DataFrame

plotnine

Mentions: 36 | Stars: 3,835 | Activity: 9.6 | Language: Python

A Grammar of Graphics for Python

There are Python ports of ggplot (e.g. plotnine (https://github.com/has2k1/plotnine)), but agreed, Python is behind here. I'm not the best at data viz, but I can usually piece together a way to make ggplot do what I want it to do without that much trouble or looking at documentation.
Matplotlib, though ... that's a harder beast to internalize. I know it's possible to make high-quality matplotlib plots, but it's much harder for me. Like pandas, it's a library that I don't want to denigrate because I know people put lots of effort into it, but I can't lie -- I'm not a fan.

scientific-visualization-book

Mentions: 17 | Stars: 10,080 | Activity: 3.6 | Language: Python

An open access book on scientific visualization using python and matplotlib

Speaking of what's possible in matplotlib, I am very much looking forward to reading this book: https://github.com/rougier/scientific-visualization-book

datatable

Mentions: 9 | Stars: 1,790 | Activity: 6.1 | Language: C++

A Python package for manipulating 2-dimensional tabular data structures

> For me, most of my issues with pandas came from using its MultiIndex.
Yessss. I loathe indices, and have never been in a situation where I was better off with them than without them.
> Regarding speed, you have something like Vaex on the Python side.
I've never used Vaex, but I've used datatable (https://github.com/h2oai/datatable) and polars (https://github.com/pola-rs/polars). Polars is my favorite API, but datatable was faster at reading data (Polars was faster in execution). I'll have to give Vaex a try at some point.
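To make the MultiIndex complaint concrete, here is a small pandas sketch. The data and column names are invented for illustration. It shows how a grouped aggregation produces a MultiIndex by default, and two standard ways to get plain columns back.

```python
import pandas as pd

# Toy data, invented for this example.
df = pd.DataFrame(
    {
        "city": ["A", "A", "B", "B"],
        "year": [2020, 2021, 2020, 2021],
        "sales": [10, 12, 7, 9],
    }
)

# Default behaviour: the group keys become a two-level MultiIndex,
# so rows are addressed by tuples such as ("A", 2021).
grouped = df.groupby(["city", "year"]).sum()
sales_a_2021 = grouped.loc[("A", 2021), "sales"]

# Option 1: ask groupby to keep the keys as ordinary columns.
flat = df.groupby(["city", "year"], as_index=False).sum()

# Option 2: flatten an existing MultiIndex after the fact.
also_flat = grouped.reset_index()
```

Libraries such as polars and datatable sidestep this step because their frames have no row index to begin with.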

db-benchmark

Mentions: 91 | Stars: 320 | Activity: 0.0 | Language: R

reproducible benchmark of database-like ops

data.table is faster to write and faster to run:
https://h2oai.github.io/db-benchmark/

sktime

Mentions: 8 | Stars: 7,430 | Activity: 9.8 | Language: Python

A unified framework for machine learning with time series

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

How to Build a Logistic Regression Model: A Spam-filter Tutorial
1 project | dev.to | 5 May 2024

[D] Major bug in Scikit-Learn's implementation of F-1 score
2 projects | /r/MachineLearning | 8 Dec 2023

Contraction Clustering (RASTER): A fast clustering algorithm
1 project | news.ycombinator.com | 27 Nov 2023

Transformers as Support Vector Machines
1 project | news.ycombinator.com | 3 Sep 2023

Scikit-learn Stock Prediction: using fundamental and pricing data to predict future stock returns. Sklearn's randomforest classifier is trained and author claimed positive live trading results. Not actively maintained Other Models - star count:1520.0
1 project | /r/algoprojects | 28 Aug 2023

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.
Tags: Machine Learning, Python, Data Analysis, scikit-learn, Dataframe
Post date: 14 Sep 2021
