Scikit-Learn Version 1.0

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • polars

    Dataframes powered by a multithreaded, vectorized query engine, written in Rust

  • I don't mean to disparage pandas, which is a library that does a lot of things fairly well. But as an API for data manipulation I find it very verbose and it doesn't mesh with a "functional" way of thinking about applying transformations.

    Generally, I've even preferred Spark to pandas, though it's hardly less verbose. Coming from R, it's much slower than data.table and nowhere near as slick and discoverable as dplyr. Its system of indices is a pain that I'd rather not deal with at all (and, indeed, I can't think of another data frame library that relies on them).

    Handles time series really well, though.

    Recently I've been using polars (https://github.com/pola-rs/polars). As an API I much, much prefer it to pandas, and it's a lot faster. Comes at the cost of not using numpy under the hood, so you can't just toss a polars data frame into a sklearn model.

  • skorch

    A scikit-learn compatible neural network library that wraps PyTorch

  • There are scikit-learn (sklearn) API-compatible wrappers for e.g. PyTorch and TensorFlow.

    Skorch: https://github.com/skorch-dev/skorch

    tf.keras.wrappers.scikit_learn: https://www.tensorflow.org/api_docs/python/tf/keras/wrappers...
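
    To make "API-compatible" concrete, here is a minimal sketch of the estimator contract such wrappers implement. It does not use skorch or TensorFlow; `MajorityClassifier` is a hypothetical toy estimator written only to show that anything exposing `fit` and `predict` (plus parameter handling from `BaseEstimator`) can be passed to scikit-learn utilities such as `cross_val_score`.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import cross_val_score


class MajorityClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical toy estimator: always predicts the most frequent class."""

    def fit(self, X, y):
        # Learned attributes end with an underscore by scikit-learn convention.
        self.classes_, counts = np.unique(y, return_counts=True)
        self.majority_ = self.classes_[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)


# Toy data: six samples of class 0, three of class 1.
X = np.arange(18, dtype=float).reshape(9, 2)
y = np.array([0] * 6 + [1] * 3)

# Because the estimator follows the contract, generic tooling just works.
scores = cross_val_score(MajorityClassifier(), X, y, cv=3)
```

    A wrapper library plays the same role for a neural network: it hides the training loop behind `fit` and `predict` so the model can sit in pipelines, grid searches, and cross-validation.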

  • faiss

    A library for efficient similarity search and clustering of dense vectors.

  • scikit-learn

    scikit-learn: machine learning in Python

  • Just to clarify, scikit-learn 1.0 has not been released yet. The latest tag in the GitHub repo is 1.0.rc2

    https://github.com/scikit-learn/scikit-learn/releases/tag/1....

  • DataFrame

    C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage

  • Of possible interest, a C++ replacement for Pandas:

    https://github.com/hosseinmoein/DataFrame

  • plotnine

    A Grammar of Graphics for Python

  • There are Python ports of ggplot (e.g. plotnine (https://github.com/has2k1/plotnine)), but agreed, Python is behind here. I'm not the best at data viz, but I can usually piece together a way to make ggplot do what I want it to do without that much trouble or looking at documentation.

    Matplotlib, though ... that's a harder beast to internalize. I know it's possible to make high-quality matplotlib plots, but it's much harder for me. Like pandas, it's a library that I don't want to denigrate because I know people put lots of effort into it, but I can't lie -- I'm not a fan.

  • scientific-visualization-book

    An open access book on scientific visualization using python and matplotlib

  • Speaking of what's possible in matplotlib, I am very much looking forward to reading this book: https://github.com/rougier/scientific-visualization-book

  • datatable

    A Python package for manipulating 2-dimensional tabular data structures

  • > For me, the most issues I had with pandas came from using its multiindex.

    Yessss. I loathe indices, and have never been in a situation where I was better off with them than without them.

    > Regarding speed, you have something like Vaex on the Python side.

    I've never used Vaex, but I've used datatable (https://github.com/h2oai/datatable) and polars (https://github.com/pola-rs/polars). Polars is my favorite API, but datatable was faster at reading data (Polars was faster in execution). I'll have to give Vaex a try at some point.
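
    A small illustration of the index complaint, using invented data: a pandas group-by on two keys produces a MultiIndex by default, and passing `as_index=False` is one way to get plain columns back instead.

```python
import pandas as pd

# Toy data (hypothetical values).
df = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Rome", "Rome"],
        "year": [2020, 2021, 2020, 2021],
        "sales": [10, 12, 7, 9],
    }
)

# Default behavior: the group keys become a two-level MultiIndex.
indexed = df.groupby(["city", "year"])["sales"].sum()

# Opting out: the group keys stay as ordinary columns.
flat = df.groupby(["city", "year"], as_index=False)["sales"].sum()
```

    Libraries such as polars and datatable have no index at all, so their results always look like `flat`.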

  • db-benchmark

    reproducible benchmark of database-like ops

  • data.table is faster to write and faster to run

    https://h2oai.github.io/db-benchmark/

  • sktime

    A unified framework for machine learning with time series
