-
DataFrame
C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
-
scientific-visualization-book
An open access book on scientific visualization using python and matplotlib
I don't mean to disparage pandas, which is a library that does a lot of things fairly well. But as an API for data manipulation I find it very verbose and it doesn't mesh with a "functional" way of thinking about applying transformations.
Generally, I've even preferred Spark to pandas, though it's hardly less verbose. Coming from R, I find pandas much slower than data.table and nowhere near as slick and discoverable as dplyr. Its system of indices is a pain that I'd rather not deal with at all (and, indeed, I can't think of another data frame library that relies on them).
Handles time series really well, though.
Recently I've been using polars (https://github.com/pola-rs/polars). As an API I much, much prefer it to pandas, and it's a lot faster. Comes at the cost of not using numpy under the hood, so you can't just toss a polars data frame into a sklearn model.
There are scikit-learn (sklearn) API-compatible wrappers for e.g. PyTorch and TensorFlow.
Skorch: https://github.com/skorch-dev/skorch
tf.keras.wrappers.scikit_learn: https://www.tensorflow.org/api_docs/python/tf/keras/wrappers...
Just to clarify, scikit-learn 1.0 has not been released yet. The latest tag in the GitHub repo is 1.0.rc2.
https://github.com/scikit-learn/scikit-learn/releases/tag/1....
Of possible interest, a C++ replacement for Pandas:
https://github.com/hosseinmoein/DataFrame
There are Python ports of ggplot (e.g. plotnine (https://github.com/has2k1/plotnine)), but agreed, Python is behind here. I'm not the best at data viz, but I can usually piece together a way to make ggplot do what I want it to do without that much trouble or looking at documentation.
Matplotlib, though ... that's a harder beast to internalize. I know it's possible to make high-quality matplotlib plots, but it's much harder for me. Like pandas, it's a library that I don't want to denigrate because I know people put lots of effort into it, but I can't lie -- I'm not a fan.
Speaking of what's possible in matplotlib, I am very much looking forward to reading this book: https://github.com/rougier/scientific-visualization-book
> For me, most of the issues I had with pandas came from using its MultiIndex.
Yessss. I loathe indices, and have never been in a situation where I was better off with them than without them.
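If you'd rather not deal with a MultiIndex at all, pandas does let you flatten it or avoid creating it. A small sketch, assuming pandas is installed; the data frame and column names are invented for illustration:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Bergen", "Bergen"],
        "year": [2020, 2021, 2020, 2021],
        "sales": [10, 12, 7, 9],
    }
)

# Grouping by two keys yields a Series indexed by a (city, year) MultiIndex
indexed = df.groupby(["city", "year"])["sales"].sum()

# reset_index() turns the index levels back into ordinary columns
flat = indexed.reset_index()

# as_index=False skips the MultiIndex and keeps the keys as columns
flat_direct = df.groupby(["city", "year"], as_index=False)["sales"].sum()
```

Both flat and flat_direct end up as plain data frames with city, year, and sales columns, which is closer to what dplyr or data.table users tend to expect.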
> Regarding speed, you have something like Vaex on the Python side.
I've never used Vaex, but I've used datatable (https://github.com/h2oai/datatable) and polars (https://github.com/pola-rs/polars). Polars is my favorite API, but datatable was faster at reading data (Polars was faster in execution). I'll have to give Vaex a try at some point.
data.table is faster to write and faster to run:
https://h2oai.github.io/db-benchmark/