mljar-supervised
lleaves
| | mljar-supervised | lleaves |
|---|---|---|
| Mentions | 51 | 4 |
| Stars | 2,929 | 292 |
| Growth | 1.2% | - |
| Activity | 8.5 | 7.0 |
| Last commit | 11 days ago | 24 days ago |
| Language | Python | Python |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mljar-supervised
-
Show HN: Web App with GUI for AutoML on Tabular Data
The Web App uses two open-source packages that I've created:
- MLJAR AutoML - Python package for AutoML on tabular data https://github.com/mljar/mljar-supervised
- Mercury - framework for converting Jupyter Notebooks into Web Apps https://github.com/mljar/mercury
You can run the Web App locally. What is more, you can adjust the notebook's code to your needs. For example, you can set different validation strategies, evaluation metrics, or longer training times. The notebooks in the repo are a good starting point for developing more advanced apps.
-
Fairness in machine learning
It's an Automated Machine Learning Python package. It's open source, so you can see how it works on GitHub: https://github.com/mljar/mljar-supervised
-
[P] Build data web apps in Jupyter Notebook with Python only
Sure, at the bottom of our website you can subscribe to the newsletter.
- Show HN: AutoML Python Package for Tabular Data with Automatic Documentation
-
library / framework to test multiple sklearn regression models at once
If you need a simple and fast solution, go with auto-sklearn. Maybe a bit more complex, but very powerful, was mljar-supervised.
- Python AutoML on Tabular Data with FeatureEng, HP Tuning, Explanations, AutoDoc
-
Data Science and full-stack-web development
In my case, I had experience in DS and software engineering. That gave me the ability to start a company that works on Data Science tools.
-
Learning Python tricks by reading other people's code. But who?
MLJAR AutoML is a Python package for Automated Machine Learning on tabular data with feature engineering, explanations, and automatic documentation.
-
'start with a simple model'
I recommend trying my AutoML package. You can easily check many different algorithms. What is more, the baseline algorithms are checked too (a majority class predictor for classification and a mean predictor for regression). The advantage of AutoML is that it is really quick. You don't need to write preprocessing code, just call the fit method.
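The two baselines named in that comment are easy to sketch by hand. The snippet below is a minimal, standard-library illustration of the idea, not the implementation used by mljar-supervised; the class names, the `accuracy` helper, and the toy data are all made up for this example.

```python
from collections import Counter
from statistics import mean


class MajorityClassBaseline:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        # Features are ignored on purpose; only label frequencies matter.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


class MeanBaseline:
    """Hypothetical baseline: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data, invented for illustration only.
X = [[0.1], [0.4], [0.35], [0.8], [0.9]]
y_class = ["spam", "ham", "ham", "ham", "spam"]
y_reg = [1.0, 2.0, 3.0, 4.0, 5.0]

clf = MajorityClassBaseline().fit(X, y_class)
reg = MeanBaseline().fit(X, y_reg)
baseline_accuracy = accuracy(y_class, clf.predict(X))
```

Any real model is only worth keeping if it beats these scores, which is why it helps to have them computed alongside the more complex algorithms.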
lleaves
- LLeaves: A LLVM-based compiler for LightGBM decision trees
-
Cold Showers
I built this decision tree (LightGBM) compiler last summer: https://github.com/siboehm/lleaves
It gets you ~10x speedups for batch predictions, more if your model is big. It's not complicated; it ended up being <1K lines of Python code. I heard a couple of stories like yours, where people had multi-node Spark clusters running LightGBM, and it always amused me because if you compiled the trees instead you could get rid of the whole cluster.
-
Tree compiler that speeds up LightGBM model inference by ~30x
In a near-future version I'll expose some of the compilation parameters. I was somewhat afraid that an overly complicated API would deter people who just want a no-fuss drop-in replacement for LightGBM, but as long as I keep sane defaults and make the parameters optional it should be fine. Relevant parameters are definitely block size (which needs to be adjusted to the L1i size and tree size) as well as the LLVM code model (a smaller address space increases single-batch prediction speeds but doesn't work for large models). I'm still looking into the thread-size-specific compilation; it makes the API more complicated and so might not be worth it.
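To make the idea of "compiling the trees" concrete, here is a minimal sketch that turns a decision tree into nested if/else source code and compares it with walking the tree at prediction time. It generates Python source rather than LLVM IR, so it only illustrates the transformation and says nothing about the speedups quoted above. The tuple-based tree format and all function names are assumptions made for this example; they are not lleaves' API or LightGBM's model format.

```python
# Hypothetical toy tree: internal nodes are (feature_index, threshold, left, right),
# leaves are plain floats. This is not LightGBM's model format.
TREE = (0, 0.5,
        (1, 2.0, -1.0, 0.25),
        (1, 1.0, 0.75, 1.5))


def predict_interpreted(node, row):
    """Walk the tree data structure for every prediction."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if row[feature] <= threshold else right
    return node


def emit(node, indent=1):
    """Return source lines for one node as nested if/else branches."""
    pad = "    " * indent
    if not isinstance(node, tuple):
        return [f"{pad}return {node!r}"]
    feature, threshold, left, right = node
    lines = [f"{pad}if row[{feature}] <= {threshold!r}:"]
    lines += emit(left, indent + 1)
    lines.append(f"{pad}else:")
    lines += emit(right, indent + 1)
    return lines


def compile_tree(tree):
    """Generate a function whose body hard-codes the tree's branches."""
    source = "\n".join(["def predict_compiled(row):"] + emit(tree))
    namespace = {}
    exec(source, namespace)  # safe here: the source is generated from our own tree
    return namespace["predict_compiled"], source


predict_compiled, generated_source = compile_tree(TREE)

# Both code paths must agree on every input row.
rows = [(0.2, 1.0), (0.2, 3.0), (0.9, 0.5), (0.9, 4.0)]
interpreted = [predict_interpreted(TREE, r) for r in rows]
compiled = [predict_compiled(r) for r in rows]
```

Printing `generated_source` shows the thresholds baked directly into the branches, which is the property a real tree compiler exploits when it emits machine code instead of Python.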
What are some alternatives?
optuna - A hyperparameter optimization framework
ngboost - Natural Gradient Boosting for Probabilistic Prediction
autokeras - AutoML library for deep learning
m2cgen - Transform ML models into a native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart, Haskell, Ruby, F#, Rust) with zero dependencies
LightGBM - A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
miceforest - Multiple Imputation with LightGBM in Python
PySR - High-Performance Symbolic Regression in Python and Julia
catboost - A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
AutoViz - Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
mljar-examples - Examples how MLJAR can be used
Auto_ViML - Automatically Build Multiple ML Models with a Single Line of Code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
studio - MLJAR Studio Desktop Application