LightGBM vs xgboost

LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks. (by Microsoft)

lightgbm.readthedocs.io

xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow (by dmlc)

xgboost.readthedocs.io

Project          LightGBM       xgboost
Mentions         11             10
Stars            16,043         25,576
Growth           1.0%           1.0%
Activity         9.2            9.6
Latest Commit    7 days ago     2 days ago
Language         C++            C++
License          MIT License    Apache License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

LightGBM

Posts with mentions or reviews of LightGBM. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-06-29.

SIRUS.jl: Interpretable Machine Learning via Rule Extraction
2 projects | /r/Julia | 29 Jun 2023

SIRUS.jl is a pure Julia implementation of the SIRUS algorithm by Bénard et al. (2021). The algorithm is a rule-based machine learning model, meaning that it is fully interpretable. It works by first fitting a random forest and then converting this forest to rules. Furthermore, the algorithm is stable and achieves a predictive performance that is comparable to LightGBM, a state-of-the-art gradient boosting model created by Microsoft. Interpretability, stability, and predictive performance are described in more detail below.
[D] RAM speeds for tabular machine learning algorithms
1 project | /r/MachineLearning | 9 Jun 2023

Hey, thanks everybody for your answers. I've asked around in the XGBoost and LightGBM repos, and some folks there also agreed that memory speed will be a bottleneck.
[P] LightGBM but lighter in another language?
1 project | /r/MachineLearning | 4 May 2023

LightGBM seems to have C API support, and a C++ example in the main repo.
Use whatever is best for the problem, but still
1 project | /r/datascience | 9 Aug 2022

LGBM doesn't do RF well, but it's easy to manually bag single LGBM trees.
What's New with AWS: Amazon SageMaker built-in algorithms now provides four new Tabular Data Modeling Algorithms
3 projects | dev.to | 28 Jun 2022

LightGBM is a popular and high-performance open-source implementation of the Gradient Boosting Decision Tree (GBDT). To learn how to use this algorithm, please see example notebooks for Classification and Regression.
Search YouTube from the terminal written in python
2 projects | /r/Python | 28 Feb 2022

Microsoft lightGBM. https://github.com/microsoft/LightGBM
LightGBM VS CXXGraph - a user suggested alternative
2 projects | 28 Feb 2022
Writing the fastest GBDT library in Rust
6 projects | dev.to | 11 Jan 2022

Here are our benchmarks on training time comparing Tangram's Gradient Boosted Decision Tree Library to LightGBM, XGBoost, CatBoost, and sklearn.
Workstation Management With Nix Flakes: Build a Cmake C++ Package
2 projects | dev.to | 31 Oct 2021

{
  inputs = {
    nixpkgs = { url = "github:nixos/nixpkgs/nixos-unstable"; };
    flake-utils = { url = "github:numtide/flake-utils"; };
  };
  outputs = { nixpkgs, flake-utils, ... }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = import nixpkgs { inherit system; };
        lightgbm-cli = (with pkgs; stdenv.mkDerivation {
          pname = "lightgbm-cli";
          version = "3.3.1";
          src = fetchgit {
            url = "https://github.com/microsoft/LightGBM";
            rev = "v3.3.1";
            sha256 = "pBrsey0RpxxvlwSKrOJEBQp7Hd9Yzr5w5OdUuyFpgF8=";
            fetchSubmodules = true;
          };
          nativeBuildInputs = [ clang cmake ];
          buildPhase = "make -j $NIX_BUILD_CORES";
          installPhase = ''
            mkdir -p $out/bin
            mv $TMP/LightGBM/lightgbm $out/bin
          '';
        });
      in rec {
        defaultApp = flake-utils.lib.mkApp { drv = defaultPackage; };
        defaultPackage = lightgbm-cli;
        devShell = pkgs.mkShell {
          buildInputs = with pkgs; [ lightgbm-cli ];
        };
      }
    );
}
Is it possible to clean memory after using a package that has a memory leak in my python script?
2 projects | /r/Python | 29 Apr 2021

I'm working on an AutoML Python package (GitHub repo). The package uses many different algorithms, one of which is LightGBM. After training, LightGBM doesn't release its memory, even when del is called followed by gc.collect(). I opened an issue on the LightGBM GitHub repo (link). Because of this leak, memory consumption grows very quickly during training.
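A common workaround for this kind of leak, which is not taken from the post itself, is to run the leaking call in a short-lived child process, so that the operating system reclaims all of its memory when the process exits. The sketch below uses only the standard library. `train_and_score` is a hypothetical stand-in for the real training call, `run_isolated` is an illustrative helper name, and the "fork" start method is an assumption that limits the sketch to Unix-like systems.

```python
import multiprocessing as mp


def train_and_score(n_rows):
    """Hypothetical stand-in for a training call that leaks memory."""
    leaked = [float(i) for i in range(n_rows)]  # pretend this is never freed
    return sum(leaked) / len(leaked)


def _child(conn, func, args):
    """Run func in the child process and send the outcome to the parent."""
    try:
        conn.send(("ok", func(*args)))
    except Exception as exc:  # report failures instead of hanging the parent
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_isolated(func, *args):
    """Call func(*args) in a forked child and return its (picklable) result.

    Memory allocated by func lives in the child, so it is returned to the
    operating system when the child exits, whether or not func freed it.
    """
    ctx = mp.get_context("fork")  # assumption: Unix-like OS
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(child_conn, func, args))
    proc.start()
    child_conn.close()  # the parent only reads from the pipe
    status, payload = parent_conn.recv()
    proc.join()
    if status == "error":
        raise RuntimeError(payload)
    return payload


if __name__ == "__main__":
    print(run_isolated(train_and_score, 1_000))
```

The trade-off is that only picklable results come back from the child, so a trained model would typically be saved to disk inside the child and reloaded by the parent.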

xgboost

Posts with mentions or reviews of xgboost. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-05-09.

XGBoost 2.0
1 project | news.ycombinator.com | 13 Oct 2023
XGBoost2.0
1 project | news.ycombinator.com | 9 Oct 2023
Xgboost: Banding continuous variables vs keeping raw data
1 project | /r/datascience | 1 Jun 2023
PSA: You don't need fancy stuff to do good work.
10 projects | /r/datascience | 9 May 2023

Finally, when it comes to building models and making predictions, Python and R have a plethora of options available. Libraries like scikit-learn, statsmodels, and TensorFlow in Python, or caret, randomForest, and xgboost in R, provide powerful machine learning algorithms and statistical models that can be applied to a wide range of problems. What's more, these libraries are open-source and have extensive documentation and community support, making it easy to learn and apply new techniques without needing specialized training or expensive software licenses.
XGBoost Save and Load Error
1 project | /r/datascience | 14 Nov 2022

You can find the problem outlined here: https://github.com/dmlc/xgboost/issues/5826. u/hcho3 diagnosed the problem and corrected it as of XGB version 1.2.0.
For XGBoost (in Amazon SageMaker), one of the hyper parameters is num_round, for number of rounds to train. Does this mean cross validation?
1 project | /r/learnmachinelearning | 23 Sep 2022

Reference: https://github.com/dmlc/xgboost/issues/2031
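On the question in the title: in gradient boosting, the number of rounds is the number of boosting iterations, with one new tree fitted to the current residuals in each round. It is not a number of cross-validation folds. The sketch below illustrates the idea in plain Python with one-split stumps on a single feature. It is a toy illustration, not XGBoost's implementation: `fit_stump`, `boost`, and `mse` are hypothetical helper names, and the data is assumed to have at least two distinct feature values.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) minimising squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, num_round, learning_rate=0.3):
    """Fit num_round stumps, each one on the residuals of the model so far."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(num_round):  # one boosting round adds one tree
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left, right = fit_stump(xs, residuals)
        stumps.append((threshold, left, right))
        preds = [p + learning_rate * (left if x <= threshold else right)
                 for x, p in zip(xs, preds)]
    return base, stumps, preds


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [x * x for x in xs]
    for rounds in (1, 5, 20):
        _, stumps, preds = boost(xs, ys, num_round=rounds)
        print(rounds, len(stumps), mse(ys, preds))
```

More rounds means more trees and a closer fit to the training data, which is why the round count is usually tuned against a held-out set or with a separate cross-validation procedure.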
CS Internship Questions
1 project | /r/stanford | 7 May 2022

By the way, most of the time XGBoost works just as well for projects. I would not recommend applying deep learning to every single problem you come across. It's something Stanford CS really likes to showcase, when it is well known that (1) sometimes "smaller", less complex models can perform just as well or have their own interpretive advantages, and (2) deep learning does not perform as well on tabular datasets, so using it as the default for every problem is just poor practice. However, if you do (god forbid) get language, speech/audio, vision/imaging, or even time series models, then deep learning as a baseline is not the worst idea.
OOM with ML Models (SKlearn, XGBoost, etc), workaround/tips for large datasets?
1 project | /r/MLQuestions | 1 Mar 2022
xgboost VS CXXGraph - a user suggested alternative
2 projects | 28 Feb 2022
'y contains previously unseen labels' (label encoder)
1 project | /r/pythonhelp | 9 Dec 2021

What are some alternatives?

When comparing LightGBM and xgboost you can also consider the following projects:

tensorflow - An Open Source Machine Learning Framework for Everyone

Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

H2O - H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

MLP Classifier - A handwritten multilayer perceptron classifier using numpy.

GPBoost - Combining tree-boosting with Gaussian process and mixed effects models

yggdrasil-decision-forests - A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.

Keras - Deep Learning for humans

amazon-sagemaker-examples - Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

mlpack - mlpack: a fast, header-only C++ machine learning library

mljar-supervised - Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation

catboost - A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
