amazon-sagemaker-examples
LightGBM
| | amazon-sagemaker-examples | LightGBM |
|---|---|---|
| Mentions | 17 | 11 |
| Stars | 9,504 | 16,043 |
| Growth | 2.0% | 1.0% |
| Activity | 9.1 | 9.2 |
| Latest commit | about 22 hours ago | 6 days ago |
| Language | Jupyter Notebook | C++ |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
amazon-sagemaker-examples
-
Thesis Project Help Using SageMaker Free Tier
I need to use AWS Sagemaker (required, can't use easier services) and my adviser gave me this document to start with: https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/question_answering_retrieval_augmented_generation/question_answering_langchain_jumpstart.ipynb
-
Sagemaker step scaling policy
I'm trying to define a step scaling policy for my SageMaker real-time endpoint, based on this example notebook. I understand that the step scaling policy defines thresholds at which to provision a different number of instances, but I am confused because it doesn't seem to specify the metric to track.
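The metric is in fact not part of the step scaling policy itself: in Application Auto Scaling, the policy only lists step adjustments, and the metric lives on the CloudWatch alarm whose actions invoke the policy. The sketch below shows how a step adjustment is selected from the metric's distance to the alarm threshold; the threshold, bounds, and instance counts are made-up values, not from the notebook.

```python
# Hypothetical alarm threshold, e.g. for the AWS/SageMaker
# InvocationsPerInstance metric; the metric is configured on the
# CloudWatch alarm, not inside the scaling policy.
ALARM_THRESHOLD = 70.0

# Step adjustments as they appear in a StepScalingPolicyConfiguration:
# bounds are offsets from the alarm threshold; None means unbounded.
step_adjustments = [
    {"MetricIntervalLowerBound": 0.0, "MetricIntervalUpperBound": 30.0,
     "ScalingAdjustment": 1},
    {"MetricIntervalLowerBound": 30.0, "MetricIntervalUpperBound": None,
     "ScalingAdjustment": 3},
]

def scaling_adjustment(metric_value, threshold, steps):
    """Return the capacity change the policy would apply (0 if no step matches).

    Lower bounds are inclusive, upper bounds exclusive, per AWS step
    scaling semantics.
    """
    diff = metric_value - threshold
    for s in steps:
        lo, hi = s["MetricIntervalLowerBound"], s["MetricIntervalUpperBound"]
        if (lo is None or diff >= lo) and (hi is None or diff < hi):
            return s["ScalingAdjustment"]
    return 0
```

The real policy would be created with Application Auto Scaling's `put_scaling_policy` (PolicyType `StepScaling`) and attached to an alarm via the alarm's `AlarmActions`.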
-
Working On My Own Generative AI App, Taking ~10 sec to generate image.
Yeah man: https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_text_to_image/Amazon_JumpStart_Text_To_Image.ipynb
-
Study Plan to pass exam AWS Machine Learning Specialty exam with tips and advice
It's time to get your hands dirty by solving some ML Use Cases of your own from AWS SageMaker Use Cases repo.
-
Using AWS for Text Classification Part-1
Additionally, you can easily deploy pretrained fastText models on their own to live SageMaker endpoints to compute embedding vectors on the fly for use in relevant word-level tasks. See the following GitHub example for more details.
-
[D] How to monitor NLP and Object Detection models on AWS Sagemaker?
We are kind of boxed into using Sagemaker at our organization and we need to do a POC for Sagemaker's model monitoring. We noticed that Sagemaker monitoring works best with models that use tabular data/features. There are a lot of example notebooks that demonstrate model monitoring capabilities, but all of the examples are based on tabular data. We are trying to apply Sagemaker's model monitoring and gather metrics from Data Quality, Model Quality, Bias Drift, Feature Attribution Drift, and Explainability and then push those metrics into CloudWatch, similar to what was done in these notebooks: https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker_model_monitor .
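Pushing monitoring metrics into CloudWatch typically goes through the `PutMetricData` API. A minimal sketch of the payload follows; the metric names, values, and endpoint dimension are hypothetical, and the real values would come from the monitoring job's report in S3.

```python
from datetime import datetime, timezone

# Hypothetical drift metrics extracted from a Model Monitor report.
drift_metrics = {"feature_attribution_drift": 0.12, "bias_drift": 0.03}

metric_data = [
    {
        "MetricName": name,
        # "my-endpoint" is a placeholder dimension value.
        "Dimensions": [{"Name": "EndpointName", "Value": "my-endpoint"}],
        "Timestamp": datetime.now(timezone.utc),
        "Value": value,
        "Unit": "None",
    }
    for name, value in drift_metrics.items()
]

# The actual publish call (requires AWS credentials):
# boto3.client("cloudwatch").put_metric_data(
#     Namespace="ModelMonitor/Custom", MetricData=metric_data)
```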
-
Migrate local Data Science workspaces to SageMaker Studio
Amazon SageMaker provides XGBoost as a built-in algorithm, and the data science team decided to use it to re-train the model. Data scientists just need to call the built-in version and provide the path to the data on S3; a more detailed description can be found in the documentation. An example notebook can be found here.
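In `CreateTrainingJob` terms, "calling the built-in version and providing a path to the data on S3" amounts to a request like the sketch below. The job name, role ARN, and S3 paths are placeholders, and the training image URI is region-specific (it is normally resolved with `sagemaker.image_uris.retrieve`).

```python
# Sketch of a CreateTrainingJob request for the built-in XGBoost algorithm.
# All names, ARNs, and S3 URIs are placeholders.
training_job = {
    "TrainingJobName": "xgb-retrain-example",
    "AlgorithmSpecification": {
        # Placeholder; real URIs are per-region and per-version.
        "TrainingImage": "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-xgboost:1.5-1",
        "TrainingInputMode": "File",
    },
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    "HyperParameters": {"objective": "binary:logistic", "num_round": "100"},
    "InputDataConfig": [
        {
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/train/",  # path to the data on S3
                "S3DataDistributionType": "FullyReplicated",
            }},
            "ContentType": "text/csv",
        }
    ],
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/output/"},
    "ResourceConfig": {"InstanceType": "ml.m5.xlarge",
                       "InstanceCount": 1, "VolumeSizeInGB": 30},
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}

# Actual call (requires AWS credentials):
# boto3.client("sagemaker").create_training_job(**training_job)
```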
-
What's New with AWS: Amazon SageMaker built-in algorithms now provides four new Tabular Data Modeling Algorithms
Amazon SageMaker provides four new tabular data modeling algorithms: LightGBM, CatBoost, AutoGluon-Tabular and TabTransformer. These popular, state-of-the-art algorithms can be used for both tabular classification and regression tasks. They are available through the SageMaker JumpStart UI inside SageMaker Studio, as well as through Python code using the SageMaker Python SDK. To learn how to use these algorithms, you can find SageMaker example notebooks below:
-
How InfoJobs (Adevinta) improves NLP model prediction performance with AWS Inferentia and Amazon SageMaker
In this section, we go through an example in which we show you how to compile a BERT model with Neo for AWS Inferentia. We then deploy that model to a SageMaker endpoint. You can find a sample notebook describing the whole process in detail on GitHub.
- amazon-sagemaker-examples: Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠Amazon SageMaker.
LightGBM
-
SIRUS.jl: Interpretable Machine Learning via Rule Extraction
SIRUS.jl is a pure Julia implementation of the SIRUS algorithm by Bénard et al. (2021). The algorithm is a rule-based machine learning model, meaning that it is fully interpretable. It achieves this by first fitting a random forest and then converting the forest to rules. Furthermore, the algorithm is stable and achieves predictive performance comparable to LightGBM, a state-of-the-art gradient boosting model created by Microsoft. Interpretability, stability, and predictive performance are described in more detail below.
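The forest-to-rules step can be illustrated on a toy tree: every root-to-leaf path becomes one rule (a conjunction of split conditions plus a prediction). This sketch is not SIRUS's actual algorithm, which also stabilizes and prunes the rule set across the forest; the tree here is hand-built.

```python
def tree_to_rules(node, path=()):
    """Flatten a toy decision tree (nested dicts) into (conditions, prediction) rules."""
    if "leaf" in node:
        return [(path, node["leaf"])]
    cond_true = path + ((node["feature"], "<=", node["threshold"]),)
    cond_false = path + ((node["feature"], ">", node["threshold"]),)
    return (tree_to_rules(node["left"], cond_true)
            + tree_to_rules(node["right"], cond_false))

# A hand-built stump on a hypothetical feature "x0", standing in for
# one tree of the fitted forest.
tree = {"feature": "x0", "threshold": 3.0,
        "left": {"leaf": 0.2}, "right": {"leaf": 0.9}}

rules = tree_to_rules(tree)
# Each rule reads e.g.: if x0 <= 3.0 then predict 0.2
```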
-
[D] RAM speeds for tabular machine learning algorithms
Hey, thanks everybody for your answers. I've asked around in the XGBoost and LightGBM repos and some folks there also agreed that memory speed will be a bottleneck yes.
-
[P] LightGBM but lighter in another language?
LightGBM seems to have C API support, and there is a C++ example in the main repo.
-
Use whatever is best for the problem, but still
LGBM doesn't do RF well, but it's easy to manually bag single LGBM trees.
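Manual bagging amounts to: draw a bootstrap sample, fit one tree on it, repeat, and average the predictions. In the sketch below a hand-rolled mean-split stump stands in for a single-tree LightGBM fit (e.g. one model trained with `num_boost_round=1`); the data and tree-fitting logic are illustrative only.

```python
import random

def fit_single_tree(xs, ys):
    """Stand-in for training one LightGBM tree; here just a mean-threshold stump."""
    t = sum(xs) / len(xs)
    left = [y for x, y in zip(xs, ys) if x <= t] or ys
    right = [y for x, y in zip(xs, ys) if x > t] or ys
    lm, rm = sum(left) / len(left), sum(right) / len(right)
    return lambda x: lm if x <= t else rm

def bagged_predictor(xs, ys, n_trees=25, seed=0):
    """Bag n_trees single-tree models over bootstrap samples; average predictions."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]  # bootstrap sample with replacement
        trees.append(fit_single_tree([xs[i] for i in idx],
                                     [ys[i] for i in idx]))
    return lambda x: sum(t(x) for t in trees) / len(trees)
```

With real LightGBM you would fit each one-tree model on the resampled data and average `model.predict` outputs the same way.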
-
What's New with AWS: Amazon SageMaker built-in algorithms now provides four new Tabular Data Modeling Algorithms
LightGBM is a popular and high-performance open-source implementation of the Gradient Boosting Decision Tree (GBDT). To learn how to use this algorithm, please see example notebooks for Classification and Regression.
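The GBDT idea that LightGBM implements can be sketched in a few lines: start from the mean prediction, then repeatedly fit a small tree (a depth-1 stump here) to the residuals and add it with a learning rate. This is a toy squared-loss illustration, not LightGBM's histogram-based, leaf-wise algorithm.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump (squared error) on a 1-D feature."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not right:  # skip splits that leave one side empty
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gbdt_fit(xs, ys, rounds=30, lr=0.5):
    """Gradient boosting with stumps: each round fits the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        s = fit_stump(xs, residuals)
        stumps.append(s)
        preds = [p + lr * s(x) for p, x in zip(preds, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)
```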
-
Search YouTube from the terminal written in python
Microsoft LightGBM. https://github.com/microsoft/LightGBM
-
LightGBM VS CXXGraph - a user suggested alternative
2 projects | 28 Feb 2022
-
Writing the fastest GBDT libary in Rust
Here are our benchmarks on training time comparing Tangram's Gradient Boosted Decision Tree Library to LightGBM, XGBoost, CatBoost, and sklearn.
-
Workstation Management With Nix Flakes: Build a Cmake C++ Package
```nix
{
  inputs = {
    nixpkgs = { url = "github:nixos/nixpkgs/nixos-unstable"; };
    flake-utils = { url = "github:numtide/flake-utils"; };
  };
  outputs = { nixpkgs, flake-utils, ... }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = import nixpkgs { inherit system; };
        lightgbm-cli = (with pkgs; stdenv.mkDerivation {
          pname = "lightgbm-cli";
          version = "3.3.1";
          src = fetchgit {
            url = "https://github.com/microsoft/LightGBM";
            rev = "v3.3.1";
            sha256 = "pBrsey0RpxxvlwSKrOJEBQp7Hd9Yzr5w5OdUuyFpgF8=";
            fetchSubmodules = true;
          };
          nativeBuildInputs = [ clang cmake ];
          buildPhase = "make -j $NIX_BUILD_CORES";
          installPhase = ''
            mkdir -p $out/bin
            mv $TMP/LightGBM/lightgbm $out/bin
          '';
        });
      in rec {
        defaultApp = flake-utils.lib.mkApp { drv = defaultPackage; };
        defaultPackage = lightgbm-cli;
        devShell = pkgs.mkShell { buildInputs = with pkgs; [ lightgbm-cli ]; };
      });
}
```
-
Is it possible to clean memory after using a package that has a memory leak in my python script?
I'm working on an AutoML Python package (GitHub repo). In my package, I'm using many different algorithms, one of which is LightGBM. After training, the algorithm doesn't release memory, even if del is called and gc.collect() is run afterwards. I created an issue on the LightGBM GitHub -> link. Because of this leak, memory consumption grows very quickly during algorithm training.
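A common workaround for a leak you can't fix in-process is to run the leaky training in a child process, so the OS reclaims all of its memory when the process exits. Below is a sketch with a stand-in for the real training call; the `fork` start method used here is POSIX-only, and a real version would ship model artifacts back through the queue (pickled) rather than a score.

```python
import multiprocessing as mp

def train_and_score(q):
    # Stand-in for the leaky training call: allocate a large buffer in the
    # child only, then return just the result we care about.
    big = [0.0] * 5_000_000
    q.put(sum(big))

# "fork" avoids re-importing the script in the child (POSIX-only).
ctx = mp.get_context("fork")
q = ctx.Queue()
p = ctx.Process(target=train_and_score, args=(q,))
p.start()
score = q.get()   # read the result before join to avoid queue deadlocks
p.join()          # child exit returns all leaked memory to the OS
```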
What are some alternatives?
aws-lambda-docker-serverless-inference - Serve scikit-learn, XGBoost, TensorFlow, and PyTorch models with AWS Lambda container images support.
tensorflow - An Open Source Machine Learning Framework for Everyone
catboost - A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
H2O - H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
sp-api-sdk - Amazon Selling Partner API - PHP SDKs
GPBoost - Combining tree-boosting with Gaussian process and mixed effects models
Popular-RL-Algorithms - PyTorch implementation of Soft Actor-Critic (SAC), Twin Delayed DDPG (TD3), Actor-Critic (AC/A2C), Proximal Policy Optimization (PPO), QT-Opt, PointNet..
yggdrasil-decision-forests - A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.
sagemaker-studio-auto-shutdown-extension
mljar-supervised - Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
Hello-AWS-Data-Services - AWS Data/MLServices sample code & notes for my LinkedIn Learning courses
xgboost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow