amazon-sagemaker-examples vs catboost
| | amazon-sagemaker-examples | catboost |
|---|---|---|
| Mentions | 17 | 8 |
| Stars | 9,504 | 7,744 |
| Growth | 1.8% | 1.6% |
| Activity | 9.1 | 9.9 |
| Last commit | about 18 hours ago | 2 days ago |
| Language | Jupyter Notebook | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
amazon-sagemaker-examples
-
Thesis Project Help Using SageMaker Free Tier
I need to use AWS Sagemaker (required, can't use easier services) and my adviser gave me this document to start with: https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/question_answering_retrieval_augmented_generation/question_answering_langchain_jumpstart.ipynb
-
Sagemaker step scaling policy
I'm trying to define a step scaling policy for my SageMaker real-time endpoint, based on this example notebook. I understand that the step scaling policy defines thresholds for provisioning a different number of instances, but I am confused because it doesn't seem to specify the metric to track.
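The confusion above usually comes from the fact that, in AWS Application Auto Scaling, a step scaling policy does not name the metric itself: a CloudWatch alarm (for example on `SageMakerVariantInvocationsPerInstance`) tracks the metric and triggers the policy, while the policy only maps how far the metric breached the alarm threshold onto capacity changes. A minimal sketch of such a policy document, with placeholder endpoint and policy names, might look like this:

```python
# Sketch of a step scaling policy for a SageMaker real-time endpoint.
# The metric is NOT part of the policy: a CloudWatch alarm triggers it,
# and the step adjustments map the size of the alarm breach to a capacity
# change. Endpoint, variant, and policy names below are placeholders.

step_scaling_policy = {
    "PolicyName": "my-endpoint-step-scaling",            # hypothetical name
    "ServiceNamespace": "sagemaker",
    "ResourceId": "endpoint/my-endpoint/variant/AllTraffic",
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "StepScaling",
    "StepScalingPolicyConfiguration": {
        "AdjustmentType": "ChangeInCapacity",
        "MetricAggregationType": "Average",
        "Cooldown": 300,  # seconds to wait between scaling activities
        "StepAdjustments": [
            # metric 0-100 above the alarm threshold: add 1 instance
            {"MetricIntervalLowerBound": 0.0,
             "MetricIntervalUpperBound": 100.0,
             "ScalingAdjustment": 1},
            # metric more than 100 above the threshold: add 3 instances
            {"MetricIntervalLowerBound": 100.0,
             "ScalingAdjustment": 3},
        ],
    },
}

# This dict would be passed to the Application Auto Scaling client, e.g.:
#   boto3.client("application-autoscaling").put_scaling_policy(**step_scaling_policy)
print(len(step_scaling_policy["StepScalingPolicyConfiguration"]["StepAdjustments"]))
```

The alarm attached to the policy supplies both the metric and the base threshold, which is why the example notebook's policy definition appears metric-free.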
-
Working On My Own Generative AI App, Taking ~10 sec to generate image.
Yeah man: https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_text_to_image/Amazon_JumpStart_Text_To_Image.ipynb
-
Study Plan to pass exam AWS Machine Learning Specialty exam with tips and advice
It's time to get your hands dirty by solving some ML Use Cases of your own from AWS SageMaker Use Cases repo.
-
Using AWS for Text Classification Part-1
Additionally, you can easily deploy pretrained fastText models on their own to live SageMaker endpoints to compute embedding vectors on the fly for use in relevant word-level tasks. See the following GitHub example for more details.
-
[D] How to monitor NLP and Object Detection models on AWS Sagemaker?
We are kind of boxed into using Sagemaker at our organization and we need to do a POC for Sagemaker's model monitoring. We noticed that Sagemaker monitoring works best with models that use tabular data/features. There are a lot of example notebooks that demonstrate model monitoring capabilities, but all of the examples are based on tabular data. We are trying to apply Sagemaker's model monitoring, gather metrics for Data Quality, Model Quality, Bias Drift, Feature Attribution Drift, and Explainability, and then push those metrics into CloudWatch, similar to what was done in these notebooks: https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker_model_monitor
-
Migrate local Data Science workspaces to SageMaker Studio
Amazon SageMaker provides XGBoost as a built-in algorithm, and the data science team decided to use it to re-train the model. Data scientists just need to call the built-in version and provide the path to the data on S3; a more detailed description can be found in the documentation. An example notebook can be found here.
-
What's New with AWS: Amazon SageMaker built-in algorithms now provides four new Tabular Data Modeling Algorithms
Amazon SageMaker provides four new tabular data modeling algorithms: LightGBM, CatBoost, AutoGluon-Tabular and TabTransformer. These popular, state-of-the-art algorithms can be used for both tabular classification and regression tasks. They are available through the SageMaker JumpStart UI inside SageMaker Studio, as well as through Python code using the SageMaker Python SDK. To learn how to use these algorithms, you can find SageMaker example notebooks below:
-
How InfoJobs (Adevinta) improves NLP model prediction performance with AWS Inferentia and Amazon SageMaker
In this section, we go through an example in which we show you how to compile a BERT model with Neo for AWS Inferentia. We then deploy that model to a SageMaker endpoint. You can find a sample notebook describing the whole process in detail on GitHub.
- amazon-sagemaker-examples: Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠Amazon SageMaker.
catboost
- CatBoost: Open-source gradient boosting library
- Boosting Algorithms
-
What's New with AWS: Amazon SageMaker built-in algorithms now provides four new Tabular Data Modeling Algorithms
CatBoost is another popular and high-performance open-source implementation of the Gradient Boosting Decision Tree (GBDT). To learn how to use this algorithm, please see example notebooks for Classification and Regression.
-
Writing the fastest GBDT library in Rust
Here are our benchmarks on training time comparing Tangram's Gradient Boosted Decision Tree Library to LightGBM, XGBoost, CatBoost, and sklearn.
-
Data Science toolset summary from 2021
CatBoost - CatBoost is an open-source software library developed by Yandex. It provides a gradient boosting framework that handles categorical features using a permutation-driven ("ordered") alternative to the classical target-encoding algorithm. Link - https://catboost.ai/
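The "permutation-driven" idea mentioned above can be illustrated with a simplified, pure-Python sketch of ordered target statistics: each row's category is encoded using only the target values of rows that appear earlier in a random permutation, which avoids leaking a row's own target into its encoding. This is an illustration of the concept only, not CatBoost's actual implementation.

```python
# Simplified sketch of "ordered" target statistics, the permutation-driven
# idea CatBoost uses for categorical features. Each row is encoded from the
# targets of *earlier* rows (in a random permutation) with the same category.
import random

def ordered_target_encoding(categories, targets, prior=0.5, seed=0):
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)          # the random permutation
    sums, counts = {}, {}
    encoded = [0.0] * len(categories)
    for pos in order:
        cat, y = categories[pos], targets[pos]
        # encode using statistics from rows seen earlier in the permutation
        s, c = sums.get(cat, 0.0), counts.get(cat, 0)
        encoded[pos] = (s + prior) / (c + 1)
        # then update the running statistics with this row's target
        sums[cat] = s + y
        counts[cat] = c + 1
    return encoded

cats = ["red", "red", "blue", "red", "blue"]
ys = [1, 1, 0, 1, 0]
print(ordered_target_encoding(cats, ys))
```

Because a row never sees its own target, the first occurrence of each category in the permutation falls back to the prior, which is the leakage-avoidance property that distinguishes this from classical (global) target encoding.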
-
CatBoost Quickstart — ML Classification
CatBoost is an open source algorithm based on gradient boosted decision trees. It supports numerical, categorical and text features. Check out the docs.
-
[D] What are your favorite Random Forest implementations that support categoricals
If you are considering GBDT, check out CatBoost. Unfortunately, an RF mode is not available, but the library implements lots of interesting categorical encoding tricks that boost accuracy.
-
CatBoost and Water Pumps
The data contains a large number of categorical features. In my opinion, the most suitable choice for obtaining a baseline model is CatBoost, a high-performance, open-source library for gradient boosting on decision trees.
What are some alternatives?
aws-lambda-docker-serverless-inference - Serve scikit-learn, XGBoost, TensorFlow, and PyTorch models with AWS Lambda container images support.
xgboost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
LightGBM - A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Recommender - A C library for product recommendations/suggestions using collaborative filtering (CF)
sp-api-sdk - Amazon Selling Partner API - PHP SDKs
Keras - Deep Learning for humans
Popular-RL-Algorithms - PyTorch implementation of Soft Actor-Critic (SAC), Twin Delayed DDPG (TD3), Actor-Critic (AC/A2C), Proximal Policy Optimization (PPO), QT-Opt, PointNet..
Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
sagemaker-studio-auto-shutdown-extension
vowpal_wabbit - Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
Hello-AWS-Data-Services - AWS Data/MLServices sample code & notes for my LinkedIn Learning courses
mxnet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more