SageMaker fundamentals for R users


The SageMaker Fundamentals for R users workshop is for experienced R users with no prior Amazon SageMaker knowledge who want to use their own (local) RStudio installation, as an alternative to SageMaker Notebooks, to connect to SageMaker and train, tune, evaluate, deploy and monitor machine learning models in the cloud.

We use a single project example throughout the workshop to explain the entire machine learning process end-to-end using SageMaker from a data science practitioner's perspective. In addition, workshop attendees gain a solid understanding of the underlying SageMaker fundamentals, such as what happens behind the scenes when a training, tuning or batch inference job is started. We show R recipes and best practices for parsing and visualizing the responses returned from Amazon SageMaker in the different stages of the machine learning process.
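As a rough illustration of the kind of parsing involved, the sketch below flattens the final metrics of a training job description into a simple name-to-value mapping. It assumes a response shaped like the result of the `DescribeTrainingJob` API (as returned by boto3's `describe_training_job`). The sample values are made up for illustration, and `final_metrics` is a hypothetical helper, not part of the workshop code. It is written in Python, the language of the SDK; with default settings, reticulate typically hands such a dictionary to R as a named list.

```python
def final_metrics(description):
    """Map metric name -> final value from a DescribeTrainingJob-style dict."""
    return {
        metric["MetricName"]: metric["Value"]
        for metric in description.get("FinalMetricDataList", [])
    }


# Hand-written sample; the values are illustrative, not real results.
sample = {
    "TrainingJobName": "example-job",
    "TrainingJobStatus": "Completed",
    "FinalMetricDataList": [
        {"MetricName": "train:auc", "Value": 0.95},
        {"MetricName": "validation:auc", "Value": 0.91},
    ],
}

metrics = final_metrics(sample)
print(sample["TrainingJobStatus"], metrics)
```

A flat mapping like this is easy to turn into a one-row data frame on the R side for plotting or comparison across jobs.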

We use the reticulate package as an R interface to Python to make API calls to SageMaker using the SageMaker Python SDK.

Important: The workshop code is based on the new version 2.x of the SageMaker Python SDK.
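For orientation, here is a minimal sketch of what the version 2.x argument names look like when constructing an Estimator. It is written in Python, the language of the SDK; from R, reticulate reaches the same objects with `$` in place of `.`. The role ARN, output path, instance type and XGBoost container version are placeholders chosen for illustration, and `v2_estimator_args` and `make_estimator` are hypothetical helpers rather than workshop code.

```python
def v2_estimator_args(role_arn, image_uri, output_path):
    """Keyword arguments using the SDK 2.x names.

    In 1.x the corresponding names were image_name,
    train_instance_count and train_instance_type.
    """
    return {
        "image_uri": image_uri,
        "role": role_arn,
        "instance_count": 1,
        "instance_type": "ml.m5.large",  # placeholder instance type
        "output_path": output_path,
    }


def make_estimator(role_arn, region, output_path):
    """Build an XGBoost Estimator; needs the sagemaker package and AWS credentials."""
    # Imported lazily so the helper above works without the SDK installed.
    from sagemaker import image_uris
    from sagemaker.estimator import Estimator

    # image_uris.retrieve replaces the 1.x get_image_uri helper.
    # The container version "1.0-1" is an assumption for this sketch.
    image_uri = image_uris.retrieve(
        framework="xgboost", region=region, version="1.0-1"
    )
    return Estimator(**v2_estimator_args(role_arn, image_uri, output_path))
```

Code written against version 1.x of the SDK will generally fail on these renamed arguments, which is why the SDK version matters when following the modules.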

Workshop modules

Each workshop module consists of an R Notebook (.Rmd file) so that users can run and experiment with the code examples using their RStudio installation. Workshop attendees should work through the modules in the given order, because each module re-uses pieces from the previous ones.

  1. Part 01: Configuring RStudio: Explains how to configure your RStudio environment as a “remote control” to connect to SageMaker.

  2. Part 02: Training a model with a built-in algorithm: You start by loading and pre-processing the project example data locally in RStudio before you upload the pre-processed data to S3. Then you launch a single SageMaker training job to train a model using the SageMaker XGBoost built-in algorithm. You use built-in R tools to evaluate the training results and use a SageMaker batch inference job on the test set for the final model evaluation. The module highlights all key objects involved (Estimator, Transformer) and describes how the infrastructure behind the built-in algorithms for training and batch inference jobs works.

  3. Part 03: Hyperparameter tuning: You learn how to use a hyperparameter tuning job instead of a single training job to train multiple models. The module highlights the key objects involved in tuning jobs in comparison to single training jobs (Estimator objects for single training jobs vs. Estimator & HyperparameterTuner objects for tuning jobs) and describes how the infrastructure for the tuning process works. We use built-in R tools to evaluate the tuning results and use a batch inference job on the test set for the final model evaluation.

  4. Part 04: Model deployment for real-time predictions: You will deploy a model as an HTTPS endpoint and make real-time predictions against it. You will learn the different steps of the SageMaker deployment process for deploying a single model that is based on a built-in algorithm behind an endpoint.
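The objects named in Parts 02 to 04 fit together roughly as in the sketch below, written in Python against version 2.x of the SDK (from R, the same calls go through reticulate). This is a sketch under stated assumptions, not the workshop's actual code: the S3 layout, hyperparameter values and ranges, objective metric, and instance types are all illustrative, and `channel_uris` and `run_workflow` are hypothetical helpers. Calling `run_workflow` starts billable AWS resources.

```python
def channel_uris(bucket, prefix):
    """S3 locations for the pre-processed data sets (layout is assumed)."""
    base = f"s3://{bucket}/{prefix}"
    return {name: f"{base}/{name}" for name in ("train", "validation", "test")}


def run_workflow(estimator, bucket, prefix):
    """Tune, batch-predict and deploy. Starts billable AWS resources."""
    from sagemaker.inputs import TrainingInput
    from sagemaker.serializers import CSVSerializer
    from sagemaker.tuner import (
        ContinuousParameter,
        HyperparameterTuner,
        IntegerParameter,
    )

    uris = channel_uris(bucket, prefix)
    inputs = {
        channel: TrainingInput(s3_data=uris[channel], content_type="text/csv")
        for channel in ("train", "validation")
    }

    # Static hyperparameters (assumed values for a binary classifier).
    estimator.set_hyperparameters(
        objective="binary:logistic", eval_metric="auc", num_round=100
    )

    # Part 02 would call estimator.fit(inputs) for a single training job.
    # Part 03: the tuner wraps the Estimator and launches several jobs.
    tuner = HyperparameterTuner(
        estimator=estimator,
        objective_metric_name="validation:auc",
        objective_type="Maximize",
        hyperparameter_ranges={
            "eta": ContinuousParameter(0.1, 0.5),
            "max_depth": IntegerParameter(3, 9),
        },
        max_jobs=10,
        max_parallel_jobs=2,
    )
    tuner.fit(inputs)

    # Batch inference on the test set with the best model (Transformer).
    # The test data is assumed to be CSV without the target column.
    transformer = tuner.best_estimator().transformer(
        instance_count=1,
        instance_type="ml.m5.large",
        output_path=f"s3://{bucket}/{prefix}/predictions",
    )
    transformer.transform(
        data=uris["test"], content_type="text/csv", split_type="Line"
    )

    # Part 04: deploy the best model behind an HTTPS endpoint.
    predictor = tuner.deploy(
        initial_instance_count=1,
        instance_type="ml.t2.medium",
        serializer=CSVSerializer(),
    )
    return predictor
```

The returned predictor can be queried with `predictor.predict(...)`; calling `predictor.delete_endpoint()` afterwards removes the endpoint so it stops incurring cost.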

Prerequisites

AWS Cloud

Local installations

Workshop installation & start

What's next?

Have you mastered the SageMaker fundamentals and would like to learn more about how to leverage SageMaker as an R user? Below are additional resources that will help you on your journey:

Did you know that AWS's AI Services allow you to quickly add deep learning capabilities to your R & Shiny applications without being a DL expert? The workshop below shows you how to use these services from R:

Disclaimer