Run Machine Learning Training with Managed Spot Training for Amazon SageMaker

Lab Details

  1. This lab walks you through the steps to create a SageMaker notebook instance. You will use JupyterLab to run the training job and observe the difference that Managed Spot Training makes.

  2. Duration: 90 minutes

  3. AWS Region: US East (N. Virginia) us-east-1
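
The difference you look for at the end of the lab is usually expressed as a savings percentage. The sketch below shows that arithmetic, assuming you have the two durations SageMaker reports for a training job (`TrainingTimeInSeconds` and `BillableTimeInSeconds`). The function name and the sample numbers are hypothetical and are not results from this lab.

```python
def spot_savings_percent(training_seconds: int, billable_seconds: int) -> float:
    """Return the percentage saved when billable time is lower than training time."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    if not 0 <= billable_seconds <= training_seconds:
        raise ValueError("billable_seconds must be between 0 and training_seconds")
    # Savings = share of the training time that was not billed.
    return (1 - billable_seconds / training_seconds) * 100


# Hypothetical numbers for illustration only.
savings = spot_savings_percent(training_seconds=2000, billable_seconds=500)
print(f"Managed Spot Training savings: {savings:.1f}%")
```

A job run without Spot capacity has equal training and billable seconds, so the function returns 0 for it.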


What is Amazon SageMaker

  • It is a fully managed machine learning service.

  • Amazon SageMaker helps developers and data scientists build and train machine learning (ML) models and deploy them quickly into a production-ready hosted environment.

  • It removes the heavy lifting from each step of the ML process, making it easier to develop high-quality models.

  • Since it is a fully managed service, there are no maintenance windows or scheduled downtimes.

  • It stores code in ML storage volumes, which are secured by security groups and can optionally be encrypted at rest.

  • It does not use or share customer models, training data, or algorithms.

What is EC2 Spot Instance

  • Spot Instances use spare Amazon EC2 capacity and can cost up to 90% less than On-Demand Instances. In exchange, AWS can interrupt your Spot Instances, for example when the current Spot price rises above the maximum price you specified or when EC2 needs the capacity back.

  • Spot Instances use the same EC2 instances (AMIs and instance types) as On-Demand and Reserved Instances. They are best suited to use cases where the data is reproducible and the workload can tolerate interruption at any time.

  • You can use Spot Instances as additional compute capacity alongside your On-Demand or Reserved Instances, wherever the workload is fault-tolerant.

  • EC2 Spot Instances can be launched in the same ways as other EC2 instances, for example through Spot Fleet, Auto Scaling groups, or the AWS Management Console.

  • If AWS stops or terminates your Amazon EC2 Spot Instance within its first hour, you are not charged for that usage.

  • However, if you stop or terminate a newly launched Spot Instance yourself, you pay for the total number of seconds you used it.
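
The two billing bullets above can be sketched as a small function. This is only an illustration of the rule as stated in this lab, not an AWS API. The function name is hypothetical, and the case where AWS interrupts the instance after the first hour is an assumption (billed for the seconds used), since the text does not cover it.

```python
FIRST_HOUR_SECONDS = 3600


def spot_billable_seconds(seconds_used: int, interrupted_by_aws: bool) -> int:
    """Return the seconds billed for one Spot Instance run, per the rules above."""
    if seconds_used < 0:
        raise ValueError("seconds_used must not be negative")
    # AWS interrupted the instance within its first hour: no charge.
    if interrupted_by_aws and seconds_used < FIRST_HOUR_SECONDS:
        return 0
    # You stopped it yourself, or (assumption) AWS interrupted it later:
    # billed for every second used.
    return seconds_used


# Hypothetical runs of 20 minutes each.
print(spot_billable_seconds(1200, interrupted_by_aws=True))
print(spot_billable_seconds(1200, interrupted_by_aws=False))
```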

Architecture Diagram

Task Details

  1. Launching Lab Environment

  2. Create an Amazon SageMaker notebook instance

  3. Open JupyterLab and set up the kernel environment

  4. Execute the notebook cells

  5. Delete AWS Resources
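
When you execute the notebook cells, the training job is switched to Spot capacity through a few estimator settings. The sketch below collects them in a plain dictionary, assuming version 2 of the SageMaker Python SDK, where the `Estimator` keyword arguments are named `use_spot_instances`, `max_run`, `max_wait`, and `checkpoint_s3_uri`. The helper name and the bucket path are hypothetical, and the lab notebook may set these values differently.

```python
from typing import Optional


def managed_spot_settings(
    max_run_seconds: int,
    max_wait_seconds: int,
    checkpoint_s3_uri: Optional[str] = None,
) -> dict:
    """Build the Spot-related keyword arguments for a SageMaker estimator."""
    # max_wait covers training time plus time spent waiting for Spot capacity,
    # so it must not be smaller than max_run.
    if max_wait_seconds < max_run_seconds:
        raise ValueError("max_wait_seconds must be >= max_run_seconds")
    settings = {
        "use_spot_instances": True,
        "max_run": max_run_seconds,
        "max_wait": max_wait_seconds,
    }
    # Checkpoints let an interrupted job resume instead of starting over.
    if checkpoint_s3_uri:
        settings["checkpoint_s3_uri"] = checkpoint_s3_uri
    return settings


# Hypothetical values: train for up to 1 hour, wait up to 2 hours in total.
settings = managed_spot_settings(
    max_run_seconds=3600,
    max_wait_seconds=7200,
    checkpoint_s3_uri="s3://example-bucket/checkpoints/",
)
print(settings)
```

In a notebook, a dictionary like this could be unpacked into the estimator constructor alongside the usual arguments such as the training image, role, and instance type.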