What are Spot Instances and options on AWS, Azure, and GCE?


What are Spot Instances?

Spot Instances have been around for several years and are particularly prominent on AWS. They provide access to unused capacity within a cloud data center at a significant discount, but the cloud provider can reclaim that capacity at very short notice.

When are they a good option?

Because Spot Virtual Machines can stop running with almost no notice, they don’t make sense in every case, but they can work very well in certain scenarios, such as:

  • Big Data
  • Containers and Kubernetes services
  • Development & Testing environments
  • High Performance Computing (HPC)
  • Short-lived jobs

Options and Best Practices:

Spot Instances in AWS

AWS has had Spot Instance options available for many years.

  • With Spot Instances, you specify the maximum price (your bid price) you are willing to pay per hour. If the Spot price is below your bid price, you can use that Instance until the Spot price rises above your bid price.
  • When you are submitting a Spot Instance request, you need to specify the Instance type and the Availability Zone.
  • With AWS Spot Instances, you need to prepare for interruptions. When the Spot price rises above your bid price, your Instance is taken away and terminated.
  • AWS recently announced that you can stop Spot Instances. Previously, you could not stop a Spot Instance even if it was EBS-backed; your only options were to reboot or terminate it.
  • AWS Spot Instances are integrated with Auto Scaling.
  • You can use a combination of On-Demand and Spot Instances.
  • Spot Instances have a feature called EC2 Spot Fleet. With it, you can bid on and launch a group of EC2 servers with a single API request. You specify a maximum price, target capacity, Instance type and Availability Zone. Even as Spot Instance prices change, the Fleet tries to maintain the desired Instance capacity.
  • You can run Spot Instances as dedicated or multi-tenant Instances. Dedicated Instances run on hardware that is dedicated to a single customer.
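The bid-price rule described above can be illustrated with a small simulation. This is only a sketch: the function name, the price series, and the whole-hour billing are assumptions made for illustration, and it does not call any AWS API. It shows that the Instance runs while the Spot price stays at or below the maximum price, that you pay the Spot price (not your bid) for each hour, and that the first hour priced above the bid ends the run.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SpotRun:
    hours_run: int                 # full hours the Instance stayed up
    total_cost: float              # sum of the hourly Spot prices paid
    interrupted_at: Optional[int]  # index of the hour that caused the interruption


def simulate_spot_run(hourly_spot_prices: Sequence[float], max_price: float) -> SpotRun:
    """Walk through an hourly Spot price series (hypothetical data) and stop
    at the first hour whose price is above max_price."""
    hours_run = 0
    total_cost = 0.0
    for hour, price in enumerate(hourly_spot_prices):
        if price > max_price:
            # Spot price rose above the bid: the Instance is reclaimed.
            return SpotRun(hours_run, total_cost, hour)
        hours_run += 1
        total_cost += price  # assumption: billed at the Spot price, per whole hour
    return SpotRun(hours_run, total_cost, None)


# Made-up prices in dollars per hour; the fourth hour exceeds the bid of 1.0.
prices = [0.25, 0.5, 0.75, 1.5, 0.25]
run = simulate_spot_run(prices, max_price=1.0)
print(run)
```

A higher maximum price makes an interruption for price reasons less likely, but it does not change what you pay while the Spot price stays low.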

Spot Instances in Azure

Azure recently launched its own Spot offering, Spot Virtual Machines:

  • You can choose to deploy your Spot Virtual Machines without capping the price. Azure will charge you the Spot Virtual Machine price at any given time, giving you peace of mind that your Virtual Machines will not be evicted for price reasons.
  • If your workload does not require a specific Virtual Machine series and size, then you can find other Virtual Machines in the same region that may be cheaper.
  • If your workload is not dependent on a specific region, then you can find a different Azure region to reduce your cost.
  • For long-running operations, try to create checkpoints so that you can restart your workload from a previous known checkpoint to handle evictions and save time.
  • In scale-out scenarios, to save costs, you can have two Virtual Machine Scale Sets (VMSS), where one has regular Virtual Machines and the other has Spot Virtual Machines. You can put both behind the same load balancer to opportunistically scale out.
  • Listen for eviction notifications in the Virtual Machine to get notified when your Virtual Machine is about to be evicted.
  • If you are willing to pay up to the pay-as-you-go price, set the eviction type to “Capacity Eviction only” and provide “-1” as the max price in the API; Azure never charges you more than the current Spot Virtual Machine price.
  • To handle evictions, build retry logic to redeploy Virtual Machines. If you do not require a specific Virtual Machine series and size, then try to deploy a different size that matches your workload needs.
  • When deploying a VMSS, select max spread in the portal’s management tab, or set FD==1 in the API, to find capacity in a zone or region.
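The checkpointing advice above can be sketched in a few lines. This is a minimal illustration, not an Azure-specific implementation: the function names, the JSON checkpoint format, and the evict_after parameter (a hook that simulates an eviction) are all assumptions. In a real deployment the checkpoint file would need to live on storage that survives the eviction.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next item to process (0 if no checkpoint exists)."""
    try:
        with open(path) as f:
            return json.load(f)["next_index"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path, next_index):
    """Write to a temporary file first, then rename it over the old checkpoint,
    so an eviction in the middle of a write cannot leave a corrupt file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp_path, path)


def process_items(items, path, handle, evict_after=None):
    """Resume from the last checkpoint and return how many items this run handled.
    evict_after is a hypothetical hook: stop after that many items, as if evicted."""
    start = load_checkpoint(path)
    done = 0
    for i in range(start, len(items)):
        if evict_after is not None and done >= evict_after:
            break  # simulated eviction
        handle(items[i])
        save_checkpoint(path, i + 1)  # record progress after every item
        done += 1
    return done


# First run is "evicted" after 4 items; the second run picks up where it stopped.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
seen = []
first = process_items(list(range(10)), checkpoint_path, seen.append, evict_after=4)
second = process_items(list(range(10)), checkpoint_path, seen.append)
print(first, second, seen)
```

Checkpointing after every item keeps the example simple. For cheap items, saving every N items or every few minutes would reduce the overhead, at the cost of redoing a little more work after an eviction.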

Spot Instances in Google Cloud

Google Cloud’s option is similar to AWS Spot Instances and is called “Preemptible VM Instances”.

  • Preemptible VM Instances are up to 80% cheaper than regular VMs, and the pricing is not variable as it is on AWS; the prices are fixed.
  • Preemptible VM Instances are always terminated after at most 24 hours of running, and you get a 30-second warning before an Instance is terminated.
  • After termination, a preemptible VM Instance still appears in the Google Cloud console, in the terminated state. You can still recover the data from the attached storage, but the attached storage is billable.
  • Shutdown scripts can be used to perform cleanup tasks, export logs, and gracefully terminate a running process.
  • You can use Preemptible VMs with managed Instance groups. Managed Instance groups are similar to Auto Scaling groups in AWS.
  • If a Preemptible Instance in a managed Instance group gets terminated, Compute Engine tries to launch a replacement Instance.
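The graceful-termination point above can be sketched from the application's side. This is an illustrative sketch under one stated assumption: that the operating system's shutdown sequence delivers SIGTERM to your process during the 30-second notice period. The GracefulWorker class and its methods are hypothetical names, and the cleanup body is a placeholder for whatever your workload needs (flushing logs, uploading partial results).

```python
import signal


class GracefulWorker:
    """Runs tasks one by one and stops cleanly once a stop has been requested."""

    def __init__(self):
        self.stop_requested = False
        self.completed = []
        self.cleaned_up = False

    def handle_sigterm(self, signum, frame):
        # Only set a flag here; the main loop does the actual cleanup.
        self.stop_requested = True

    def run(self, tasks):
        for task in tasks:
            if self.stop_requested:
                break  # do not start new work once shutdown has begun
            self.completed.append(task())
        self.cleanup()

    def cleanup(self):
        # Placeholder: export logs or save state here. It must finish
        # well within the roughly 30-second notice period.
        self.cleaned_up = True


worker = GracefulWorker()
# Assumption: the shutdown sequence sends SIGTERM to this process.
signal.signal(signal.SIGTERM, worker.handle_sigterm)


def make_task(n, preempt=False):
    """Build a demo task; preempt=True simulates the signal arriving mid-task."""
    def task():
        if preempt:
            worker.handle_sigterm(signal.SIGTERM, None)
        return n
    return task


worker.run([make_task(1), make_task(2, preempt=True), make_task(3)])
print(worker.completed, worker.cleaned_up)
```

The task that is already running when the signal arrives is allowed to finish, so individual tasks should be short enough to complete, together with the cleanup, inside the notice period.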