October 9, 2024
May 10, 2024
Compute in AWS is primarily offered with four purchasing options:
On-demand instances follow the standard pay-as-you-go model, where you are billed by the second. This option requires no long-term commitment, frees you from the complexities of planning and purchasing compute, and is best suited to fluctuating workloads that must stay highly available.
Reserved Instances (RIs) and Savings Plans are pricing models in which you commit to long-term compute usage (usually for 1 or 3 years, with varying degrees of flexibility) in exchange for significant discounts. RIs and Savings Plans are well suited to workloads with steady usage, or to covering the base load of unpredictable workloads.
Spot instances are spare compute capacity in AWS, available at steep discounts of up to 90% compared to on-demand prices. They are best suited to stateless or fault-tolerant workloads, as AWS can reclaim these instances to serve rising demand.
All the purchasing options provided by AWS rely on the same underlying EC2 infrastructure, which behaves the same way regardless of how you pay for it. Spot is no exception.
Spot instances are idle EC2 instances that are not being used to fulfill on-demand requests, and are therefore made available at discounts of up to 90% compared to on-demand prices. Spot prices vary with time and demand.
These price benefits come with a catch: spot instances can be interrupted to fulfill rising on-demand requests. The instance is given a two-minute notice, after which it is terminated. Fault-tolerant or stateless workloads therefore work best with spot instances, as an interruption has little to no impact on them.
AWS spot capacity is divided into spot instance pools. All spot instances of a specific instance type running in a given Availability Zone (AZ) constitute one spot instance pool. For example, all c5.xlarge spot instances running in us-east-1a form one pool, while all c5.2xlarge spot instances in the same AZ form another. Likewise, if you use the same instance type in three different AZs, you are consuming capacity from three different spot pools.
The prices and capacities of these pools fluctuate independently of each other. Tying this back to spot interruptions: the more instance pools you draw from, the more diversified you are, and the less downtime you face when demand increases. The “don't put all your eggs in one basket” principle applies to spot.
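To make the pool arithmetic concrete, here is a minimal sketch that enumerates pools as (instance type, AZ) pairs. The function name `spot_pools` and the extra AZ names are assumptions chosen for illustration; the sketch only counts combinations and does not query AWS.

```python
from itertools import product


def spot_pools(instance_types, availability_zones):
    """Return every (instance type, AZ) pair; each pair is one spot instance pool."""
    return sorted(product(instance_types, availability_zones))


# Two instance types across three AZs draw from six independent pools.
pools = spot_pools(
    ["c5.xlarge", "c5.2xlarge"],
    ["us-east-1a", "us-east-1b", "us-east-1c"],
)
for instance_type, az in pools:
    print(f"{instance_type} in {az}")
print(f"{len(pools)} pools in total")
```

Adding one more instance type or one more AZ to either list grows the number of pools multiplicatively, which is why diversifying along both dimensions is usually worthwhile.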
As of March 2024, with the capabilities now available, data shows that spot interruptions have become fairly infrequent, with only 5% of spot instances interrupted in the preceding three months. The more closely we all stick to best practices and proper diversification, the better spot as a whole will function.
ECS comes with built-in support for spot instances. From compute diversification across instance pools to automatic replacement of interrupted spot instances, ECS takes care of it all.
A task in ECS can be launched in two ways: on EC2 instances that you manage, or on Fargate, where AWS manages the underlying compute.
With both approaches, you can opt to use spot capacity for significant cost reductions, with EC2 spot generally providing higher discounts than Fargate spot. EC2 spot also requires you to choose the backing instance pools, whereas with Fargate, AWS makes that decision.
ECS automates spot lifecycle management by integrating with AWS Auto Scaling Groups (ASGs). When there is an interruption, the ASG will try to provide a replacement instance from another spot instance pool, depending on your configuration. For most fault-tolerant workloads, replacing an interrupted instance is enough to ensure availability.
ECS also supports automated spot instance draining. You enable it by passing a parameter to the ECS container agent via the user data of your container instance.
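As a rough sketch of what that user data might look like, the helper below builds a bash script that sets the agent's `ECS_ENABLE_SPOT_INSTANCE_DRAINING` option in `/etc/ecs/ecs.config`. The function name `ecs_user_data` and the cluster name `my-cluster` are hypothetical, and the sketch assumes an ECS-optimized AMI, where the agent reads that config file at startup.

```python
def ecs_user_data(cluster_name: str) -> str:
    """Build an EC2 user data script that enables spot instance draining.

    Assumes an ECS-optimized AMI, where the container agent reads
    /etc/ecs/ecs.config when it starts.
    """
    lines = [
        "#!/bin/bash",
        # Register the instance with the given cluster.
        f"echo 'ECS_CLUSTER={cluster_name}' >> /etc/ecs/ecs.config",
        # Ask the agent to drain the instance on a spot interruption notice.
        "echo 'ECS_ENABLE_SPOT_INSTANCE_DRAINING=true' >> /etc/ecs/ecs.config",
    ]
    return "\n".join(lines) + "\n"


# "my-cluster" is a placeholder name used only for this example.
print(ecs_user_data("my-cluster"))
```

The resulting string would typically go into the launch template used by the ASG that backs your cluster.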
Once enabled, ECS places an instance in a draining state when it receives the two-minute spot interruption notice. All tasks running on that instance are first sent a SIGTERM signal, followed by a SIGKILL signal 30 seconds later by default. This lets you stop your application gracefully, or even do some last-mile log collection. ECS also deregisters these tasks from the load balancer target group, while trying to reschedule them on the remaining available instances.
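On the application side, using that window means reacting to SIGTERM. The following is a minimal sketch of one way to do it in a Python worker; `GracefulShutdown`, `run_worker`, and the two callbacks are hypothetical names, and a real service would plug in its own work loop and log flushing.

```python
import signal
import time


class GracefulShutdown:
    """Records that SIGTERM was received so the main loop can exit cleanly."""

    def __init__(self):
        self.requested = False
        # Must be called from the main thread of the process.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Keep the handler tiny: only set a flag and let the loop do the rest.
        self.requested = True


def run_worker(process_one_item, flush_logs, shutdown, poll_interval=0.1):
    """Process items until shutdown is requested, then flush logs and return."""
    while not shutdown.requested:
        process_one_item()
        time.sleep(poll_interval)
    # Last-mile cleanup; this has to finish before SIGKILL arrives.
    flush_logs()
```

The handler only flips a flag, so the in-flight item finishes before the loop exits. Whatever cleanup runs in `flush_logs` needs to fit comfortably inside the interval between SIGTERM and SIGKILL.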
A mix of on-demand and spot capacity can bring considerable savings while ensuring availability. For example, suppose you have a fleet running entirely on on-demand capacity, which you want to overprovision by 50%. By using spot capacity for the overprovisioned part, you only pay 5-10% more than your original cost.
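The 5-10% figure follows from simple arithmetic, sketched below. It assumes the extra capacity uses the same instance types as the baseline and that the spot discount falls between 80% and 90%; both are assumptions for illustration, and the function name is hypothetical.

```python
def overprovision_cost_increase(overprovision_ratio, spot_discount):
    """Extra cost, as a fraction of the on-demand baseline, of spot headroom.

    overprovision_ratio: extra capacity relative to the baseline (0.5 = 50%).
    spot_discount: spot discount relative to on-demand (0.9 = 90% off).
    """
    return overprovision_ratio * (1.0 - spot_discount)


# Assumed discount range of 80-90%; actual spot discounts vary by pool and time.
for discount in (0.80, 0.90):
    extra = overprovision_cost_increase(0.50, discount)
    print(f"spot discount {discount:.0%}: about {extra:.0%} extra cost")
```

By comparison, overprovisioning by 50% with on-demand capacity would add the full 50% to the bill.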
With ECS, it is easy to achieve dynamic capacity type splits. You specify the weight of each capacity type and the backing ASG handles the rest. For example, if you give on-demand a weight of two and spot a weight of three, and the ASG provisions 10 instances, then four of those instances will be on-demand and six will be spot.
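The weight arithmetic in this example can be sketched as a proportional split. This is only an illustration of the ratio, not the algorithm ECS or the ASG actually uses; the function name and the largest-remainder rounding for counts that do not divide evenly are assumptions.

```python
def split_by_weight(total, weights):
    """Split `total` units across capacity types in proportion to their weights.

    Uses largest-remainder rounding so the shares always add up to `total`.
    """
    weight_sum = sum(weights.values())
    shares = {name: total * w // weight_sum for name, w in weights.items()}
    remainders = {name: total * w % weight_sum for name, w in weights.items()}
    leftover = total - sum(shares.values())
    # Hand out any leftover units to the types with the largest remainders.
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        shares[name] += 1
    return shares


split = split_by_weight(10, {"on_demand": 2, "spot": 3})
print(split)
```

With weights of two and three, on-demand receives two fifths of the capacity and spot three fifths, which for 10 instances gives the four and six from the example.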
Choosing Savings Plans and Reserved Instances can provide further gains. Savings Plans offer more flexible usage patterns than Reserved Instances. Below is a comparison of effective discounts relative to on-demand prices for a range of purchase options.
Source: Mark Butcher via LinkedIn
Your final implementation should take all these options into consideration while keeping their potential downsides in mind. You can combine a dynamic mix of on-demand and spot capacity, in which spot interruptions have minimal impact, with well-sized multi-year commitments backed by a proper understanding of your workload requirements.
In a very rough scenario with on-demand, spot, Reserved Instances, and Savings Plans in equal shares at maximum discount levels, you could achieve a reduction of around 50% in the overall cost.
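A blended estimate like this is a share-weighted average of the individual discounts. The sketch below shows the calculation with placeholder numbers: the spot figure reuses the "up to 90%" from the text, while the Reserved Instance and Savings Plan rates are hypothetical values picked for illustration, not AWS quotes. Substitute the rates that apply to your own commitments.

```python
def blended_discount(mix):
    """Share-weighted average discount across purchasing options.

    mix maps an option name to (share of compute spend, discount vs on-demand).
    """
    total_share = sum(share for share, _ in mix.values())
    weighted = sum(share * discount for share, discount in mix.values())
    return weighted / total_share


mix = {
    "on_demand": (0.25, 0.00),           # baseline, no discount
    "spot": (0.25, 0.90),                # "up to 90%" per the text
    "reserved_instances": (0.25, 0.55),  # hypothetical rate for illustration
    "savings_plans": (0.25, 0.55),       # hypothetical rate for illustration
}
print(f"blended discount: {blended_discount(mix):.0%}")
```

Because the result is linear in each share, shifting spend from on-demand toward any discounted option raises the blended figure in proportion to that option's discount.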