Optimize compute, storage and data
Choose copilot or autopilot execution
Continuously improve with reinforcement learning
In this article, we’ll dive into four engineering optimizations for Amazon ECS: rightsizing services and tasks, optimizing the underlying container instances, autoscaling to match traffic, and shutting down environments when they are not needed.
When rightsizing ECS services, we want to identify the best configuration for CPU, memory, and the number of tasks. To get good results, we have to consider factors including the application’s releases and the traffic it encounters.
In the example below, we reduced the CPU, increased the memory, and decreased the task count. With this right-sizing alone, we achieved cost savings of 43% and brought latency down by 25%:
We need to analyze metrics including CPU utilization, memory usage, and traffic over time to identify the ideal configuration. We want to understand each application’s behavior and its traffic patterns. Moreover, we need to fine-tune this configuration over time, validating and updating it if needed after every release.
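As a rough illustration of that analysis, the sketch below pulls hourly average and maximum CPU and memory utilization for a service from CloudWatch using boto3. The cluster and service names (prod-cluster, checkout-service) are placeholders, and the lookback window is an assumption you would tune.

```python
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

def service_utilization(cluster, service, metric, days=14):
    """Fetch hourly average and maximum utilization for an ECS service."""
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/ECS",
        MetricName=metric,  # "CPUUtilization" or "MemoryUtilization"
        Dimensions=[
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        StartTime=datetime.utcnow() - timedelta(days=days),
        EndTime=datetime.utcnow(),
        Period=3600,
        Statistics=["Average", "Maximum"],
    )
    return sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])

for metric in ("CPUUtilization", "MemoryUtilization"):
    points = service_utilization("prod-cluster", "checkout-service", metric)
    if points:
        avg = sum(p["Average"] for p in points) / len(points)
        peak = max(p["Maximum"] for p in points)
        print(f"{metric}: average {avg:.1f}%, peak {peak:.1f}%")
```

Comparing the average and peak against the task's reserved CPU and memory gives a first read on whether the service is over- or under-provisioned.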
After optimizing your services and tasks, the next step is to optimize the underlying EC2-backed container instances in the cluster, if you are using the EC2 launch type.
In the example below, we have a memory-intensive application running on 10 tasks under typical traffic conditions. With a higher task memory reservation, we wouldn't have to run that many tasks. The choice of container instance can also be optimized once the task definition is updated: moving from four c5a.large instances to two r4.xlarge instances can boost performance (a 33% reduction in latency) while reducing cost by 27%.
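A minimal sketch of what that change could look like with boto3, assuming a hypothetical memory-intensive-app task family: register a task definition revision with a larger memoryReservation, then point the service at it with a lower desired count. The specific values are illustrative, not a recommendation.

```python
import boto3

ecs = boto3.client("ecs")

# Register a new revision with a larger memory reservation so each task
# can absorb more of the workload (values are illustrative).
task_def = ecs.register_task_definition(
    family="memory-intensive-app",
    requiresCompatibilities=["EC2"],
    containerDefinitions=[
        {
            "name": "app",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/app:latest",
            "cpu": 512,
            "memoryReservation": 3072,  # soft limit, raised from e.g. 1024 MiB
            "memory": 4096,             # hard limit
            "essential": True,
        }
    ],
)

# Point the service at the new revision and run fewer tasks.
ecs.update_service(
    cluster="prod-cluster",
    service="memory-intensive-app",
    taskDefinition=task_def["taskDefinition"]["taskDefinitionArn"],
    desiredCount=6,  # down from 10 in this illustration
)
```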
Adopting AWS Graviton instances can offer significant value, primarily through cost efficiency and enhanced performance for certain types of workloads. AWS Graviton processors are custom-built by AWS using 64-bit Arm Neoverse cores and provide a better price-performance ratio for many cloud workloads than traditional x86 processors. You can expect up to 40% better price performance over comparable current-generation x86-based instances. This makes them especially appealing for running containerized applications on Amazon ECS, where cost efficiency and scalable performance are crucial. These processors are well suited to workloads that benefit from high throughput and low latency, such as web servers, containerized microservices, and data processing tasks.
To check whether your workloads can run efficiently on Graviton instances, first ensure that the application and its dependencies are compatible with the Arm architecture. This means confirming that the application binaries and all related libraries are either precompiled for Arm or can be compiled from source for the architecture. AWS provides an ECS-optimized Amazon Linux AMI for Arm, which can be used to launch ECS container instances on Graviton processors. Start by setting up a test environment in ECS using Graviton instances to benchmark performance and identify any potential issues. This can be done by modifying the ECS task definitions to specify the ARM64 architecture and deploying these tasks into a cluster backed by Graviton instances. Use monitoring and observability tools to evaluate the application’s behavior and performance, ensuring that everything works as expected before moving production workloads to Graviton instances.
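For example, the task definition's runtimePlatform can declare the ARM64 architecture. The sketch below assumes a hypothetical web-api image that has already been built for linux/arm64; everything else about the workload is unchanged.

```python
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="web-api-arm64",
    requiresCompatibilities=["EC2"],  # ARM64 is also supported on Fargate
    runtimePlatform={
        "cpuArchitecture": "ARM64",          # target Graviton / Arm instances
        "operatingSystemFamily": "LINUX",
    },
    containerDefinitions=[
        {
            "name": "web-api",
            # The image must be built for linux/arm64 (or be a multi-arch image).
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web-api:arm64",
            "cpu": 256,
            "memory": 512,
            "essential": True,
        }
    ],
)
```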
For workloads with high sustained demand, using the larger sizes within a family can sometimes provide additional savings. In the example below, the cost per vCPU is 40% lower if c3.8xlarge is used instead of c3.large.
Task placement is the process of choosing which container instance (virtual machine) in your cluster a task runs on. ECS provides several task placement strategies to help you customize the deployment of your applications. These strategies determine how tasks are placed on the available container instances, allowing you to balance factors like cost, performance, and availability based on your application's needs. The table below summarizes the three main task placement strategies in ECS (binpack, spread, and random), their key goals (e.g., binpack supports cost optimization), and how they work:
By understanding these task placement strategies and their trade-offs, you can configure the appropriate strategy or combination of strategies to meet the requirements of your containerized applications running on Amazon ECS.
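As an illustration, the hedged sketch below creates a service that spreads tasks across availability zones first and then binpacks on memory within each zone; the cluster, service, and task definition names are placeholders.

```python
import boto3

ecs = boto3.client("ecs")

ecs.create_service(
    cluster="prod-cluster",
    serviceName="batch-worker",
    taskDefinition="batch-worker:12",
    desiredCount=6,
    launchType="EC2",
    # binpack packs tasks onto the fewest instances based on the chosen
    # resource, which supports cost optimization; spreading across AZs first
    # keeps the packed tasks from landing in a single zone.
    placementStrategy=[
        {"type": "spread", "field": "attribute:ecs.availability-zone"},
        {"type": "binpack", "field": "memory"},
    ],
)
```

Combining strategies this way trades a little packing efficiency for resilience, which is often the right balance for production services.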
Significant variation in EC2 and Fargate pricing exists across regions. Direct costs such as compute can vary significantly across regions due to local market conditions, the cost of electricity, and even regional promotions by AWS. For example, newer regions might offer lower prices to attract customers, while regions with dense infrastructure and high demand might see higher prices. Below is an example of spot pricing variation by instance type and region reported by an AWS user, showing that the lowest-cost regions were 16-50% below the most costly regions:
For workloads that are less sensitive to latency, such as batch processing tasks, background jobs, or applications where user interaction is minimal, choosing a region based primarily on cost can be beneficial. For businesses that require high-speed data access or cater to users in specific geographical locations, selecting the nearest region to reduce latency is advisable, even if that means slightly higher direct compute costs.
When choosing a region for ECS deployments, it's essential to balance cost optimization with potential impacts on performance and user experience. Beyond direct compute costs, full cost considerations should include network costs, especially when data needs to be transferred between different regions or from an external network. Choosing a region that is geographically distant from the end-users can adversely affect customer experience. Furthermore, data access speed can be impacted if your ECS tasks need frequent access to data stored in another region, adding latency and additional data transfer costs.
For example, in the case of Amazon EC2, on-demand pricing typically remains the same across AZs within a region, but spot instance pricing can vary significantly. These variations create the potential for cost savings by shifting workloads to a lower-priced zone. Below is an example of spot pricing in two US West 1 AZs; significant savings (65%) are only available in US West 1a:
This can lead to cost reductions, especially for compute-heavy applications that consume a lot of EC2 capacity. However, if the application has a strong dependency on data stored in other AWS services (e.g., S3, DynamoDB) or requires high-bandwidth network connectivity, the potential cost savings from availability zone optimization may be offset by increased data transfer or network charges. In these situations, the focus should be on optimizing the data and network architecture to minimize the impact on overall costs.
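If you want to inspect this yourself, a small sketch like the one below lists the most recent Linux spot price per availability zone for an instance type across a couple of regions; the regions and instance type here are just examples.

```python
import boto3
from datetime import datetime, timedelta

def latest_spot_prices(region, instance_type):
    """Return the most recent Linux spot price per availability zone."""
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_spot_price_history(
        InstanceTypes=[instance_type],
        ProductDescriptions=["Linux/UNIX"],
        StartTime=datetime.utcnow() - timedelta(hours=1),
    )
    prices = {}
    for entry in resp["SpotPriceHistory"]:
        # Entries are returned newest first, so keep the first one per AZ.
        prices.setdefault(entry["AvailabilityZone"], float(entry["SpotPrice"]))
    return prices

for region in ("us-west-1", "us-east-1"):
    for az, price in sorted(latest_spot_prices(region, "c5a.xlarge").items()):
        print(f"{az}: ${price:.4f}/hr")
```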
Spreading your ECS service tasks across multiple availability zones can also improve performance and availability. For example, by leveraging three different spot instance pools, you can decrease the chances of a large portion of your spot instances being interrupted if demand increases in one zone. This diversification helps maintain capacity and ensures your application can continue running even if one availability zone experiences issues.
The ability of Spot Fleets to mix instance types and sizes across availability zones can also be leveraged to optimize cost and maintain availability. This flexibility allows the Spot Fleet to adapt to changing demand and pricing conditions.
With right sizing, the new resource configuration we decide on is based on a single traffic level. But traffic may be variable, and we may need auto scaling to adapt capacity to traffic changes. Let’s review a few common seasonality patterns and then discuss how to use autoscaling to handle predictable and unpredictable variations in traffic.
Traffic may vary by time of day and day of the week. For example, US food delivery companies’ website traffic varies by 8-10x from overnight lows to peaks at lunch and dinner, as shown below:
Source: Cloudflare
Traffic may also vary throughout the year. For example, US B2C ecommerce companies often see peaks immediately following Thanksgiving, as Black Friday and Cyber Monday are two of the biggest shopping days of the year. Below is an example of hourly web traffic to US ecommerce companies during the Thanksgiving period in 2023:
Source: Cloudflare
In the past, as September came around, ecommerce companies would already begin deploying compute capacity to be ready for Thanksgiving. But provisioning for maximum capacity months in advance is a cost burden and not the ideal solution. Ideally, we want to keep our scale closer to the needs of current traffic while being ready for our seasonality patterns.
Tax firms also have traffic spikes at certain times of year. Intuit notes that its TurboTax product has two peaks. The first is late January to early February, when US employees receive their W-2s. The second is in the week leading up to the tax filing deadline of April 15. These patterns, together with testing spikes, are shown below:
Source: Intuit
Autoscaling helps your application respond to peaks and valleys of traffic. ECS autoscaling can be done at two levels: service auto scaling, which adjusts the number of tasks a service runs, and cluster auto scaling, which adjusts the underlying container instance capacity (for the EC2 launch type, typically via an Auto Scaling group capacity provider).
By adding autoscalers to your service, you can increase its availability and enable it to keep handling requests even during peak traffic.
The benefits also show up in cost and performance: costs drop because you are not overprovisioned outside traffic peaks, and performance improves during peaks because capacity scales up automatically.
When configuring autoscalers, we need to choose the metric to scale against and the threshold associated with it. Determine this by looking at the application, whether it is CPU bound or memory bound, and then decide on the metric. Usable metrics include CPU, memory, request count, or custom application metrics such as queue size.
In the example below, we are using c5a.xlarge instances. Our cluster has eight instances and is provisioned for peak traffic. After optimization, or after adding an autoscaler, the number of instances needed for typical traffic goes down. Once our cluster has an Amazon ECS Auto Scaling group capacity provider with managed scaling turned on, instance scaling happens automatically. We then set the desired task count for our service to four and use service auto scaling to set the minimum and maximum task counts to two and eight respectively, so the service can scale as required based on the attached scaling policies.
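A sketch of that service auto scaling setup with boto3, assuming a hypothetical checkout-service in prod-cluster: register the desired count as a scalable target with min 2 and max 8, then attach a target tracking policy on average CPU. The 60% target and cooldown values are assumptions to adjust for your workload.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

resource_id = "service/prod-cluster/checkout-service"  # hypothetical names

# Register the service's desired count as a scalable target (min 2, max 8).
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=8,
)

# Target tracking: keep average CPU around 60%, with scale-out/scale-in cooldowns.
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300,
    },
)
```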
Amazon EC2 Auto Scaling provides several scaling strategies to help you automatically adjust the capacity of your application based on demand. These strategies include dynamic scaling, predictive scaling, and scheduled scaling. The table below summarizes the key scaling strategies available in EC2 Auto Scaling.
| Strategy | Primary Objective | Description |
| --- | --- | --- |
| Dynamic Scaling | Responsiveness | Automatically scales your resources in response to real-time changes in traffic or other metrics. This includes target tracking scaling policies that maintain a target utilization level, and step scaling policies that scale based on thresholds. |
| Predictive Scaling | Cost Optimization | Automatically scales your resources based on predicted traffic patterns, using machine learning to forecast demand. This can help you provision the right amount of capacity in advance and avoid over- or under-provisioning. |
| Scheduled Scaling | Cost Optimization | Allows you to schedule scaling actions based on predictable changes in demand, such as increasing capacity before a known traffic spike. |
By understanding these scaling strategies and their trade-offs, you can configure the appropriate strategy or combination of strategies to meet the scaling requirements of your applications running on Amazon EC2.
Warm-Up Times: Essential for services to stabilize before being considered fully operational. Set an adequate warm-up time to prevent scaling "thrashing." For example, if your application experiences a CPU spike at startup and takes 15 seconds to stabilize, ensure the warm-up period is at least this long to accommodate the initialization process.
Cool-Down Times: Critical to prevent further scaling actions until the effects of previous scaling are realized. This helps to stabilize the system and avoids unnecessary scaling operations.
1. Target Tracking Scaling: maintains a chosen metric, such as average CPU utilization, at a target value.
2. Step Scaling: adds or removes capacity in steps based on how far a CloudWatch alarm breaches its threshold (a sketch of a step scaling policy with a cooldown follows this list).
3. Scheduled Scaling: changes capacity at defined times to match predictable changes in demand.
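To make the step scaling and cool-down ideas concrete, here is a hedged sketch for an ECS service: a step scaling policy that adds tasks as CPU climbs, wired to a CloudWatch alarm. The service names, thresholds, step sizes, and 120-second cooldown are all illustrative assumptions.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")
cloudwatch = boto3.client("cloudwatch")

resource_id = "service/prod-cluster/checkout-service"  # hypothetical names

# Step scaling: add tasks in increments as CPU climbs past the alarm threshold.
policy = autoscaling.put_scaling_policy(
    PolicyName="cpu-step-scale-out",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="StepScaling",
    StepScalingPolicyConfiguration={
        "AdjustmentType": "ChangeInCapacity",
        "Cooldown": 120,  # wait before further scaling actions take effect
        "StepAdjustments": [
            # Intervals are relative to the alarm threshold (70% below).
            {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 20,
             "ScalingAdjustment": 1},   # CPU 70-90%: add one task
            {"MetricIntervalLowerBound": 20,
             "ScalingAdjustment": 3},   # CPU above 90%: add three tasks
        ],
    },
)

# The alarm at 70% average CPU triggers the step policy above.
cloudwatch.put_metric_alarm(
    AlarmName="checkout-service-cpu-high",
    Namespace="AWS/ECS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "ClusterName", "Value": "prod-cluster"},
                {"Name": "ServiceName", "Value": "checkout-service"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=2,
    Threshold=70.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```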
By tailoring warm-up and cool-down periods and choosing the appropriate scaling type, you can optimize resource usage, maintain application performance, and manage costs effectively in AWS environments.
A major benefit of using public cloud services is the ability to access resources on demand and not pay for them when you are not using them.
You may have development environments that you don't use over the weekends or during nighttime. Consider shutting them down during such periods. AWS provides solutions such as AWS Instance Scheduler, which automates the starting and stopping of EC2 instances. Sedai Schedules also allow you to set schedules for ECS for both the EC2 and Fargate launch types.
As an example, just shutting down a dev environment that you don't use on the weekend, i.e., operating 24/5 (five days a week) instead of 24/7, yields a cost saving of around 29%.
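One way to implement such a schedule for an ECS service is with Application Auto Scaling scheduled actions, as in the hedged sketch below. It assumes the dev service (dev-api in dev-cluster, hypothetical names) is already registered as a scalable target, and the cron times are examples.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

resource_id = "service/dev-cluster/dev-api"  # hypothetical dev service

# Scale the dev service down to zero tasks on Friday evening...
autoscaling.put_scheduled_action(
    ServiceNamespace="ecs",
    ScheduledActionName="dev-weekend-shutdown",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    Schedule="cron(0 20 ? * FRI *)",  # 20:00 UTC every Friday
    ScalableTargetAction={"MinCapacity": 0, "MaxCapacity": 0},
)

# ...and bring it back Monday morning.
autoscaling.put_scheduled_action(
    ServiceNamespace="ecs",
    ScheduledActionName="dev-weekday-startup",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    Schedule="cron(0 6 ? * MON *)",   # 06:00 UTC every Monday
    ScalableTargetAction={"MinCapacity": 1, "MaxCapacity": 4},
)
```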