Optimize compute, storage and data
Choose copilot or autopilot execution
Continuously improve with reinforcement learning
In this article, we’ll dive into four engineering optimizations for Amazon ECS: rightsizing services and tasks, optimizing the underlying container instances, autoscaling to match traffic, and shutting down environments when they are not needed.
When rightsizing ECS services, we want to identify the best configuration for CPU, memory, and the number of tasks. To get good results, we have to consider factors including the application’s releases and the traffic it encounters.
In the example below, we reduced the CPU, increased the memory, and decreased the task count. With this right-sizing alone, we achieved cost savings of 43% and brought latency down by 25%:
We need to analyze metrics including CPU utilization, memory usage, and traffic over time to identify the ideal configuration. We want to understand each application’s behavior and its traffic patterns. Moreover, we need to fine-tune this configuration over time, validating and updating it if needed after every release.
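As a rough illustration of that analysis, the sketch below pulls hourly average and maximum CPU and memory utilization for a service from CloudWatch using boto3. The cluster and service names (prod-cluster, checkout-service) are placeholders, and the lookback window is an assumption you would tune.

```python
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

def service_utilization(cluster, service, metric, days=14):
    """Fetch hourly average and maximum utilization for an ECS service."""
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/ECS",
        MetricName=metric,  # "CPUUtilization" or "MemoryUtilization"
        Dimensions=[
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        StartTime=datetime.utcnow() - timedelta(days=days),
        EndTime=datetime.utcnow(),
        Period=3600,
        Statistics=["Average", "Maximum"],
    )
    return sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])

for metric in ("CPUUtilization", "MemoryUtilization"):
    points = service_utilization("prod-cluster", "checkout-service", metric)
    if points:
        avg = sum(p["Average"] for p in points) / len(points)
        peak = max(p["Maximum"] for p in points)
        print(f"{metric}: average {avg:.1f}%, peak {peak:.1f}%")
```

Comparing the average and peak against the task's reserved CPU and memory gives a first read on whether the service is over- or under-provisioned.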
After optimizing your services and tasks, the next step is to optimize the underlying EC2-backed container instances in the cluster, if you are using the EC2 launch type.
In the example below, we have a memory-intensive application running on 10 tasks under typical traffic conditions. With a higher task memory reservation, we wouldn't have to run that many tasks. The choice of container instance can also be optimized once the task definition is updated: moving from four c5a.large instances to two r4.xlarge instances can boost performance (a 33% reduction in latency) while reducing cost by 27%.
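A minimal sketch of what that change could look like with boto3, assuming a hypothetical memory-intensive-app task family: register a task definition revision with a larger memoryReservation, then point the service at it with a lower desired count. The specific values are illustrative, not a recommendation.

```python
import boto3

ecs = boto3.client("ecs")

# Register a new revision with a larger memory reservation so each task
# can absorb more of the workload (values are illustrative).
task_def = ecs.register_task_definition(
    family="memory-intensive-app",
    requiresCompatibilities=["EC2"],
    containerDefinitions=[
        {
            "name": "app",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/app:latest",
            "cpu": 512,
            "memoryReservation": 3072,  # soft limit, raised from e.g. 1024 MiB
            "memory": 4096,             # hard limit
            "essential": True,
        }
    ],
)

# Point the service at the new revision and run fewer tasks.
ecs.update_service(
    cluster="prod-cluster",
    service="memory-intensive-app",
    taskDefinition=task_def["taskDefinition"]["taskDefinitionArn"],
    desiredCount=6,  # down from 10 in this illustration
)
```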
Adopting AWS Graviton instances can offer significant value, primarily through cost efficiency and enhanced performance for certain types of workloads. AWS Graviton processors are custom-built by AWS using 64-bit Arm Neoverse cores and provide a better price-performance ratio for many cloud workloads than traditional x86 processors. You can expect up to 40% better price performance over comparable current-generation x86-based instances. This makes them especially appealing for running containerized applications on Amazon ECS, where cost efficiency and scalable performance are crucial. These processors are well suited to workloads that benefit from high throughput and low latency, such as web servers, containerized microservices, and data processing tasks.
To check whether your workloads can run efficiently on Graviton instances, first ensure that the application and its dependencies are compatible with the Arm architecture. This means confirming that the application binaries and all related libraries are either precompiled for Arm or can be compiled from source for the architecture. AWS provides an ECS-optimized Amazon Linux AMI for Arm, which can be used to launch ECS container instances on Graviton processors. Start by setting up a test environment in ECS using Graviton instances to benchmark performance and identify any potential issues. This can be done by modifying the ECS task definitions to specify the ARM64 architecture and deploying these tasks into a cluster backed by Graviton instances. Use monitoring and observability tools to evaluate the application’s behavior and performance, ensuring that everything works as expected before moving production workloads to Graviton instances.
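For example, the task definition's runtimePlatform can declare the ARM64 architecture. The sketch below assumes a hypothetical web-api image that has already been built for linux/arm64; everything else about the workload is unchanged.

```python
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="web-api-arm64",
    requiresCompatibilities=["EC2"],  # ARM64 is also supported on Fargate
    runtimePlatform={
        "cpuArchitecture": "ARM64",          # target Graviton / Arm instances
        "operatingSystemFamily": "LINUX",
    },
    containerDefinitions=[
        {
            "name": "web-api",
            # The image must be built for linux/arm64 (or be a multi-arch image).
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web-api:arm64",
            "cpu": 256,
            "memory": 512,
            "essential": True,
        }
    ],
)
```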
For workloads with high sustained demand, using the larger sizes within a family can sometimes provide additional savings. In the example below, the cost per vCPU is 40% lower if c3.8xlarge is used instead of c3.large.
Task placement is the process of choosing which container instance (virtual machine) in your cluster a task runs on. ECS provides several task placement strategies to help you customize the deployment of your applications. These strategies determine how tasks are placed on the available container instances, allowing you to balance factors like cost, performance, and availability based on your application's needs. The table below summarizes the three main task placement strategies in ECS (binpack, spread, and random), their key goals (e.g., binpack supports cost optimization), and how they work:
By understanding these task placement strategies and their trade-offs, you can configure the appropriate strategy or combination of strategies to meet the requirements of your containerized applications running on Amazon ECS.
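As an illustration, the hedged sketch below creates a service that spreads tasks across availability zones first and then binpacks on memory within each zone; the cluster, service, and task definition names are placeholders.

```python
import boto3

ecs = boto3.client("ecs")

ecs.create_service(
    cluster="prod-cluster",
    serviceName="batch-worker",
    taskDefinition="batch-worker:12",
    desiredCount=6,
    launchType="EC2",
    # binpack packs tasks onto the fewest instances based on the chosen
    # resource, which supports cost optimization; spreading across AZs first
    # keeps the packed tasks from landing in a single zone.
    placementStrategy=[
        {"type": "spread", "field": "attribute:ecs.availability-zone"},
        {"type": "binpack", "field": "memory"},
    ],
)
```

Combining strategies this way trades a little packing efficiency for resilience, which is often the right balance for production services.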
Significant variation in EC2 and Fargate pricing exists across regions. Direct costs such as compute can vary significantly across regions due to local market conditions, the cost of electricity, and even regional promotions by AWS. For example, newer regions might offer lower prices to attract customers, while regions with dense infrastructure and high demand might see higher prices. Below is an example of spot pricing variation by instance type and region reported by an AWS user, showing that the lowest-cost regions were 16-50% below the most costly regions:
For workloads that are less sensitive to latency, such as batch processing tasks, background jobs, or applications where user interaction is minimal, choosing a region based primarily on cost can be beneficial. For businesses that require high-speed data access or cater to users in specific geographical locations, selecting the nearest region to reduce latency is advisable, even if that means slightly higher direct compute costs.
When choosing a region for ECS deployments, it's essential to balance cost optimization with potential impacts on performance and user experience. Beyond direct compute costs, full cost considerations should include network costs, especially when data needs to be transferred between different regions or from an external network. Choosing a region that is geographically distant from the end-users can adversely affect customer experience. Furthermore, data access speed can be impacted if your ECS tasks need frequent access to data stored in another region, adding latency and additional data transfer costs.
For example, in the case of Amazon EC2, on-demand pricing typically remains the same across AZs within a region, but spot instance pricing can vary significantly. These variations create the potential for cost savings by shifting workloads to a lower-priced zone. Below is an example of spot pricing in two US West 1 AZs; significant savings (65%) are only available in US West 1a:
This can lead to cost reductions, especially for compute-heavy applications that consume a lot of EC2 capacity. However, if the application has a strong dependency on data stored in other AWS services (e.g., S3, DynamoDB) or requires high-bandwidth network connectivity, the potential cost savings from availability zone optimization may be offset by increased data transfer or network charges. In these situations, the focus should be on optimizing the data and network architecture to minimize the impact on overall costs.
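If you want to inspect this yourself, a small sketch like the one below lists the most recent Linux spot price per availability zone for an instance type across a couple of regions; the regions and instance type here are just examples.

```python
import boto3
from datetime import datetime, timedelta

def latest_spot_prices(region, instance_type):
    """Return the most recent Linux spot price per availability zone."""
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_spot_price_history(
        InstanceTypes=[instance_type],
        ProductDescriptions=["Linux/UNIX"],
        StartTime=datetime.utcnow() - timedelta(hours=1),
    )
    prices = {}
    for entry in resp["SpotPriceHistory"]:
        # Entries are returned newest first, so keep the first one per AZ.
        prices.setdefault(entry["AvailabilityZone"], float(entry["SpotPrice"]))
    return prices

for region in ("us-west-1", "us-east-1"):
    for az, price in sorted(latest_spot_prices(region, "c5a.xlarge").items()):
        print(f"{az}: ${price:.4f}/hr")
```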
Spreading your ECS service tasks across multiple availability zones can also improve performance and availability. For example, by leveraging three different spot instance pools, you can decrease the chances of a large portion of your spot instances being interrupted if demand increases in one zone. This diversification helps maintain capacity and ensures your application can continue running even if one availability zone experiences issues.
The ability of Spot Fleets to mix instance types and sizes across availability zones can also be leveraged to optimize cost and maintain availability. This flexibility allows the Spot Fleet to adapt to changing demand and pricing conditions.
With right sizing, the new resource configuration we decide on is based on a single traffic level. But traffic may be variable, and we may need auto scaling to adapt capacity to traffic changes. Let’s review a few common seasonality patterns and then discuss how to use autoscaling to handle predictable and unpredictable variations in traffic.
Traffic may vary by time of day and day of the week. For example, US food delivery companies’ website traffic varies by 8-10x from overnight lows to peaks at lunch and dinner, as shown below:
Source: Cloudflare
Traffic may also vary throughout the year. For example, US B2C ecommerce companies often see peaks immediately following Thanksgiving, as Black Friday and Cyber Monday are two of the biggest shopping days of the year. Below is an example of hourly web traffic to US ecommerce companies during the Thanksgiving period in 2023:
Source: Cloudflare
In the past, as September came around, ecommerce companies would already begin deploying compute capacity to be ready for Thanksgiving. But provisioning for maximum capacity months in advance is a cost burden and not the ideal solution. Ideally, we want to keep our scale closer to the needs of current traffic while being ready for our seasonality patterns.
Tax firms also have traffic spikes at certain times of year. Intuit notes that its TurboTax product has two peaks. The first is late January to early February, when US employees receive their W-2s. The second is in the week leading up to the tax filing deadline of April 15. These patterns, together with testing spikes, are shown below:
Source: Intuit
Autoscaling helps your application respond to peaks and valleys of traffic. ECS autoscaling can be done at two levels: service auto scaling, which adjusts the number of tasks a service runs, and cluster auto scaling, which adjusts the underlying container instance capacity (for the EC2 launch type, typically via an Auto Scaling group capacity provider).
By adding autoscalers to your service, you can increase its availability and enable it to keep handling requests even during peak traffic.
The benefits also show up in cost and performance: costs drop because you are not overprovisioned outside traffic peaks, and performance improves during peaks because capacity scales up automatically.
When configuring autoscalers, we need to choose the metric to scale against and the threshold associated with it. Determine this by looking at the application, whether it is CPU bound or memory bound, and then decide on the metric. Usable metrics include CPU, memory, request count, or custom application metrics such as queue size.
In the example below, we are using c5a.xlarge instances. Our cluster has eight instances and is provisioned for peak traffic. After optimization, or after adding an autoscaler, the number of instances needed for typical traffic goes down. Once our cluster has an Amazon ECS Auto Scaling group capacity provider with managed scaling turned on, instance scaling happens automatically. We then set the desired task count for our service to four and use service auto scaling to set the minimum and maximum task counts to two and eight respectively, so the service can scale as required based on the attached scaling policies.
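A sketch of that service auto scaling setup with boto3, assuming a hypothetical checkout-service in prod-cluster: register the desired count as a scalable target with min 2 and max 8, then attach a target tracking policy on average CPU. The 60% target and cooldown values are assumptions to adjust for your workload.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

resource_id = "service/prod-cluster/checkout-service"  # hypothetical names

# Register the service's desired count as a scalable target (min 2, max 8).
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=8,
)

# Target tracking: keep average CPU around 60%, with scale-out/scale-in cooldowns.
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300,
    },
)
```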
Amazon EC2 Auto Scaling provides several scaling strategies to help you automatically adjust the capacity of your application based on demand. These strategies include dynamic scaling, predictive scaling, and scheduled scaling. The table below summarizes the key scaling strategies available in EC2 Auto Scaling.
| Strategy | Primary Objective | Description |
| --- | --- | --- |
| Dynamic Scaling | Responsiveness | Automatically scales your resources in response to real-time changes in traffic or other metrics. This includes target tracking scaling policies that maintain a target utilization level, and step scaling policies that scale based on thresholds. |
| Predictive Scaling | Cost Optimization | Automatically scales your resources based on predicted traffic patterns, using machine learning to forecast demand. This can help you provision the right amount of capacity in advance and avoid over- or under-provisioning. |
| Scheduled Scaling | Cost Optimization | Allows you to schedule scaling actions based on predictable changes in demand, such as increasing capacity before a known traffic spike. |
By understanding these scaling strategies and their trade-offs, you can configure the appropriate strategy or combination of strategies to meet the scaling requirements of your applications running on Amazon EC2.
Warm-Up Times: Essential for services to stabilize before being considered fully operational. Set an adequate warm-up time to prevent scaling "thrashing." For example, if your application experiences a CPU spike at startup and takes 15 seconds to stabilize, ensure the warm-up period is at least this long to accommodate the initialization process.
Cool-Down Times: Critical to prevent further scaling actions until the effects of previous scaling are realized. This helps to stabilize the system and avoids unnecessary scaling operations.
1. Target Tracking Scaling: maintains a chosen metric, such as average CPU utilization, at a target value.
2. Step Scaling: adds or removes capacity in steps based on how far a CloudWatch alarm breaches its threshold (a sketch of a step scaling policy with a cooldown follows this list).
3. Scheduled Scaling: changes capacity at defined times to match predictable changes in demand.
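To make the step scaling and cool-down ideas concrete, here is a hedged sketch for an ECS service: a step scaling policy that adds tasks as CPU climbs, wired to a CloudWatch alarm. The service names, thresholds, step sizes, and 120-second cooldown are all illustrative assumptions.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")
cloudwatch = boto3.client("cloudwatch")

resource_id = "service/prod-cluster/checkout-service"  # hypothetical names

# Step scaling: add tasks in increments as CPU climbs past the alarm threshold.
policy = autoscaling.put_scaling_policy(
    PolicyName="cpu-step-scale-out",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="StepScaling",
    StepScalingPolicyConfiguration={
        "AdjustmentType": "ChangeInCapacity",
        "Cooldown": 120,  # wait before further scaling actions take effect
        "StepAdjustments": [
            # Intervals are relative to the alarm threshold (70% below).
            {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 20,
             "ScalingAdjustment": 1},   # CPU 70-90%: add one task
            {"MetricIntervalLowerBound": 20,
             "ScalingAdjustment": 3},   # CPU above 90%: add three tasks
        ],
    },
)

# The alarm at 70% average CPU triggers the step policy above.
cloudwatch.put_metric_alarm(
    AlarmName="checkout-service-cpu-high",
    Namespace="AWS/ECS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "ClusterName", "Value": "prod-cluster"},
                {"Name": "ServiceName", "Value": "checkout-service"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=2,
    Threshold=70.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```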
By tailoring warm-up and cool-down periods and choosing the appropriate scaling type, you can optimize resource usage, maintain application performance, and manage costs effectively in AWS environments.
A major benefit of using public cloud services is the ability to access resources on demand and not pay for them when you are not using them.
You may have development environments that you don't use over the weekends or during nighttime. Consider shutting them down during such periods. AWS provides solutions such as AWS Instance Scheduler, which automates the starting and stopping of EC2 instances. Sedai Schedules also allow you to set schedules for ECS for both the EC2 and Fargate launch types.
As an example, just shutting down a dev environment that you don't use on the weekend, i.e., operating 24/5 (five days a week) instead of 24/7, yields a cost saving of around 29%.
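One way to implement such a schedule for an ECS service is with Application Auto Scaling scheduled actions, as in the hedged sketch below. It assumes the dev service (dev-api in dev-cluster, hypothetical names) is already registered as a scalable target, and the cron times are examples.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

resource_id = "service/dev-cluster/dev-api"  # hypothetical dev service

# Scale the dev service down to zero tasks on Friday evening...
autoscaling.put_scheduled_action(
    ServiceNamespace="ecs",
    ScheduledActionName="dev-weekend-shutdown",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    Schedule="cron(0 20 ? * FRI *)",  # 20:00 UTC every Friday
    ScalableTargetAction={"MinCapacity": 0, "MaxCapacity": 0},
)

# ...and bring it back Monday morning.
autoscaling.put_scheduled_action(
    ServiceNamespace="ecs",
    ScheduledActionName="dev-weekday-startup",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    Schedule="cron(0 6 ? * MON *)",   # 06:00 UTC every Monday
    ScalableTargetAction={"MinCapacity": 1, "MaxCapacity": 4},
)
```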