Summary
- Choosing the right Amazon S3 storage class is crucial for optimizing storage costs based on data access frequency and retrieval needs.
- Key considerations include access frequency, performance (retrieval time), availability, durability, and cost.
- S3 Standard is ideal for frequently accessed data. S3 Standard-IA and One Zone-IA are cost-effective for infrequent access.
- Glacier, Glacier Flexible Retrieval, and Glacier Deep Archive offer substantial cost savings for long-term archival storage.
- S3 Intelligent-Tiering suits unpredictable access patterns: it automates cost savings by moving data between access tiers.
Introduction
Amazon S3 (Simple Storage Service) is a highly scalable and durable object storage service that provides a range of storage classes designed to meet diverse data storage needs, performance requirements, and budget constraints. Choosing the right storage class can significantly impact the cost-effectiveness and efficiency of your data management strategy. In this blog we’ll look at key considerations for choosing storage classes.
Key Considerations for S3 Storage Classes
The key considerations when choosing an S3 storage class are access frequency, performance (retrieval time), availability, durability, and cost. Feel free to read on, or watch a summary of these considerations in the video below, which comes from a talk I gave at our annual autocon conference:
Access Frequency
- Definition: How often you need to access your data.
- Measurement: Frequency of data retrieval (e.g., hourly, daily, monthly).
- Storage Class Recommendation: For frequent access, S3 Standard is ideal. For infrequent access, consider Standard-IA or One Zone-IA.
Performance
- Definition: The time required to retrieve data when requested.
- Measurement: Measured in milliseconds, minutes, or hours.
- Storage Class Recommendation: If immediate access is needed, avoid Glacier classes. Glacier and Glacier Deep Archive are suitable for archival data with flexible retrieval times.
Availability
- Definition: The percentage of time that your data is accessible when needed.
- Measurement: Expressed as a percentage (e.g., 99.99% availability).
- Storage Class Recommendation: S3 Standard offers the highest availability, while One Zone-IA provides lower availability due to single-zone storage. Choose based on how critical it is for your data to be accessible at all times.
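To make an availability percentage more tangible, it helps to convert it into allowed downtime. The snippet below is a minimal sketch of that arithmetic. The function name is hypothetical, and every input percentage other than 99.99% is an illustrative value rather than a figure from this article, so check the AWS documentation for the number that applies to each storage class.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes_per_year(availability_percent: float) -> float:
    """Convert an availability percentage into minutes of downtime per year."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


# Illustrative inputs only; they are not tied to specific storage classes here.
for pct in (99.99, 99.9, 99.5):
    minutes = max_downtime_minutes_per_year(pct)
    print(f"{pct}% -> about {minutes:.1f} minutes per year")
```

At 99.99%, this works out to roughly 52.6 minutes of unavailability per year, which is a useful yardstick when deciding how critical always-on access really is.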
Durability
- Definition: The likelihood that your data will be preserved without loss.
- Measurement: Expressed as the probability that an object survives a given period, typically a year (e.g., 99.999999999% durability, often called 11 nines).
- Storage Class Recommendation: All S3 storage classes offer high durability, but classes like Glacier and Glacier Deep Archive are particularly suitable for data that must be retained over long periods.
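The durability figure is easier to reason about as an expected loss rate. The sketch below assumes the percentage is read as an independent, per-object probability of surviving one year; that reading and the function name are my own simplifications for illustration.

```python
def expected_objects_lost_per_year(object_count: int, nines: int = 11) -> float:
    """Expected number of objects lost in a year at the given number of nines.

    Assumes each object is lost independently with probability 10**-nines
    per year, which is a simplification for illustration.
    """
    annual_loss_probability = 10.0 ** -nines
    return object_count * annual_loss_probability


objects = 10_000_000
lost = expected_objects_lost_per_year(objects)
print(f"Expected loss: {lost:.4f} objects per year")
print(f"That is about one object every {1 / lost:,.0f} years")
```

Under these assumptions, storing ten million objects at 11 nines of durability means expecting to lose a single object about once every 10,000 years.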
Cost Management
- Definition: Managing expenses associated with storing and accessing data.
- Measurement: Costs per GB of storage and per retrieval, influenced by access patterns.
- Key components of Amazon S3 pricing (refer to the AWS S3 Pricing Page for details):
  - Storage Pricing: Depends on the size of the objects, duration stored, and the storage class.
  - Request and Data Retrieval Pricing: Charges for requests made against S3 buckets and objects (e.g., PUT, GET, LIST).
  - Data Transfer Pricing: Costs for data transferred into and out of Amazon S3.
  - Data Management and Insights: Charges for features like S3 Inventory and S3 Analytics.
  - Replication Pricing: Costs for replicating data across different AWS regions.
  - Transform and Query Pricing: Fees for features like S3 Select and S3 Object Lambda.
- Storage Class Recommendation: Use lifecycle policies to transition data to lower-cost storage classes like Glacier for long-term storage needs. Intelligent-Tiering can be beneficial for unpredictable access patterns, offering automated cost optimization.
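A lifecycle policy like the one recommended above can be sketched as a plain Python dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` call accepts. The helper names, the `logs/` prefix, and the 30 and 90 day thresholds are all assumptions chosen for illustration. Note that `GLACIER` is the API value for S3 Glacier Flexible Retrieval, and that AWS enforces minimum object ages for some transitions, so check the lifecycle documentation before using real values.

```python
import json


def build_lifecycle_configuration(prefix: str, ia_after_days: int = 30,
                                  glacier_after_days: int = 90) -> dict:
    """Build a lifecycle configuration with two storage class transitions."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("glacier_after_days must be greater than ia_after_days")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_lifecycle_configuration(bucket: str, configuration: dict) -> None:
    """Apply the configuration to a bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=configuration
    )


# Hypothetical prefix; printing the rule lets you review it before applying.
config = build_lifecycle_configuration("logs/")
print(json.dumps(config, indent=2))
```

Building the rule separately from applying it keeps the policy easy to review and test without touching a real bucket.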
Understanding Amazon S3 Storage Classes
S3 Standard
- Use Case: Frequently accessed data.
- Benefits: Low latency and high throughput performance.
- Ideal For: Big data analytics, mobile and gaming applications, and content distribution.
- AWS Case Study: Nasdaq. Nasdaq uses Amazon S3 to support its data lake, enabling it to scale storage independently from compute, handling up to 70 billion records a day. “We were able to easily support the jump from 30 billion records to 70 billion records a day because of the flexibility and scalability of Amazon S3 and Amazon Redshift,” said Robert Hunt, Vice President of Software Engineering.
S3 Intelligent-Tiering
- Use Case: Data with unpredictable access patterns.
- Benefits: Automatically moves data between access tiers as access patterns change, so you avoid overpaying for infrequently accessed data.
- Ideal For: Dynamic datasets where access frequency is unknown.
- Case Study: BBC. The BBC Archives migrated their extensive media content to S3 Intelligent-Tiering, reducing costs and improving accessibility. “By using Amazon S3 Glacier Instant Retrieval and Amazon S3 Intelligent-Tiering, we get archive-like pricing models for content that we previously had in relatively hot storage,” says Tom Cartwright, Executive Product Manager.
- Expert Quote: “We’re saving millions of dollars per year by using S3 Intelligent-Tiering,” said Eric Legault, Principal Engineer at Salesforce. Salesforce used S3 Intelligent-Tiering to optimize costs for its data lake, reducing expenses by automatically moving data to the most cost-effective access tier as access patterns change.
S3 Standard-IA (Infrequent Access)
- Use Case: Data accessed less frequently but requiring rapid access when needed.
- Benefits: Lower cost compared to S3 Standard, suitable for long-term storage, backups, and disaster recovery.
S3 One Zone-IA
- Use Case: Secondary backups or easily re-creatable data.
- Benefits: Cost-effective, but data is stored in a single Availability Zone, making it less resilient to the failure of that zone.
S3 Glacier
- Use Case: Long-term archive data.
- Benefits: Extremely low storage cost with retrieval times ranging from minutes to hours, ideal for archival data and digital preservation.
- Case Study: Snap. Snap stores saved media on S3 Glacier, reducing costs significantly while maintaining the ability to retrieve data when needed.
S3 Glacier Flexible Retrieval
- Use Case: Data that is rarely accessed but needs faster retrieval times than Glacier Deep Archive.
- Benefits: Low storage cost with retrieval times ranging from minutes to hours, suitable for backup and disaster recovery.
S3 Glacier Deep Archive
- Use Case: Rarely accessed data with long-term retention requirements.
- Benefits: Lowest storage cost, with retrieval times in hours, perfect for data that is rarely accessed.
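The use cases above can be condensed into a small decision helper. This is only a rough sketch of the guidance in this post: the function name and both thresholds are hypothetical, and a real decision should also weigh cost, minimum storage durations, and availability needs.

```python
from typing import Optional

# Hypothetical thresholds, chosen only for illustration.
FREQUENT_ACCESSES_PER_MONTH = 1.0
RARE_ACCESSES_PER_MONTH = 1.0 / 12  # roughly once a year


def recommend_storage_class(accesses_per_month: Optional[float],
                            needs_millisecond_access: bool,
                            easily_recreatable: bool = False) -> str:
    """Suggest a storage class from access pattern and retrieval needs."""
    if accesses_per_month is None:
        # Unknown or unpredictable access pattern.
        return "S3 Intelligent-Tiering"
    if accesses_per_month >= FREQUENT_ACCESSES_PER_MONTH:
        return "S3 Standard"
    if needs_millisecond_access:
        # Infrequent access, but data must be available immediately.
        return "S3 One Zone-IA" if easily_recreatable else "S3 Standard-IA"
    if accesses_per_month <= RARE_ACCESSES_PER_MONTH:
        return "S3 Glacier Deep Archive"
    return "S3 Glacier Flexible Retrieval"


print(recommend_storage_class(None, needs_millisecond_access=True))
print(recommend_storage_class(30, needs_millisecond_access=True))
print(recommend_storage_class(0.5, needs_millisecond_access=True))
print(recommend_storage_class(0.01, needs_millisecond_access=False))
```

Passing `None` for the access rate stands in for "we don't know yet", which is exactly the case Intelligent-Tiering is meant for.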
S3 Storage Class Comparison Table
See below for a comparison of storage classes; US West 1 is used as the reference region.
| Storage Class | Storage Cost per GB per Month | Retrieval Cost per GB | Minimum Storage Duration | Availability Zone Storage | Retrieval Time |
| --- | --- | --- | --- | --- | --- |
| S3 Standard | $0.023 | N/A | None | Multiple | Milliseconds |
| S3 Intelligent-Tiering | $0.023 (frequent), $0.0125 (infrequent) | $0.01 per 1,000 objects (monitoring) | 30 days (infrequent) | Multiple | Milliseconds to minutes |
| S3 Standard-IA | $0.0125 | $0.01 | 30 days | Multiple | Milliseconds |
| S3 One Zone-IA | $0.01 | $0.01 | 30 days | Single | Milliseconds |
| S3 Glacier | $0.004 | $0.03-$0.12 | 90 days | Multiple | Minutes to hours |
| S3 Glacier Flexible Retrieval | $0.0036 | $0.03-$0.12 | 90 days | Multiple | Minutes to hours |
| S3 Glacier Deep Archive | $0.00099 | $0.02-$0.04 | 180 days | Multiple | Hours |
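To see how the storage and retrieval figures interact, here is a minimal cost sketch using prices from the comparison above. It is deliberately simplified: it ignores request charges, data transfer, monitoring fees, and minimum storage durations, and for Glacier Deep Archive it uses the low end of the retrieval range. The class and function names are hypothetical, and the workload of 1,000 GB stored with 50 GB retrieved per month is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassPricing:
    storage_per_gb_month: float  # USD per GB stored per month
    retrieval_per_gb: float      # USD per GB retrieved


# Figures copied from the comparison table; always confirm current prices.
PRICING = {
    "S3 Standard": ClassPricing(0.023, 0.0),
    "S3 Standard-IA": ClassPricing(0.0125, 0.01),
    "S3 One Zone-IA": ClassPricing(0.01, 0.01),
    "S3 Glacier Deep Archive": ClassPricing(0.00099, 0.02),  # low end of range
}


def monthly_cost(storage_class: str, stored_gb: float, retrieved_gb: float) -> float:
    """Estimate monthly storage plus retrieval cost in USD."""
    pricing = PRICING[storage_class]
    return (stored_gb * pricing.storage_per_gb_month
            + retrieved_gb * pricing.retrieval_per_gb)


# Illustrative workload: 1,000 GB stored, 50 GB retrieved per month.
for name in PRICING:
    print(f"{name}: ${monthly_cost(name, 1000, 50):.2f}")
```

For this workload the estimate comes to $23.00 for S3 Standard, $13.00 for Standard-IA, $10.50 for One Zone-IA, and $1.99 for Glacier Deep Archive. With these figures alone, Standard-IA stays cheaper than S3 Standard until monthly retrieval exceeds about 1.05 times the stored volume, since the $0.0105 per GB saved on storage is offset by the $0.01 per GB retrieval charge.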
Conclusion
Selecting the appropriate Amazon S3 storage class is vital for optimizing both cost and performance based on your specific needs. By evaluating your data access patterns, retrieval time requirements, and budget, you can choose the most suitable storage class for your use case.