How KnowBe4 Achieved 27% Cost Savings with Autonomous Cloud Optimization

AI-driven optimization delivers substantial cost savings and performance improvements while reducing engineer toil for AWS ECS and Lambda workloads

RESULTS

27%

Cloud Cost Savings

98%

Services Optimized Autonomously

1,100

Autonomous Actions in first 90 days

5 months

Payback Period

USE CASES

Cloud Cost Optimization

Performance Improvement

Ops Productivity

Release Quality

KEY CAPABILITIES

Autonomous Optimization

Autonomous Remediation

Release Intelligence

TECH STACK

Amazon ECS 

AWS Lambda

Amazon CloudWatch

Datadog

GitLab

INDUSTRY

Cybersecurity

GEOGRAPHY

North America

Summary

  • KnowBe4, the world's largest security awareness training provider serving over 70,000 organizations globally, faced significant optimization challenges with their rapidly growing AWS ECS and Lambda services.
  • The company implemented Sedai's autonomous optimization platform to reduce engineer toil, improve efficiency, and manage costs effectively across their complex infrastructure.
  • KnowBe4 adopted a phased approach (Crawl, Walk, Run) to gradually implement autonomous optimization across their infrastructure, ensuring minimal disruption to their operations.
  • As a result of this implementation, KnowBe4 achieved an overall 27% cost savings across their cloud infrastructure, with 98% of their services now running autonomously. Over 1,100 autonomous actions were performed in just 3 months, significantly reducing the operational burden on their engineering team.
  • The company realized a return on investment within just 5 months, exceeding their initial expectations and validating their decision to adopt autonomous optimization.

Problem

KnowBe4, a leader in security awareness training, experienced rapid growth that led to significant challenges in managing and optimizing their cloud infrastructure. As their customer base expanded to over 70,000 organizations worldwide, the company faced unprecedented scaling issues across their AWS environment.

Nate Singletary, Staff Site Reliability Engineer at KnowBe4, succinctly described the challenge they faced: "We have ECS services running in AWS, and we want to ensure they're running efficiently. How do we know if they are? And if they're not, how do we react to that? How do we fix it?"

Matthew Duren, Sr. Director of Software Engineering at KnowBe4, outlined the scale of their operations: "We have something like 70,000 customers across the world. So we're seeing tons of growth right now, especially internationally. The US environment is definitely our largest. Just from our day-to-day, we peak out at thousands and thousands of requests per second."

The company's infrastructure growth was staggering, with a 58% year-over-year increase in ECS usage, managing over 3,000 services and handling 2,000-4,000+ peak tasks daily. Their Lambda usage saw an even more dramatic surge, with a 422% year-over-year growth, encompassing over 2,500 functions and processing over 250 million Lambda invocations daily.
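To put these figures in perspective, the short calculation below converts the daily invocation count into an average per-second rate and turns the year-over-year percentages into multipliers. It is a back-of-the-envelope sketch: it assumes each growth percentage is an increase over the prior year's level and that invocations are spread evenly over the day, which real traffic is not. The function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_per_second(daily_count: float) -> float:
    """Average rate per second, assuming an even spread over 24 hours."""
    return daily_count / SECONDS_PER_DAY


def growth_multiplier(percent_growth: float) -> float:
    """Convert a year-over-year percentage increase into a multiplier."""
    return 1 + percent_growth / 100


lambda_rate = average_per_second(250_000_000)
ecs_multiplier = growth_multiplier(58)
lambda_multiplier = growth_multiplier(422)

print(f"Average Lambda invocations per second: {lambda_rate:,.0f}")
print(f"ECS usage vs. prior year: {ecs_multiplier:.2f}x")
print(f"Lambda usage vs. prior year: {lambda_multiplier:.2f}x")
```

Under these assumptions, 250 million daily invocations average out to roughly 2,900 per second, and 422% growth means about 5.2 times the prior year's usage.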

This rapid expansion created a complex web of thousands of microservices across ECS and Lambda, with frequent code deployments averaging every 20 minutes. The frequent releases and the need for high performance in real-time cybersecurity delivery put immense pressure on the engineering team to optimize resources continually. But manual optimization processes would have been time-consuming and inefficient, especially given KnowBe4’s scale of operations.

Adding to the complexity, KnowBe4 is a heavy user of AI and ML in their workloads. Their AI-driven services include AIDA (Artificial Intelligence Driven Agent), which runs in the background to allow customers to use AI in their day-to-day platform usage. They also employ a Virtual Risk Officer that uses AI to assign risk scores to users based on their positions and behaviors. Other AI-powered features include PhishML for automated dispositioning of potential phishing emails, and various content selection tools. The sophisticated nature of these AI workloads added another layer of complexity to their infrastructure management needs.

Matt Duren emphasized the challenge of scaling their operations: "We sit and we try to scale our infrastructure, to scale our software, to build software that will scale to any number of users. But until now, until we started using Sedai, we really had no solution for actually scaling the people side of things, the team."

It became increasingly clear that manual optimization would not scale. It kept KnowBe4 from fully optimizing their cloud resources and from maintaining peak performance for their customers while managing costs effectively. These challenges led KnowBe4 to explore autonomous optimization solutions.

Solution

To address these multifaceted challenges, KnowBe4 decided to implement Sedai's autonomous optimization platform. They followed a carefully planned approach, which Matt Duren described as: "We took some steps to mitigate what his fears were, what our fears were, and decided we would start small, we would choose a specific set of services to apply this autonomous optimization to."

KnowBe4 adopted a phased implementation strategy, which they referred to as Crawl, Walk, Run:

1. Crawl:
  - Set up the Sedai integration with their AWS environment
  - Established initial cost-saving goals
  - Enabled autonomous optimization on a set of low-risk services to evaluate the impact

2. Walk:
  - Analyzed results from the initial optimization efforts
  - Expanded Sedai's implementation to include flagship products
  - Created groups divided by products and regions
  - Set tailored goals for cost and performance based on service requirements

3. Run:
  - Fully embraced Sedai's autonomous optimization across their infrastructure
  - Configured services to be autonomously optimized by default upon deployment
  - Integrated Sedai across all AWS accounts and regions

Select Activities during KnowBe4's Crawl, Walk, Run Implementation Stages for Autonomous Optimization
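The staged scope of Crawl, Walk, Run can be pictured as a simple policy. The sketch below is a hypothetical model, not Sedai's configuration format: the `Phase`, `Service`, and `is_autonomous` names and the `low_risk` and `flagship` flags are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    CRAWL = 1  # low-risk services only
    WALK = 2   # low-risk services plus flagship products
    RUN = 3    # every service, autonomous by default


@dataclass(frozen=True)
class Service:
    name: str
    low_risk: bool = False   # hypothetical flag set by the owning team
    flagship: bool = False   # hypothetical flag for flagship products


def is_autonomous(service: Service, phase: Phase) -> bool:
    """Return True if the service is in scope for autonomous optimization."""
    if phase is Phase.RUN:
        return True
    if phase is Phase.WALK:
        return service.low_risk or service.flagship
    return service.low_risk


services = [
    Service("internal-report-job", low_risk=True),
    Service("training-portal", flagship=True),
    Service("billing-api"),
]

for phase in Phase:
    in_scope = [s.name for s in services if is_autonomous(s, phase)]
    print(phase.name, in_scope)
```

The scope widens monotonically: anything in scope during Crawl stays in scope during Walk and Run, which mirrors the gradual expansion described above.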

KnowBe4's optimization strategy covered several areas. For service optimization, they configured horizontal and vertical scaling and fine-tuned memory, CPU, and task counts for the best balance of cost and performance. For container instance optimization, they selected instance types on an application-aware basis, factoring in app-level latency. The team also explored purchasing options, identifying the most cost-effective combination of On-Demand and Savings Plans based on predicted traffic patterns.
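As a rough illustration of what fine-tuning memory can involve, the sketch below applies a common rightsizing heuristic: take a high percentile of observed usage, add headroom, and round up to an allocation step. This is a generic technique, not a description of how Sedai decides. The 95th percentile, 20% headroom, 64 MiB step, and sample values are all assumptions.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_memory_mib(usage_mib: list[float],
                         headroom: float = 0.2,
                         step: int = 64) -> int:
    """Suggest a memory size: p95 usage plus headroom, rounded up to a step."""
    target = percentile(usage_mib, 95) * (1 + headroom)
    return math.ceil(target / step) * step


# Hypothetical memory usage samples (MiB) for one service.
observed = [300, 320, 310, 400, 350, 330, 305, 315, 390, 340]
provisioned = 1024  # hypothetical current allocation in MiB

recommended = recommend_memory_mib(observed)
saving = (provisioned - recommended) / provisioned * 100
print(f"Recommended: {recommended} MiB ({saving:.0f}% below provisioned)")
```

An autonomous system would also have to weigh latency and availability before acting on such a number, which is why the heuristic alone should be read as a starting point.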

Risk mitigation was a crucial aspect of the implementation. KnowBe4 took a cautious approach, starting with a small subset of services in production and gradually expanding to dev/test environments. Matt Duren commented on this strategy: "We wanted to start with something not dev, not test, not a lower environment where there's not real traffic coming to it, even if it's heavily, heavily, heavily tested by a software engineering test team or QA team. We still knew we wanted to serve production traffic through a service that was being optimized by Sedai."

To ensure smooth adoption, KnowBe4 integrated Sedai into their CI/CD process, creating a fully autonomous workflow. They also implemented release evaluation to automatically assess the impact of new deployments on cost, performance, and availability. This approach helped align the development teams with the new autonomous optimization paradigm.

KnowBe4 Architecture Including IaC and Sedai

Sedai was deployed to optimize KnowBe4's ECS Fargate workloads and Lambda functions, autonomously rightsizing services, adjusting auto-scaling configurations, and managing resource allocation. The platform utilized AI and machine learning techniques to analyze service behavior, predict resource needs, and make real-time adjustments to optimize both cost and performance.

Results

KnowBe4's implementation of Sedai's autonomous optimization platform yielded results across multiple areas, affecting their cloud operations, cost management, and overall efficiency. The company realized a return on investment within 5 months.

This rapid return on investment validated KnowBe4's decision to adopt autonomous optimization and exceeded their initial expectations.

KnowBe4 successfully managed their rapid growth, handling a 58% year-over-year increase in ECS usage and a 422% growth in Lambda usage. The continuous optimization ensured that resources were always aligned with current needs, even as traffic patterns and application behaviors changed. This adaptability was crucial in supporting KnowBe4's expanding customer base and evolving service offerings.

Cloud Cost Savings

The company achieved an overall 27% cost savings across their cloud infrastructure, demonstrating the significant impact of autonomous optimization at scale. When looking at individual services and environments, the savings were even more dramatic. In development environments, some ECS services saw up to 87% cost reduction, while in production, savings reached up to 50%. These significant savings allowed KnowBe4 to reallocate resources to other strategic initiatives and support their continued growth.

KnowBe4 Cost Savings Opportunities in Sedai

Nate Singletary provided a comprehensive overview of the results:

This high level of autonomous operation significantly reduced the manual workload on the engineering team.

Impact of Optimization on an AWS Lambda Service

Lambda functions saw particularly impressive improvements, with some cases achieving up to 99.3% cost savings. Matt provided a specific example that highlighted both cost and performance benefits: "This is a particular Lambda function that serves a really specific purpose in our production environment. We saw a 31% cost decrease for that function. But again, we saw a 54% decrease in the latency." This dual improvement in cost and performance was a key factor in the success of the implementation.
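A lower cost alongside lower latency is plausible because AWS Lambda's duration charge is metered in GB-seconds, that is, allocated memory multiplied by execution time. The sketch below uses a simplified compute-only model to ask what memory change would be consistent with the quoted figures. It ignores the per-request charge, assumes cost is proportional to memory times duration, and treats the quoted latency decrease as a decrease in billed duration. The source does not say how this function was tuned, so the result is an inference from the model, not a reported fact.

```python
def relative_cost(memory_factor: float, duration_factor: float) -> float:
    """Cost relative to baseline when cost is proportional to memory x duration."""
    return memory_factor * duration_factor


def implied_memory_factor(cost_decrease: float, duration_decrease: float) -> float:
    """Memory multiplier consistent with the given fractional decreases."""
    return (1 - cost_decrease) / (1 - duration_decrease)


# Figures quoted for the example function: 31% lower cost, 54% lower latency.
memory_factor = implied_memory_factor(cost_decrease=0.31, duration_decrease=0.54)
check = relative_cost(memory_factor, 1 - 0.54)

print(f"Implied memory multiplier: {memory_factor:.2f}")
print(f"Relative cost under the model: {check:.2f}")
```

Under this model, a 54% shorter duration and a 31% lower cost would correspond to about 1.5 times the original memory allocation: the function runs with more memory but finishes fast enough that the total bill still falls.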

Performance Gains

Performance enhancements were equally impressive, with significant latency reductions across services leading to improved customer experience. Matt shared a striking example that demonstrated the magnitude of these improvements: "We took it from an average response time of 18.5 seconds to 80 milliseconds or so. A 99.5% duration reduction." Such dramatic performance improvements not only enhanced the user experience but also allowed KnowBe4 to serve their customers more efficiently, supporting their mission of providing real-time cybersecurity training and awareness.
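The quoted reduction can be checked with a one-line calculation. The sketch below assumes the "80 milliseconds or so" figure is exactly 0.080 seconds, so the result is approximate.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage decrease from `before` to `after` (same units)."""
    return (before - after) / before * 100


# Average response time: 18.5 seconds before, about 0.080 seconds after.
reduction = percent_reduction(18.5, 0.080)
print(f"Duration reduction: {reduction:.2f}%")
```

With these inputs the reduction comes out between 99.5% and 99.6%, in line with the figure in the quote.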

Operational Productivity

Operational efficiency saw a massive boost, with 98% of KnowBe4's 9,491 services now running autonomously. This high level of autonomous operation significantly reduced the manual workload on the engineering team. Over 1,100 autonomous actions were performed in just 3 months. This dramatic increase in efficiency allowed for enhanced engineer productivity, as Matt commented:

This proactive approach to optimization not only improved efficiency but also enhanced the reliability and stability of KnowBe4's services.

KnowBe4 Release Intelligence Score for a New Release

Release management saw significant improvements with the implementation of automatic evaluation for every new release. This system flagged releases with major deviations to application developers, increasing release confidence and enabling faster innovation. The ability to quickly identify and address potential issues in new releases helped KnowBe4 maintain their rapid development pace while ensuring the quality and performance of their services.
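Flagging a release for major deviations can be pictured as a comparison between a baseline and the metrics observed after deployment. The sketch below is a minimal illustration, not Sedai's scoring method: the metric names, the sample values, and the 20% threshold are all assumptions.

```python
def flag_deviations(baseline: dict[str, float],
                    candidate: dict[str, float],
                    threshold: float = 0.2) -> dict[str, float]:
    """Return metrics whose relative change from baseline exceeds the threshold."""
    flagged = {}
    for metric, base in baseline.items():
        if metric not in candidate or base == 0:
            continue  # nothing to compare, or relative change is undefined
        change = (candidate[metric] - base) / base
        if abs(change) > threshold:
            flagged[metric] = change
    return flagged


# Hypothetical metrics before and after a release.
before_release = {"p95_latency_ms": 120.0, "error_rate": 0.010, "cost_per_hour": 2.00}
after_release = {"p95_latency_ms": 180.0, "error_rate": 0.011, "cost_per_hour": 2.10}

flagged = flag_deviations(before_release, after_release)
for metric, change in flagged.items():
    print(f"{metric}: {change:+.0%} vs. baseline")
```

Here only the latency metric crosses the threshold, so it is the one a developer would be asked to review.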

Team Benefits

Reflecting on the impact of autonomous optimization, Nate Singletary highlighted their key objectives:

The implementation of Sedai's platform has allowed KnowBe4 to successfully meet these objectives, balancing cost efficiency with performance and enabling their engineering team to focus on high-value tasks.

Matt Duren summarized the impact on their SRE team, highlighting the shift from routine tasks to more valuable work: "I think the biggest change that I've seen is my teams are able to work on a lot more valuable projects. So there's always a ton of toil. And especially Google calls our approach kind of the kitchen sink SRE approach where no matter what kind of dirty dish you have, you just put it in the kitchen sink. And we found that to be super effective. But it does result in a lot of toil." The reduction in toil and the ability to focus on more strategic projects not only improved team productivity but also likely contributed to improved job satisfaction and reduced burnout risk among the engineering team.

Looking Ahead

The success of this implementation has positioned KnowBe4 to efficiently scale their cloud infrastructure while maintaining high performance and availability for their customers. By embracing autonomous cloud management, KnowBe4 has not only optimized their current operations but also laid the groundwork for future growth and innovation in their security awareness training platform. The ability to automatically optimize their AI and ML workloads alongside their more traditional services has given them a competitive edge in delivering cutting-edge cybersecurity solutions.

When asked for his recommendation to teams considering autonomous optimization, Matt Duren enthusiastically stated: "Just Do It! ... we've had an incredible journey with Sedai so far. It's been short. It's been very, very easy for us to implement." This endorsement underscores the positive impact and ease of implementation that KnowBe4 experienced with Sedai's autonomous optimization platform.

