Autonomous DevOps: Integrating IaC with Autonomous Systems

Last updated

November 20, 2024

Embracing Autonomous DevOps: A New Frontier in Cloud Optimization

Organizations are constantly seeking ways to optimize their infrastructure for both cost and performance. If you're already using Infrastructure as Code (IaC) and looking to take your cloud optimization to the next level, autonomous systems offer a compelling solution.

The Promise of Autonomous DevOps

Integrating autonomous systems with your existing IaC practices can unlock significant benefits:

  • Faster realization of cost savings and performance improvements
  • Continuous optimization without manual intervention
  • Maintained integrity of your IaC system
  • Reduced burden on engineering teams

For organizations invested in IaC and aiming to optimize cloud costs, autonomous systems provide a way to achieve optimization goals with less effort than traditional cloud cost management systems, all while respecting and enhancing your existing DevOps practices. Whether you're a DevOps engineer, cloud architect, or IT leader, this article will help you understand how to incorporate autonomous optimization into your existing workflows, keeping your IaC accurate while your cloud costs are minimized. Let's explore how autonomous DevOps can transform your cloud operations and integrate with your existing IaC practices.

This article is based on a talk that I gave at Sedai’s annual autocon conference, together with Nate Singletary, Senior SRE on the Platform team at KnowBe4. If you prefer video, you can watch the talk, along with all the other conference videos, on the autocon event site.

CI/CD Primer

Continuous Integration/Continuous Deployment (CI/CD) is a method to frequently deliver apps to customers by introducing automation into the application development cycle.  The key benefits include:

  • Reduced manual error
  • Decreased lead time
  • Faster release rates

What is Continuous Integration (CI)?

Continuous integration systems promote detecting problems much earlier in the development life cycle. This is usually done by running automated builds, unit test cases, integration tests and vulnerability scans.

What is Continuous Delivery (CD)?

Continuous delivery systems, meanwhile, have streamlined the process of shipping software into the target system in a rapid and repeatable fashion.

Continuous integration and continuous deployment systems have been used across the industry for several years now, with multiple mature products such as:

  • Jenkins
  • Argo CD
  • CircleCI
  • GitLab CI/CD

Together, CI and CD have increased the frequency of software releases, reducing the lead time for features to be shipped to customers.

Infrastructure as Code (IaC) Primer

One of the key aspects of CD systems has been infrastructure as code (IaC).  IaC is the management of infrastructure (networks, virtual machines, container workloads/deployments, ingress, etc.) in a descriptive model.  This makes infrastructure declarative, reproducible and auditable.

The core focus is the infrastructure. The state of the required infrastructure (What It Should Be, or “WISB”) is defined as code, rather than imperatively specifying how the hardware is realized through a manual configuration management process.

This approach extends the benefits that application developers get from code repositories, such as auditing, versioning and collaboration, to hardware systems. It also provides a repeatable way of generating hardware (including public cloud infrastructure); in other words, hardware is now reproducible.

Some popular IaC tools include:

  • Terraform
  • Ansible
  • AWS Cloud Development Kit (CDK)
  • Puppet
  • Helm

GitOps and Infrastructure Management Primer

Now the glue that brings together infrastructure as code and the realized hardware is GitOps, including Git repositories.

Let's look at an example GitOps workflow which uses CI/CD to ship a software release into a containerized application.

Here are the steps involved in this workflow:

  1. The process starts with a developer pushing software code into the source code repository.
  2. Those changes are picked up by Jenkins, which is the CI system in this case. Jenkins runs builds and unit test cases and, if everything goes well, creates a Docker image and pushes it to the Docker registry.
  3. Jenkins then updates another source code repository, this time for infrastructure, so that it references the new software release created in the previous step.
  4. As soon as the source code repository for the infrastructure is changed, the change is picked up by the CD system, in this case Argo CD, which pushes it out to the Kubernetes clusters.
  5. The Kubernetes clusters pull the updated Docker image, created in step 2, from the Docker registry.
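
To make step 3 of the workflow above more concrete, here is a minimal Python sketch of what a CI job might do when it updates the infrastructure repository: rewrite the image tag in a Kubernetes manifest so that the CD system picks up the change. The manifest content, the registry name registry.example.com and the function bump_image_tag are illustrative assumptions, not part of any specific Jenkins or Argo CD setup.

```python
import re

def bump_image_tag(manifest_text: str, image: str, new_tag: str) -> str:
    """Return the manifest with every `image: <image>:<tag>` line pointing at new_tag."""
    pattern = re.compile(rf"(image:\s*{re.escape(image)}):[\w.\-]+")
    updated, count = pattern.subn(rf"\g<1>:{new_tag}", manifest_text)
    if count == 0:
        # Fail loudly so the pipeline does not commit an unchanged manifest.
        raise ValueError(f"image {image!r} not found in manifest")
    return updated

# Hypothetical manifest as it might be stored in the infrastructure repository.
MANIFEST = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  template:
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.4.0
"""

new_manifest = bump_image_tag(MANIFEST, "registry.example.com/web", "1.5.0")
print(new_manifest)
```

In a real pipeline the updated text would be committed back to the infrastructure repository, and that commit is what triggers the CD system in step 4.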

Autonomous System Primer

Now before we look at how IaC and autonomous systems fit together, let's explain the capabilities of an autonomous system vs traditional automated systems.

The key advantage of an autonomous system over an automated system is that it can manage a larger set of tasks for the user. At a high level, it can detect, recommend, validate and execute changes.

At a more detailed level, an autonomous system can:

  • Automatically detect topology
  • Read metrics from the monitoring providers and correlate the metrics to the topology
  • Build a model which takes into account seasonality, traffic, resource usage, and the outcome and output of those applications
  • Understand which opportunities are available
  • Detect problem signals and generate solutions to those problems
  • Work out how to act on those opportunities
  • Apply those changes in production
  • Look at the efficacy of the changes that have been made
  • Learn from those changes, both those made by the autonomous system and those made by external systems
  • Continue this whole process, with continuous optimization and availability fixes running 24/7
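
The detect, recommend, validate and execute cycle described above can be sketched as a small Python loop. This is a toy illustration under stated assumptions, not how Sedai or any other product implements it: the Workload class, the 50% utilization threshold, the 1.3 headroom factor and the 64 MB rounding step are all made up for the example.

```python
import math
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    memory_mb: int           # currently allocated memory
    usage_samples_mb: list   # observed memory usage from monitoring

def detect(w: Workload, threshold: float = 0.5) -> bool:
    """Flag an opportunity when peak usage is below `threshold` of the allocation."""
    return max(w.usage_samples_mb) < threshold * w.memory_mb

def recommend(w: Workload, headroom: float = 1.3) -> int:
    """Propose peak usage plus headroom, rounded up to a 64 MB step."""
    target = max(w.usage_samples_mb) * headroom
    return math.ceil(target / 64) * 64

def validate(w: Workload, proposed_mb: int, floor_mb: int = 128) -> bool:
    """Accept only proposals that shrink the allocation but stay above a safety floor."""
    return floor_mb <= proposed_mb < w.memory_mb

def execute(w: Workload, proposed_mb: int) -> None:
    """Apply the change (here: just update the in-memory record)."""
    w.memory_mb = proposed_mb

def optimize_once(w: Workload) -> bool:
    """Run one pass of the loop; return True if a change was applied."""
    if not detect(w):
        return False
    proposed = recommend(w)
    if not validate(w, proposed):
        return False
    execute(w, proposed)
    return True

web = Workload("web", memory_mb=1024, usage_samples_mb=[210, 260, 300])
changed = optimize_once(web)
print(changed, web.memory_mb)
```

A real system would add the learning step, comparing behavior before and after the change, and would run this continuously rather than once.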

Why Autonomous Systems Need to Integrate with IaC

The crux of the issue is that, for an autonomous system to be effective, it needs to handle certain attributes of infrastructure, including attributes that are already defined in a CD system.

Let's take an example of a web application and its runtime attributes:

  1. The amount of memory, the CPU and the number of replicas required for that web application could come from the dev team. This makes sense while the application is being developed, as the dev team probably has the most insight into how that application works.
  2. But we all know that in production, things can be a lot different. A lot of companies try to establish some kind of performance testing system to get as close as possible to a production environment. Even so, many factors can produce a situation in which the optimal production settings differ from what was initially selected in dev/test. These could be changing traffic, seasonality, dependent applications or underlying infrastructure. And the advanced state of CI/CD systems means new features come into production faster.

So that means the nature of the application itself is changing. To keep up, you need a system that is capable of continuous optimization, as it’s hard for any human or any automated system to handle the volume and complexity of work needed.

You may have already heard about shift left or shift right strategies. In shift left, you want things to be solved earlier in the development cycle.  But we suggest that instead of shifting an activity to a team on the left or to the right, it should be handled by an autonomous system. At Sedai we call this a “shift up” strategy.

We recommend not fixing those runtime attributes permanently to the initial IaC values, as they are better handled by an autonomous system. 

IaC and Autoscaling as an analogy for IaC and Autonomous Systems

This approach is not too different from the contrast between an application that has no horizontal autoscaler defined and one that has an autoscaler.

If a workload doesn't have an autoscaler, then the number of replicas required to handle the worst-case scenario (i.e., the resources needed to handle peak traffic) would be defined in IaC.

But autoscalers can bring major efficiencies. In the autoscaler model, the number of replicas is determined at runtime, and not defined in IaC. With autonomous systems, the situation is similar. Workload attributes that can be set more efficiently at runtime are left to the autonomous system.
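
As a rough illustration of a replica count being determined at runtime, the sketch below uses the scaling formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current replicas × current metric / target metric), clamped to a minimum and maximum. It is a simplification: the real autoscaler also applies a tolerance band and stabilization windows, and the utilization figures here are made up.

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Simplified HPA-style calculation, clamped to [min_replicas, max_replicas]."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))

# With a 60% CPU utilization target and 4 replicas currently running:
busy = desired_replicas(4, current_metric=90, target_metric=60)   # scale out
quiet = desired_replicas(4, current_metric=30, target_metric=60)  # scale in
print(busy, quiet)
```

Only the bounds (min_replicas and max_replicas) and the target would live in IaC in this model; the actual replica count at any moment is a runtime decision.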

Four Strategies for Integrating IaC with Autonomous Systems

From our work with customers, we have found there is no one-size-fits-all approach. We have come up with four strategies for integrating autonomous systems such as Sedai with IaC systems:

  1. Autonomous System Manages Resource Allocation (Recommended) 
  2. Autonomous System Sync
  3. Integrate Sedai’s WISB (what it should be) in IaC
  4. Integrate with IaC Repository 

Option One: Autonomous System Manages Resource Allocation

An autonomous system manages everything during runtime. That means you only provide hints as to what an autonomous system should do, but not any specific runtime attributes at all.

Option Two: Autonomous System Sync

When we designed Sedai, we always had in mind that Sedai would not be the only system managing runtime configuration. There are many reasons another system could be making changes. When that happens, Sedai should adapt, re-learning from the changes made and continuing to optimize resources.

With this extra setting, we give Sedai a hint that there is a CD system that is not receiving input on what Sedai is doing, which means Sedai's changes could get overwritten. In those cases, Sedai will try to reapply its configurations if it is safe to do so. If not, it will carry on with its regular learning process and continuous optimization.
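
The reapply-if-safe behavior described above can be pictured with a small Python sketch. This is a generic illustration of the idea, not Sedai's implementation: the function reconcile, the configuration keys and the safety check are all hypothetical.

```python
def reconcile(live: dict, last_applied: dict, is_safe) -> tuple:
    """Decide what to do after a CD deployment may have overwritten optimized settings.

    live          configuration currently running in the cloud
    last_applied  settings the autonomous system applied most recently
    is_safe       callable that judges whether a candidate configuration is safe
    """
    # Settings whose live value no longer matches what was last applied.
    drifted = {k: v for k, v in last_applied.items() if live.get(k) != v}
    if not drifted:
        return dict(live), "in_sync"
    candidate = {**live, **drifted}
    if is_safe(candidate):
        return candidate, "reapplied"
    # Not safe to reapply: keep the live config and learn from the new state.
    return dict(live), "relearn"

# Hypothetical example: a deployment reset CPU and memory to the IaC defaults.
live = {"cpu": 1024, "memory_mb": 2048, "image": "web:1.5.0"}
last_applied = {"cpu": 512, "memory_mb": 1024}

config, action = reconcile(live, last_applied, is_safe=lambda c: c["memory_mb"] >= 512)
print(action, config)
```

Note that only the optimized attributes are restored; anything else the deployment changed, such as the image version, is left alone.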

Option Three: Integrate Sedai’s WISB in IaC

In this model, Sedai’s “What It Should Be” (WISB) recommendations are brought into the CD system via some additional scripting. This can be either a pull- or push-based option. In the pull-based form, you programmatically request the recommended configuration for a given resource from Sedai at a given point in time, and then put that information into the CD system.
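
The additional scripting for the pull-based form might look something like the Python sketch below, which overlays recommended values onto the values stored in IaC just before a deployment. Everything here is an assumption made for illustration: fetch_recommendation is a stand-in that returns canned data rather than a real Sedai API call, and the keys cpu, memory_mb and replicas are hypothetical.

```python
import json

def fetch_recommendation(resource_id: str) -> dict:
    """Hypothetical stand-in for requesting a recommendation; returns canned data."""
    canned = {"web": {"cpu": 512, "memory_mb": 1024}}
    return canned.get(resource_id, {})

def render_values(base_values: dict, resource_id: str,
                  managed_keys=("cpu", "memory_mb")) -> dict:
    """Overlay recommended values for the managed keys onto the IaC values."""
    recommendation = fetch_recommendation(resource_id)
    merged = dict(base_values)
    for key in managed_keys:
        if key in recommendation:
            merged[key] = recommendation[key]
    return merged

# Values as defined in IaC; only cpu and memory_mb are handed over to the optimizer.
iac_values = {"cpu": 1024, "memory_mb": 2048, "replicas": 3}
values = render_values(iac_values, "web")
print(json.dumps(values, indent=2))
```

Restricting the overlay to an explicit list of managed keys keeps the rest of the IaC definition authoritative.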

Option Four: Integrate with IaC Repository 

In this option, Sedai directly integrates with your IaC repository.

In this model, Sedai needs additional information about your Git repository. Sedai supports many standard formats, such as Helm, JSON, YAML, Kubernetes deployment specs and Terraform templates. With this additional information, every time Sedai makes a change directly in the cloud provider using its APIs, it will also create a pull request (PR) with exactly those changes. This PR can be reviewed by the customer and merged back into the Git repository.
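
To give a feel for what such a pull request carries, the sketch below uses Python's standard difflib module to produce a unified diff between the current IaC file and a version updated with the optimized values. The file name service.tf and its contents are invented for the example, and this is not a description of how Sedai generates its PRs.

```python
import difflib

def propose_change(path: str, original: str, updated: str) -> str:
    """Return a unified diff describing the change to `path` (empty if nothing changed)."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)

# Hypothetical fragment of an IaC file before and after an optimization.
before = 'name   = "web"\ncpu    = 1024\nmemory = 2048\n'
after = 'name   = "web"\ncpu    = 512\nmemory = 1024\n'

patch = propose_change("service.tf", before, after)
print(patch)
```

Because the change arrives as a reviewable diff, the Git repository stays the source of truth even though the optimization was first applied at runtime.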

Case Study of Integrating IaC with Autonomous Systems at KnowBe4

KnowBe4 is one of the customers that has helped us streamline some of these integration options.  KnowBe4 provides the world's largest security awareness training and simulated phishing platform used by more than 34,000 organizations globally.

KnowBe4 has a diverse suite of products, ranging from its security awareness tools to its human detection risk tools, to security orchestration tools that can identify malicious emails, remove them from your inbox and turn those attacks into real-world training opportunities for users. These applications are spread across many cloud services comprising thousands of microservices, functions and data stores, all deployed in AWS.

KnowBe4 Product Set and AWS Technologies

KnowBe4 Platform Architecture

KnowBe4 uses GitLab for CI/CD, committing and deploying all of their code there.

KnowBe4's Platform Architecture

KnowBe4’s production workloads are deployed in AWS, while Datadog is used for monitoring. 

KnowBe4 adopted an autonomous system, Sedai, to handle autonomous rightsizing of their workloads for their ECS tasks and Lambda functions. As part of this implementation, KnowBe4 needed to do some work to integrate Sedai with their CI/CD workflow.

They wanted to do it in a way that didn't shift configurations left, so that developers could stay focused on new products. 

Below is KnowBe4’s CI/CD workflow for ECS. They commit code in GitLab, which may trigger a build job in GitLab that pushes the image for ECS into ECR. Infrastructure code then gets deployed in AWS.

KnowBe4's GitLab-based CI/CD Process Integrating Sedai

At the same time, while KnowBe4 is continuously releasing and deploying, KnowBe4 also has Sedai analyzing and optimizing its workloads.
Before the full integration with IaC, KnowBe4 ran into situations where conflicts could arise, as Sedai was continuously optimizing the workloads and updating configs while KnowBe4 was continuously pushing code. The question was: how does KnowBe4 avoid overwriting the configs that Sedai updated?

What KnowBe4 did was pull the current config after Sedai made a change and use that at deployment time. But in that case, the IaC is no longer the source of truth for configs.

Integration list inside Sedai showing KnowBe4's GitLab Integration

So now KnowBe4 is integrating Sedai with GitLab. KnowBe4 gave Sedai access to their GitLab instance, allowing Sedai to come back in and update the IaC once Sedai made an optimization.  Sedai generates merge requests and that config now goes back into IaC (see example below).

Example Merge Request to Update IaC Configurations for Downscaled CPU and Memory

As a result, KnowBe4 doesn't need to divert the attention of its application development engineers to the release process to manually update the configurations to the latest, optimized values. With this approach, KnowBe4 now has a fully autonomous feedback loop in which improvements are identified as the application runs in production, and those changes are reflected in both the IaC system and the production application itself.

KnowBe4 has also realized impressive results from their deployment of Sedai. Over 9,500 services are connected to Sedai, with 98% being optimized autonomously. KnowBe4 has had over 1,100 autonomous actions in three months. KnowBe4 is trending towards a 27% cost reduction and has already achieved 10% savings.

KnowBe4 Initial Optimization Results

Take the Next Step: Experience Autonomous DevOps in Action

Ready to see how autonomous systems can work with your IaC setup? Sedai offers a personalized demo tailored to your specific environment based on our experience working with CI/CD and IaC systems including GitLab, GitHub and Terraform at companies like Palo Alto Networks, Experian and HP. Our experts will show you how to achieve faster optimization with less effort, all while maintaining the integrity of your IaC. Don't let manual cloud optimization hold you back. Schedule your demo with Sedai today and take the first step towards autonomous DevOps.
