Panel: Transforming Operations with AI & Autonomous Systems

At autocon, we hosted a panel talk on transforming operations with AI & Autonomous Systems. The panelists shared insights based on their diverse experiences, emphasizing the necessity for a strategic approach to integrating autonomous systems within existing infrastructures. Here are the top themes:

Infrastructure Readiness for Automation: Companies must evaluate their existing infrastructure to determine readiness for adopting autonomous technologies. Key factors include technological maturity and cultural alignment within the organization.
Frameworks for Automation Readiness: There is a call for standardized frameworks to assess automation readiness, though panelists noted that any such framework must be adaptable to specific industry needs and client requirements.
Incremental Implementation of Autonomy: A gradual approach to automation was favored, with suggestions to start small and scale up. This allows for effective risk management while gradually enhancing operational capabilities.
Business Case for AI in Operations: Panelists highlighted that automation is not just a cost-saving measure but also a driver for innovation, allowing organizations to redirect resources towards growth and enhance employee satisfaction.
Risk Management in Autonomous Systems: Implementing autonomous technologies involves inherent risks. The discussion stressed the importance of establishing guardrails and mitigation strategies to ensure safe and effective integration.
Future Opportunities in Autonomous Technologies: The panel looked forward to advancements in AI, especially in areas like Generative AI and MLOps, suggesting that these could transform how businesses manage data and resources, presenting new avenues for efficiency and cost optimization.

Panel Introduction and Company Scale:

The panel comprised some of the brightest minds in IT infrastructure, including:

Tim Guleri, Managing Director of Sierra Ventures and also the Moderator of the Panel
Subha Tatavarti, the global CTO of Wipro. Wipro manages their clients’ public and private cloud environments, especially in the retail, IoT and oil and gas sectors. They serve 1,500+ customers and 25+ verticals. In terms of fleet size, they touch over a million.
Rachit Lohani leads product and technology for Paylocity, a payroll company. In terms of their technical footprint, they manage around 600-700 services, powered by 30,000 to 40,000 machines.
Shibu Raj is SVP of IT at Geodis, a third-party supply chain company based in France. He is responsible for platform modernization and platform tools for Geodis Americas from North to South, including LATAM. Shibu was also the first customer of Sedai.
Jigar Desai is working as SVP Product and Engineering for Sisu, a startup company that does data insights. Jigar comes from Facebook so he has seen millions of machines and how to manage them.
Matthew Duren (Matt) is Director Platform Engineering at KnowBe4, the provider of the world's largest security awareness training and simulated phishing platform. KnowBe4 has over 60,000 customers and runs 2,000 to 4,000 containers every day and about 350 million Lambda invocations, with usage growing every single month.
Mohamed Khalid (Mo) is the Director Enterprise Architect of GlaxoSmithKline (GSK), a leading biopharma company that makes general medicines, speciality medicines, and vaccines. It operates in 80+ countries and aims to impact the health of 2.5 billion people by the end of 2030.

Assessing Infrastructure Readiness for Autonomous Technology

Tim (Sierra Ventures): How should anybody who is just coming into the notion of autonomous be thinking about the readiness equation of their infrastructure?

How Geodis Made Itself Ready for Automation:

Shibu (Geodis): A transformation doesn't come from a vacuum. It has to be thought out end to end. When we started our journey of automation, AI processes were there, but our first question was “Are we ready to take it and reap the full benefit of it?”

We also had another equation in the bundle: Where will our money be invested? Our core business is supply chain optimization. In the supply chain, automation is a first-class citizen, as you have robots picking and packing. Since automation was in our DNA, we knew we had to automate in this area too, because only then does the customer get the full value.

We quickly realized that we were not mature enough to adopt and get the benefit. We could invest in it because it's a newer and cooler technology but that dollar spent would be a waste.

Talking with people at Sedai helped us uncover some of the things we were not ready for. We quickly realized that we had to invest here, either by partnering with people who were already doing it or by building it ourselves. That realization helped us make sure we were ready.

How to Get Started with Automation:

Rachit (Paylocity): As humans, it's easier for us to think in terms of frameworks because they help us think about what the journey is and where we want to be.

The car industry came up with this beautiful framework: L0 to L5. It tells you where you are in terms of maturity. L0 is where you have no driver assistance. L1 is where you start with assistance; from there you move through partial assistance, conditional assistance, and full assistance, and then go autonomous, which is your L5.

So what do you do? You:

Invest in technology and embrace Infrastructure as Code (IaC).
Provide instrumentation with tools like Ansible to run commands at scale.
Get partial assistance.
Start observing conditions, i.e., advanced observability.
Embrace things like Mesh to automatically control the system.
Now you have a fully automated system. The next step is going autonomous, where you leave the system and let it decide what’s best for the customers.

Mo (GSK): GSK adopted the cloud. Three things were very important to us: cost, performance, and security. Taking those together and right-sizing accordingly is the biggest challenge we see today.

People said they can solve the problem for us and that's why we are here.

Infrastructure Readiness Across Different Companies:

Jigar (Sisu): I have a perspective to share from my journey across three different companies.

When PayPal was going through the transformation, we had people staring at screens, worrying about every machine. We had to take some time because it was not just a technology change; it was also a cultural change, where we had to bring people along with us and not just abandon them. That was my PayPal journey on how to become ready, not just from a technology perspective, but also from a people and culture perspective.

The mindset at Facebook was completely different. We were doubling in size in terms of machines, and I'm talking about millions of machines every year. So, readiness was not a word in our dictionary. You better be ready because machines are coming.

Then, from the very first day we started building the system at Sisu, we had autonomy as a principle because we didn't have enough people to build a system that needs folks watching it on screens. So the autonomous system was built from the ground up with things like Sedai, which you can start using on day one even as a small startup.

Subha (Wipro): The assessment of readiness is absolutely critical. As an example, one of our clients is a medical device manufacturer based in Japan, for whom we manage data centers and infrastructure.

Introducing Sedai would not even be an option for us because they are all bare metal. They are sitting in their own data centers and are not even virtualized in most cases.

KnowBe4’s Readiness for Automation:

Tim (Sierra Ventures): Where was KnowBe4 on that maturation journey? I was quite impressed with how quickly you made the decision to deploy and start getting end-to-end value with Sedai.

Matt (KnowBe4): It was something that we had to be very deliberate about in order to achieve. I started at KnowBe4 in 2018. At that time, most of our software was running on EC2 instances, including databases and compute. In many cases, a single server processed and provided a lot of what we deliver to our customers. Our job as the SRE team was to clean that up while the Amazon bill was still four or five figures a month.

If we hadn't gone through that journey, it would be significantly harder now because our Amazon commit next year is millions and millions of dollars.

It was easy for us to implement Sedai because we were strong users of IaC and it only took us months to sign up with Sedai and get from 0% to 100% with them.

Framework to Test Automation Readiness:

Tim (Sierra Ventures): There's no agreed-upon framework for assessing whether a company is mature enough to start adopting automation. Does the industry need a framework that quickly tests the readiness to adopt automation? If yes, whose job is it to define this framework?

Defining the Framework:

Subha (Wipro): It is hard to standardize a framework because of how fragmented the stack, the usage, the implications, and the applications are. A standard framework would have to be generic, but then it would not solve the problem. It has to come from the customer, in consultation with somebody like Sedai who understands how the system works.

Maturity Levels and Automation:

Rachit (Paylocity): In technology, it's less about “what” and more “how”. The how's are pretty standard as we don't have a lot of options.

When you walk into a data center and see hosts named by IP address or given specific names, you can tell how mature the operation is. The next step usually is to automate this part. Once you automate that part, you graduate to the next level. That is how the framework would be agnostic to what industry you're from or the outcome you're looking for.

Autonomous System Tools:

As you go up the stack, step one usually is IaC, which is driver assist.
The next one is partial assistance. For example, if something is broken, I can run a command without understanding the system. This is where you introduce tools like Ansible or Control Tower that become the brain of your infrastructure.
The next step is around more instrumentation, where you start to gather more input and signals. You introduce tools like Datadog or a broader observability stack so that you can drive better decisions.
Then, it is tools like Mesh, which can route your traffic by being proactive around what could go wrong.
The last step is autonomous, where tools like Sedai become a catalyst and help you jump from L3 to L5 without doing all that hard work. That is the magic of the solution.
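
The ladder Rachit describes can be written down as a small lookup. The following is only a sketch under stated assumptions: the capability names and the `assess_level` function are hypothetical labels for the steps above, not part of any vendor's product. It also treats the ladder as strict, which is a simplification, since the panel notes that a tool can let a team skip rungs.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """Maturity ladder borrowed from the driving-automation analogy (L0 to L5)."""
    L0_MANUAL = 0          # no assistance: hand-managed hosts
    L1_DRIVER_ASSIST = 1   # infrastructure as code
    L2_PARTIAL = 2         # commands run at scale (e.g. Ansible)
    L3_CONDITIONAL = 3     # advanced observability feeding decisions
    L4_FULL_ASSIST = 4     # a mesh that routes traffic proactively
    L5_AUTONOMOUS = 5      # the system decides what is best on its own


# Hypothetical capability names, listed in the order the panel describes them.
LADDER = [
    ("infrastructure_as_code", AutonomyLevel.L1_DRIVER_ASSIST),
    ("run_commands_at_scale", AutonomyLevel.L2_PARTIAL),
    ("advanced_observability", AutonomyLevel.L3_CONDITIONAL),
    ("service_mesh", AutonomyLevel.L4_FULL_ASSIST),
    ("autonomous_decisions", AutonomyLevel.L5_AUTONOMOUS),
]


def assess_level(capabilities: set[str]) -> AutonomyLevel:
    """Return the highest level whose capability, and all below it, are present."""
    level = AutonomyLevel.L0_MANUAL
    for capability, reached in LADDER:
        if capability not in capabilities:
            break  # a missing rung caps the maturity level
        level = reached
    return level


if __name__ == "__main__":
    team = {"infrastructure_as_code", "run_commands_at_scale", "service_mesh"}
    print(assess_level(team).name)  # L2_PARTIAL
```

In the usage example, the team has a mesh but no advanced observability, so the sketch places it at L2 rather than L4.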

L6 and KnowBe4’s Journey with Sedai:

Matt (KnowBe4): I think the barrier even goes back to the introduction of centralized logging and collection of data from these decentralized systems. I like the point that Sedai is an accelerant because you could get from L0 or L1 to L5 using some carefully tailored bash scripting. I almost want to introduce this idea of L6 where you have an AI-driven system that discovers things engineers or humans may have never even thought of.

I don't think that KnowBe4 is at L6 yet. In some cases, we're not even at L5. The places where we're using Sedai are much more advanced than the places where we're not. It feels almost like a new tier. That's been a cool journey for us this year, and we're looking forward to seeing how much we can get to L5 and beyond.

Shibu (Geodis): We talk about institutions that are software and technology oriented. But for example, there is no place for Ansible in a PLC or a conveyor system. I cannot bring up a conveyor system by running a script. So it depends upon the industry as well.

In every industry, there is a story for tools like Sedai. So that's where the perspective of maturity comes into play. We cannot define that maturity or the framework just by looking at a technology powerhouse like Google or eBay.

Risk Aversion and Autonomy:

Tim (Sierra Ventures): What is the right approach to implement autonomous systems? We know the benefits but how did you mitigate risk?

Autonomy Risk Management by KnowBe4:

Matt (KnowBe4): If you're completely risk averse, you will be stuck on a lower level of autonomy. It is just a matter of taking a low risk instead of a high risk.

In our case, a lot of the building blocks were already in place. Our infrastructure was well defined in Terraform, with modules that were already centralized. We knew 90% of our compute was being delivered by a handful of Terraform modules. That made it really easy for us to plug into that. They were also pulling the latest version of our module, so we didn't have to go through hundreds or thousands of repos and update to a new pinned version of that module. We were already taking risks by trying to be closer to the edge.

If you are looking to implement more automation, find places where you can approach the edge and implement Sedai or other tools like it. Pick places where, if problems occur, you can roll back quickly or tolerate a bit of an issue or downtime.

Incremental Autonomy Implementation:

Tim (Sierra Ventures): One way to mitigate risk is by starting with 20% autonomous, and scaling your way up. I don't exactly recall, but KnowBe4 went 100% auto very quickly.

Matt (KnowBe4): We did, once we were ready. We didn't start at 0 and go to 100 overnight.

We hand-picked a service that we knew would get some good utilization in production. Even as a beta feature, we had hundreds of customers testing and using this feature while we had Sedai enabled on it. Sedai was enabled throughout the entire process of building this new feature.

Even the engineers working on that service didn't know that it was happening. We moved on from there and turned it on for all our development environments after we had seen a production service go through an entire release cycle for weeks with only cost savings and no issues.

When nobody asked what happened to the service, we felt pretty confident to open the floodgates.
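
The progression Matt describes (one production service, then all development environments, then everything) can be sketched as a promotion gate. This is a minimal illustration, not KnowBe4's or Sedai's actual mechanism: the stage names, the `StageReport` fields, and `next_stage` are hypothetical, and the gate of "a full release cycle with no issues" is taken from the account above.

```python
from dataclasses import dataclass

# Hypothetical stages, following the order Matt describes.
STAGES = ["one_production_service", "all_dev_environments", "everything"]


@dataclass
class StageReport:
    """What was observed while autonomy was enabled at one stage."""
    full_release_cycle_seen: bool
    incidents: int


def next_stage(current: str, report: StageReport) -> str:
    """Widen the scope only after a clean, complete release cycle."""
    clean = report.full_release_cycle_seen and report.incidents == 0
    if not clean:
        return current  # stay put until the evidence is in
    index = STAGES.index(current)
    return STAGES[min(index + 1, len(STAGES) - 1)]


if __name__ == "__main__":
    report = StageReport(full_release_cycle_seen=True, incidents=0)
    print(next_stage("one_production_service", report))  # all_dev_environments
```

Keeping the gate as explicit data makes the decision to "open the floodgates" reviewable rather than a judgment made in the moment.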

Tim (Sierra Ventures): How is Pharmaland thinking about the journey and the implementation?

Pharmaland's Autonomous Journey:

Mo (GSK): For us, the most important thing was realizing we couldn't achieve our goals while in a data center. So, cloud adoption became our highest priority. We adopted Azure and GCP.

We began by adopting API and IaC - everything that’s stack driven - whether it was faster drug discovery or implementing supply chain solutions. The third part was how to sell faster with market data.

Guardrails and Risk Management:

If you put appropriate guardrails in place and have the right people, who can manage and operate the technology really well and who understand the business, that's how you mitigate the risk. We have built guardrails. We start with the dev environment, then move to non-prod, and then to production.

We still haven't gone fully auto but we hope to get there in the next couple of years.

Jigar (Sisu): At Facebook, our systems were autonomous. That means, somebody was able to push a change to the entire network and we would be disconnected from the internet for several hours. So blast radius with this level of automation is pretty high. That’s why you need to have enough guardrails and treat infrastructure code as “code”. If you are developing an application, you will not push your code to production without testing.
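
One way to picture a guardrail on blast radius is a rollout that touches only a bounded share of the fleet at a time and halts when a health check fails. The sketch below is an assumption-laden illustration, not how Facebook or GSK do it: `rollout_in_batches`, its parameters, and the callbacks are all hypothetical names.

```python
from typing import Callable, Sequence


def rollout_in_batches(
    hosts: Sequence[str],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_blast_radius: float = 0.1,
) -> list[str]:
    """Apply a change in batches no larger than max_blast_radius of the fleet.

    Stops after the first batch that fails its health check and returns the
    hosts changed so far, so the caller knows what to roll back.
    """
    if not 0 < max_blast_radius <= 1:
        raise ValueError("max_blast_radius must be in (0, 1]")
    batch_size = max(1, int(len(hosts) * max_blast_radius))
    changed: list[str] = []
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        for host in batch:
            apply_change(host)
            changed.append(host)
        if not all(is_healthy(host) for host in batch):
            break  # guardrail: a bad change never reaches the next batch
    return changed


if __name__ == "__main__":
    fleet = [f"h{i}" for i in range(10)]
    touched = rollout_in_batches(
        fleet,
        apply_change=lambda host: None,        # stand-in for a real change
        is_healthy=lambda host: host != "h3",  # pretend h3 breaks
        max_blast_radius=0.5,
    )
    print(touched)  # ['h0', 'h1', 'h2', 'h3', 'h4']
```

Ordering `hosts` so that dev machines come first, then non-prod, then production would combine this with the environment progression Mo describes.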

Non-financial Gains of Autonomy:

Tim (Sierra Ventures): What are some non-financial gains that you were able to capture?

Rachit (Paylocity): When it comes to innovation, especially in the autonomous space, we are at a precipice where we have the right tools and environment; we just need the right actors now.

Netflix and Autonomy:

We saw a similar story at Netflix around 2013, 2014, and 2015 when the culture in the industry was divided into development and operations. Development handled the build, while operations took care of deployment and infrastructure maintenance. Netflix came along and said, “This does not work for me. I want to move faster.”

So it built systems that helped people deploy more and more artifacts to production. The outcome was a 6000% increase in experimentation. They went from doing two, three, or four experiments a month to over 1,000 experiments a day. As a result, people became hooked on Netflix. They loved Netflix not because someone really smart was sitting behind the screens figuring out what buttons to push or what movies to display, but because an autonomous system was making decisions about what could move forward and what could not.

Similarly, developers felt more comfortable rolling out pull requests (PRs). Every single PR was ready for production. If it was not ready, the system would block it and say, “Nope, you're not ready.” That was an autonomous system making decisions.

If you implement an autonomous system that helps you determine the right things to do, your customers will be happier. Your people will also be happier because they won't have to focus on mundane tasks; they can focus on more intellectually demanding and context-driven work.

On top of that, it frees up time for dependent teams. Companies that start to embrace autonomy now will see more innovation and disruption. They will be able to move faster because this is how R&D allocations work. There are companies out there with over $100 million in R&D, where 80% to 90% goes toward running the business. They are spending almost nothing on innovation. Embracing autonomy helps unlock those dollars and redirect them to actual growth, not just keeping the business alive.

Subha (Wipro): We have 250,000 employees globally, and a substantial portion of our costs goes into this employee base. In addition to the $12 million we generate from services, we have $450 million in annual recurring revenue from platforms.

To address these challenges, we had to create constraints or "starve the RTB" (Run the Business). This has led to ruthless prioritization of our RTB efforts. The savings we achieve are then reinvested into our internal core, which we refer to as our “core AI platform business”. Essentially, this is a generative AI platform we are orchestrating across multiple models, including some that are being developed by our R&D team. These models are tailored for specific tasks; for instance, some are more effective at text-to-voice conversion, while others excel in image processing.

We recently conducted a beta release with over 5,000 employees, and alongside the RTB reduction, we are also driving additional gains in other use cases, starting with HR. While this discussion may not focus on autonomy and infrastructure, it is related in principle.

For example, a significant portion of our costs was tied to background checks and hiring processes. Previously, it would take us 7 to 10 days to conduct background checks. Now, thanks to these improvements, it only takes a couple of hours. This increase in productivity not only reduces the time required to onboard new employees but also lowers the overall cost of onboarding. This is just one example of the additional savings we are achieving.

Autonomy as a Business Enabler:

Jigar (Sisu): Automation can also serve as a business enabler. Complex business tasks can be solved using automation. At my current startup, a company in Europe said, “We need deployment in London, or we will not onboard onto your platform.” Because we had automation, we could spin up a new instance just for them and serve them there.

There are many examples where your investment in automation can help grow business and not just save cost.

The L6 for Automation Technologies like Sedai:

Tim (Sierra Ventures): Give ideas on where you think technology like Sedai can go. What is that L6 you talked about?

Matt (KnowBe4): As a customer, I would love some solutions for my ever-increasing CloudFront spend, which just keeps going up every time we gain more customers. The same goes for Aurora, RDS, and S3 spend, where these unbounded or provisioned environments continue to grow as you acquire more customers.

I've defined a specific backend data store to be a specific size, and changing that means downtime for my customers. Once you push the limits of the compute resources in the cloud, it becomes very difficult to manage. This is going to require more creative solutions, not all of which are immediately obvious. If you sit down and think about how you would address this with RDS, it presents a challenge.

Shibu (Geodis): The L6 would be to take tools like Sedai to the edge, i.e., a scaled-down Sedai that looks at only a few automation signals. It may need to restart some services before a failure happens.

For example, if a conveyor goes down, it takes two or three hours to bring it back. How can we reduce that at the edge? We just need L2, so that someone can spark the battery again and get things going. That would give us a big benefit, because when a conveyor goes down, the entire employee base stays put. That's a big cost.

Subha (Wipro): In use cases like Sedai and infrastructure, you need higher precision. Sedai can grow significantly by creating LLM-like or transformer-like models for the infrastructure space, depending on the kinds of data you see.

Autonomous Opportunities in LLM and GenAI:

Tim (Sierra Ventures): In an LLM and GenAI world, infra stacks are going to be rethought. Workloads are CPU-GPU hybrids. What are the autonomous opportunities in this realm?

Jigar (Sisu): There is a lot to be done. If you look at a typical GenAI lifecycle, there are three phases. The first phase is data cleaning and data preparation. I was so happy to see that Sedai is going to handle data platforms, because they are a significant part of the cost, and there are a lot of optimization opportunities in the data prep space alone.

The second part of this is how you train the models. Whether you're using a generative AI model, such as an LLM that is available to you or open source, or you're developing your own model, training is super expensive. You are working with thousands of GPUs, or using thousands of GPUs in the cloud, which is also super expensive. The way we utilize resources for training models is probably a decade behind how we use production resources. Techniques and optimizations have not been applied to optimize GPU usage.

Then, the last phase is how you serve it. Inference is a massive cost. There is a cost difference between GPT-3.5 and GPT-4. There is a significant opportunity to trim down the models so that you don’t have to serve these giant models for inference. This presents an opportunity that I generally refer to as the MLOps space, which includes everything from data preparation to training the model to serving the model. Sedai has the potential to become a billion-dollar business by addressing this new wave of developments.

Published on

October 11, 2024

Last updated on

November 28, 2024
