November 28, 2024
October 14, 2024
At Autocon, we hosted a panel talk on “Optimizing Architecture for AI & Autonomous.” The discussion centered around the architectural challenges and considerations necessary for achieving autonomy in systems, particularly in the context of AI and data management. The participants shared insights into the implications of these challenges on business processes and the future of automation.
The panel talk can be watched here: https://www.sedai.io/autocon23?wchannelid=gq0hfympq3&wmediaid=75dkaw6jal
The panel comprised some of the brightest minds in IT infrastructure, including:
The discussion started narrow and went broad. Let's dive right into the insights from the industry leaders.
Salil (Uncorrelated Ventures): What are some bad architectural ideas that would prevent an app from being autonomously managed?
Shridhar (AWS): Whether you're doing Lambda or containers, we recommend the idea of code-full versus service-full. Do not use function code to express state because this would be a nightmare to operate. Every time something goes wrong, you will have to look at the code first.
Salil (Uncorrelated Ventures): Where do you put the state machines?
Shridhar (AWS): In the serverless world, you could use Step Functions. You also get the liberty to have a single-purpose Lambda function, the smallest unit of compute you can express. A whole bunch of them can be chained using state machines and connected through events in whatever design you prefer. That is very granular to manage and automate, because if something goes wrong, you know the exact unit that is faulty. The monitoring model would know about it and would get rid of it, substitute it, fall back, or use whatever redundancy mechanism is in place.
Whereas state expressed in code is not as simple. I don't wanna say monolith, but there's an expression folks use. They call them fat lambdas, where an entire piece of business logic that could be broken down into a series of steps is written as one function. All of that is dumped in one single place.
Both generally don't scale well.
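To illustrate the contrast drawn above (this is an editorial sketch, not something shown at the panel), the following Python example keeps the workflow as data and each step as a single-purpose function. It only mimics the idea behind a managed state-machine service in plain Python; the step names, the `WORKFLOW` table, and the `run` helper are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical single-purpose steps; each one does exactly one thing.
def validate(order: dict) -> dict:
    if order.get("amount", 0) <= 0:
        raise ValueError("amount must be positive")
    return {**order, "validated": True}

def charge(order: dict) -> dict:
    return {**order, "charged": True}

def ship(order: dict) -> dict:
    return {**order, "shipped": True}

# The workflow is declared as data, outside the function code:
# state name -> (step function, next state, or None when finished).
WORKFLOW: Dict[str, Tuple[Callable[[dict], dict], Optional[str]]] = {
    "Validate": (validate, "Charge"),
    "Charge": (charge, "Ship"),
    "Ship": (ship, None),
}

def run(workflow, start: str, payload: dict) -> dict:
    """Run the steps in order and report the exact state that failed, if any."""
    state = start
    while state is not None:
        step, next_state = workflow[state]
        try:
            payload = step(payload)
        except Exception as exc:
            # The failing unit is known by name, so it can be retried or replaced.
            return {"status": "FAILED", "failed_state": state, "error": str(exc)}
        state = next_state
    return {"status": "SUCCEEDED", "output": payload}

ok = run(WORKFLOW, "Validate", {"amount": 10})
bad = run(WORKFLOW, "Validate", {"amount": 0})
print(ok["status"], bad["status"], bad["failed_state"])
```

Because the transitions live in a table rather than inside one large function, a monitoring system can see which named state failed without reading the step code.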
Sanjay (First American): When people devise systems, they don't think about what happens if the system fails.
For example, if you drive a Tesla, you have to be at the steering wheel to take control in case something goes wrong. You have to think about computer systems the same way. I see a lot of risk associated with things going wrong because we haven't reached that level of maturity yet, where we can fully trust the systems.
So, always think of an alternative when you devise an autonomous system in case the systems fail.
Ilan (Ex-Datadog): I think most Tesla-related crashes happen when the human decides that they know better than the autonomous system and tries to turn the wheel. I can say from firsthand experience that the only time I've been in a Tesla that crashed was when I thought I knew better than the computer and I was wrong.
When you give autonomous systems very clear guardrails, and you tell them they need to do only one thing well, i.e., staying in their lane, they'll do it very well.
Venkat (Talkdesk): Lack of monitoring, telemetry, and compliance security can lead to a Tesla-like crash.
Ilan (Ex-Datadog): We've now hit a point in the world where even if you're on-premises, there's likely an API to get compute, storage, and data stores. But we still see systems being built that can’t be automated because there are teams who say, “We'll make an API for it later.” How do you automate a thing that has no API to it?
I don't think everything should be automated or needs to be automated. But if you want it to work autonomously, you have to have an API to it.
Vikram (Astronomer): Don't make a decision in code if you expect it to be automated. If you want something to be autonomous, you need to say what decision it will make, and also track what decisions were made over time. That way, you can monitor whether it is happening correctly or not.
Secondly, when I say “decision”, the most common mistake I've seen people make is expressing them as being binary when they are rarely binary.
When you are trying to make a system autonomous, there’s a probability element to it. For example, at a certain point in time, you'd want to say, “Hey, I want to scale up.” Expressed that way, the decision is purely binary.
What you actually wanted to say is, “Hey, based on the criteria that I want to establish over time, I'd want to scale it up sooner rather than later,” versus “Actually, I don't want to scale it up for cost reasons.”
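One way to picture this point is the following editorial sketch, which is not taken from the panel. It keeps a scale-up decision out of hard-coded branches by expressing it as a policy (data), computing a graded score instead of a yes/no flag, and recording every decision so it can be reviewed later. The `ScalingPolicy` fields, the scoring formula, and the default values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ScalingPolicy:
    # Hypothetical knobs, kept as data rather than hard-coded branches.
    target_utilization: float = 0.70  # utilization we would like to stay under
    cost_weight: float = 0.5          # 0 = ignore cost, 1 = cost dominates
    threshold: float = 0.5            # score above which we scale up

@dataclass
class Decision:
    utilization: float
    cost_pressure: float
    score: float
    action: str

@dataclass
class Autoscaler:
    policy: ScalingPolicy
    log: List[Decision] = field(default_factory=list)

    def decide(self, utilization: float, cost_pressure: float) -> Decision:
        # Urgency grows from 0 to 1 as utilization rises above the target.
        headroom = 1.0 - self.policy.target_utilization
        urgency = (utilization - self.policy.target_utilization) / headroom
        urgency = min(max(urgency, 0.0), 1.0)
        # Cost pressure (0..1) pulls the score down in proportion to cost_weight.
        score = urgency * (1.0 - self.policy.cost_weight * cost_pressure)
        action = "scale_up" if score > self.policy.threshold else "hold"
        decision = Decision(utilization, cost_pressure, score, action)
        self.log.append(decision)  # keep a record of every decision made
        return decision

scaler = Autoscaler(ScalingPolicy())
d1 = scaler.decide(utilization=0.95, cost_pressure=0.1)  # busy, cost is not a concern
d2 = scaler.decide(utilization=0.95, cost_pressure=1.0)  # equally busy, cost is a concern
print(d1.action, d2.action, len(scaler.log))
```

The same load produces different actions depending on cost pressure, and the log gives a monitoring system something concrete to audit over time.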
Salil (Uncorrelated Ventures): Astronomer is the commercial entity behind the Apache Airflow project. Vikram (Astronomer), what type of workloads does Airflow handle? Are these data movement orchestration type workloads? Those are not easy to make autonomous, right? So wouldn't that be one answer?
Vikram (Astronomer): They are not easy to make autonomous for two reasons. First, most scheduling systems in general, including the Linux scheduler, are based on preemptive scheduling, which means you can interrupt tasks.
Second, even though tasks should be idempotent and restartable, data movement systems do not always map to that. If the tasks are small and important, we can do them. However, there are times when we must honor the limitations and admit that we can’t do them. These are the examples of when we struggle to make those tasks autonomous.
Ilan (Ex-Datadog): If you have a job that takes ten hours to do because it is all one step, you cannot automate that. But if you have a bunch of small jobs that come together to make an output, you can automate each and every single one of those individually.
Vikram (Astronomer): Yes, it is, and we can actually scale them up. Use a lot of micro pipelines rather than macro monolithic systems, which are much harder to make autonomous.
Salil (Uncorrelated Ventures): If an app has any state at all, you will have to deal with the state to make it autonomous, and it will probably be impossible to make it autonomous. Does that mean all apps should be stateless?
Vikram (Astronomer): Stateless apps are a lot easier to make autonomous. I think stateful apps can be made autonomous. We've had database transactions for decades now. There are checkpointing and rollbacks. But it needs to be deliberate as opposed to accidental.
Salil (Uncorrelated Ventures): You can't handle application state the way you handle database state with all the checkpointing. There is a database inside it and you will be managing a database inside the app.
Vikram (Astronomer): Not necessarily. A lot of data transformation systems have checkpoints, and you would revert back to a last known checkpoint. It is basically a state, but you can say, “Hey, at what point do I checkpoint this?”
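As an editorial illustration of deliberate checkpointing combined with small restartable steps (not code from the panel or from any particular orchestration product), the sketch below runs a pipeline of tiny steps, saves a checkpoint after each one, and resumes from the last known checkpoint after a failure. The `run_pipeline` helper, the JSON checkpoint format, and the simulated outage are assumptions made for the example.

```python
import json
import os
import tempfile
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[list], list]]

def run_pipeline(steps: List[Step], data: list, checkpoint_path: str) -> list:
    """Run small steps in order, saving a checkpoint after each one."""
    done = 0
    if os.path.exists(checkpoint_path):
        # Resume from the last known checkpoint instead of starting over.
        with open(checkpoint_path) as f:
            saved = json.load(f)
        done, data = saved["done"], saved["data"]
    for index in range(done, len(steps)):
        _name, step = steps[index]
        data = step(data)  # if this raises, the previous checkpoint is untouched
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"done": index + 1, "data": data}, f)
        os.replace(tmp_path, checkpoint_path)  # swap in the new checkpoint in one step
    return data

calls = {"extract": 0, "load": 0}

def extract(rows: list) -> list:
    calls["extract"] += 1
    return rows + [1, 2, 3]

def transform(rows: list) -> list:
    return [r * 10 for r in rows]

def flaky_load(rows: list) -> list:
    calls["load"] += 1
    if calls["load"] == 1:
        raise RuntimeError("simulated outage")  # fails on the first attempt only
    return rows

steps = [("extract", extract), ("transform", transform), ("load", flaky_load)]
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    run_pipeline(steps, [], path)   # first attempt fails at the load step
except RuntimeError:
    pass
result = run_pipeline(steps, [], path)  # second attempt resumes at the load step
print(result, calls)
```

On the second run only the failed step is repeated, which is the property that makes a pipeline of small steps easier to restart automatically than a single long job.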
Salil (Uncorrelated Ventures): But then something like Sedai will have to be aware of it to be able to go back to that.
Vikram (Astronomer): It is hard, but that's different incarnations of autonomous frameworks for different layers of the stack that we want to manage. I'm not talking about a particular implementation like Sedai. I'm talking about general autonomy as a principle, like in computing.
Ilan (Ex-Datadog): The best app is one devoid of state. You (Salil) have made the state somebody else's problem. That doesn’t mean you can't have any state, but not every part of your microservice architecture should be managing its own state. Have the state in as few places as you can so you don't have to fight that problem.
To your (Vikram’s) point, if we can get those systems to return something via an API to tell us where it is in a transaction, and if we can take an action on it, then we can make that autonomous. But that means you have to have state systems that are well-architected and well-designed and that expose the right information to us, which you will never be able to do if every single microservice is managing state.
Salil (Uncorrelated Ventures): Isn't the simple solution to make state somebody else's problem, use a cloud data service to store your data, and make everything you do stateless?
Shridhar (AWS): That's where I started from. Don't go into building APIs just to get state and so on. Let services do it for you.
Salil (Uncorrelated Ventures): The nature of workloads is going to change completely in the future. Do you want to say more about that?
Sanjay (First American): Autonomous systems are becoming much smarter. Previously, we had to train models, and based on the information we provided, they would carry out whatever we asked them to do. But now the models are unsupervised. They can absorb the world's data and do reasoning. That completely changes how we think about building enterprise systems. Given that, I see the potential for change in every industry.
For example, if you take the bond market and the stock market combined, the real estate industry alone is bigger than both of them. But it is so difficult to transact real estate. It still takes thirty days to buy a house when you can buy stocks at your fingertips. Why? Because of the process that runs these companies.
In an ideal world, if everything is automated, the business architecture of any company can potentially change dramatically.
Venkat (Talkdesk): It's not like the industries don't want to move faster. It's just about compliance and safety. While automation can accelerate, a lot of the work in these industries is decision making. Some of these industries will be impaired until the regulations come in at the right level with technology.
Ilan (Ex-Datadog): People say, “Are you worried about losing jobs to autonomous systems?” The answer is always, “No, we will get to work on more interesting projects.” But there are industries where I don't think that will be the case.
It is a different story in engineering because we don't have enough staff there. But let's say your entire job is to look through the history and ensure that nobody else has a claim on this house. If I automate you out of a job, what more interesting thing can you get to do as a paper pusher? Some of the regulations exist because people need jobs in the physical world.
Sanjay (First American): To automate the physical industries, a lot more Sedais will be required. The goal is not to build one big system or an automatic system. It is to build small components and stitch them together to run autonomously. That can be done for any industry.
Vikram (Astronomer): I would agree with that.
Ilan (Ex-Datadog): It comes down to standardization. Each time you make a standard in the decision tree, that standard becomes automatable.
Salil (Uncorrelated Ventures): I want to ask about Talkdesk’s journey of automation.
Venkat (Talkdesk): Think of a three-by-three matrix, with AI on the Y-axis and humans on the X-axis. Easy-to-hard for humans runs along the X-axis. The cell that is easiest for AI but hardest for humans is probably the magic quadrant. Everything along that X-axis will be your area for automation because it is low-hanging fruit: this is where a lot of volume work is done.
For example, let's take a bank sending out a Non-Sufficient Funds notice. The bank already knows that you don't have money in the account, and your car loan is supposed to be paid in the next two days. They could have easily sent you an SMS to remind you that you're going to be out of funds. But they can't practically do it today because of the amount of human engagement needed to write complex queries, search multiple things, and then finally send out a notice.
This is the area where we can have autonomous systems come in very quickly and do things which are hard for humans but much easier for AI. That's the kind of philosophy we use. We don't do it just at the application level but also at an infrastructural level.
Shridhar (AWS): I'm glad to hear people index so much on observability, because that's what I run at Lambda. Getting rid of undifferentiated heavy lifting is perhaps one of the first steps towards getting to this world where we want things to be autonomous.
If someone has to do a whole bunch of things every single time they have to stand up a stack, that setup is the furthest away from being autonomous.
Vikram (Astronomer): I was hoping we could have a spirited discussion about centralized, decentralized or federated systems.
Venkat (Talkdesk): Lack of automation centralizes the architecture, which will lead to challenges, and you will not be able to make anything out of numbers.
Sanjay (First American): There are a lot of traditional industries that built their systems decades ago. To make any change, you have to interact with these systems and it becomes very hard to introduce any new thing.
If there were a centralized system which understood a company’s business really well, after having consumed all the data, deeds, trusts, mortgages, etc. that exist in the world, then all you would have to build is very lightweight systems targeted toward your internal employees. Your employees would only need to ask the right questions of this central autonomous system, and it would give the answers. That's the kind of future stack I think will emerge in the traditional industries, if not in the tech industry.
Shridhar (AWS): But aren't you introducing a single point of failure in that case? If you have to scale in size, functionality, or technology, you will first have to make sure that the central system understands what's going to happen. Otherwise, you have just siloed a thing that probably exists, but nobody knows about.
Ilan (Ex-Datadog): Right now, the challenge is that everything is distributed across a hundred systems. To change some business logic, you will have to go change a hundred systems. On the other hand, we've spent the last decade trying to get away from these crazy monoliths where everything's in a big ball of code that we can't understand and deploy because it might break a bunch of decisions we're trying to make. The answer is not federated versus centralized. You just have to make the right decision at the right place in your organization.
Salil (Uncorrelated Ventures): We should never put any configuration in code, or we should forget about infrastructure as code. We should just let autonomous systems drive the configuration.