How fabric’s SRE Team Improved Customer Experience with an Autonomous Cloud Platform

With Sedai, fabric decreased latency by 48%, speeding up revenue-driving ecommerce services for its customers while reducing SRE workload by 33% and gaining new insights into code performance

RESULTS

48%

Latency Reduction

33%

SRE Workload Reduction

6.7x

Customers per SRE improvement

USE CASES

Application Performance & Availability
SRE Productivity
Cost Reduction

KEY CAPABILITIES

Autonomous Optimization
Release Intelligence

TECH STACK

AWS Lambda
Datadog
Node.js
Python

INDUSTRY

Ecommerce Software

GEOGRAPHY

North America


Introduction

fabric is a leader in the headless commerce movement and recently reported 4.5x year-over-year growth and $140M in new Series C funding. As trillions of B2B and B2C commerce dollars move online, companies like BarkBox, GNC and Restoration Hardware have been turning to fabric to scale their customer experience. The company levels the commerce playing field by enabling thousands more brands to compete and grow in the Amazon era. fabric offers brands modern commerce technology and deep industry operating experience previously available only to the world's largest companies.

fabric runs its application on AWS and makes extensive use of serverless functions, with over 7K individual services in operation across more than 10 cloud accounts. These services cover multiple parts of the application: core commerce functions (e.g., shopping cart), adjacent services (e.g., loyalty), the underlying data/services layer supporting them, and links to external services (e.g., payments, shipping). The bulk of fabric’s serverless code is written in Node.js and Python, the most popular serverless runtimes industry-wide.

The company made an early decision to build with serverless to take advantage of its low cost.

“Building with FaaS and only paying for function execution time rather than constantly running servers (i.e. serverless) allowed us to serve customers early on without incurring high infrastructure costs”

Devashish Pandey

Lead Software Development Engineer, fabric

The fabric platform can be hosted on AWS, GCP, Azure, or even Knative, since all of these are supported by the serverless framework fabric uses. But fabric saw AWS as the ideal choice.

“When fabric adopted serverless, AWS had the most mature offering, so we made the simple choice of Lambda for FaaS”

Devashish Pandey

Lead Software Development Engineer, fabric

fabric uses Datadog as its primary monitoring tool.

Problem

As fabric’s business grew rapidly, the team noticed that application performance and latency were becoming a challenge. They also faced unpredictable traffic loads, as individual retailers drove traffic peaks in their own dedicated environments. This was a major concern for the SRE team.

In this dynamic environment, they found it difficult to optimize both individual serverless functions and overall application performance, given the need to continuously:

1. Right size the function
2. Manage provisioned concurrency
3. Choose how and when to warm up serverless functions, and how to keep them warm
4. Trace errors in this heavily distributed environment
5. Keep costs low

The SRE team also found its bandwidth stretched across multiple priorities, including setting up and supporting new customer environments as well as optimizing existing customers’ accounts.

Solution

To address the growing challenge, fabric rolled out Sedai’s autonomous system. Avinash Gupta, Senior Site Reliability Engineer, first used Sedai for one customer and saw latency improve as Sedai adjusted memory (and with it the “hidden” CPU allocation, since Lambda assigns CPU in proportion to configured memory) along with provisioned concurrency. This eliminated the toil of maintaining the service’s configuration manually.
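Because Lambda allocates CPU in proportion to configured memory, raising memory can shorten duration enough to offset its higher per-millisecond price. The sketch below is a minimal illustration of that trade-off, not Sedai’s algorithm: it assumes you already have average durations measured at several memory settings (the `Measurement` values are hypothetical), and the per-GB-second price is an assumed parameter to check against current AWS pricing.

```python
from dataclasses import dataclass

# Assumed duration price per GB-second; verify against current AWS Lambda
# pricing for your region and architecture before relying on it.
PRICE_PER_GB_SECOND = 0.0000166667


@dataclass(frozen=True)
class Measurement:
    memory_mb: int          # configured memory (Lambda scales CPU with it)
    avg_duration_ms: float  # observed average duration at this setting


def cost_per_million(m: Measurement, price: float = PRICE_PER_GB_SECOND) -> float:
    """Duration cost of one million invocations (request fees excluded)."""
    gb_seconds = (m.memory_mb / 1024) * (m.avg_duration_ms / 1000)
    return gb_seconds * price * 1_000_000


def pick_memory(measurements: list[Measurement], latency_target_ms: float) -> Measurement:
    """Cheapest setting that meets the latency target, else the fastest one."""
    meeting = [m for m in measurements if m.avg_duration_ms <= latency_target_ms]
    if not meeting:
        return min(measurements, key=lambda m: m.avg_duration_ms)
    return min(meeting, key=cost_per_million)


# Hypothetical measurements for one function.
samples = [
    Measurement(128, 800.0),
    Measurement(512, 180.0),
    Measurement(1024, 95.0),
    Measurement(2048, 90.0),
]
print(pick_memory(samples, latency_target_ms=200).memory_mb)  # 512
print(pick_memory(samples, latency_target_ms=100).memory_mb)  # 1024
```

With a 200 ms target the 512 MB setting wins on cost; tightening the target to 100 ms pushes the choice to 1024 MB, which is nearly twice as fast for only slightly more per invocation.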

“After we piloted Sedai with one customer we saw the improvement in shopping cart performance.  The latency improved on the website as users logged in, visited product pages and then went to their shopping cart and checked out.  It gave us the sense that Sedai did the job for the customer, and that’s when we rolled out to other customers.  Now we enable Sedai for every customer we onboard”

Avinash Gupta 

Senior Site Reliability Engineer, fabric

fabric has now enabled Sedai for more than 10 customers on its platform. fabric found the most value in activities the team could not scale manually, because Lambda functions are complex and error tracing is hard. Sedai took these stressors away.

To date, Sedai has run over 1,000 optimizations to improve the performance of fabric’s serverless functions.

Relatively little effort is needed to manage Sedai. “Sedai’s autonomous system is proactive rather than reactive. It’s hands off,” said fabric’s VP of Engineering.

Avinash checks on Sedai weekly to make sure new accounts are onboarding effectively, confirm that autonomous mode is on, and review the optimizations Sedai has executed. He does not need to log in to Sedai daily to review manual actions.

“I log in every two or three weeks unless I am setting up new customers.  I know Sedai is doing its job in the back end so I don’t need to come in regularly”

Avinash Gupta 

Senior Site Reliability Engineer, fabric

Improved Customer Experience Through Reduced Latency

fabric was able to significantly reduce latency. Avinash says, “For the fabric SRE team, the number one concern is always latency. No ecommerce consumer wants to use a site that’s slow. Sedai reduced our latency by up to 95%. Sedai came in and fixed whatever was keeping calls from coming through, resulting in overall improved performance”. One of fabric’s retail customers even reached out to say that the shopping cart was running unusually fast. To date, the optimizations have added up to 57 days of cumulative latency reduction, for a mere $43 in additional cloud spend. As an example, the execution time of the function that adds an item to a shopping cart has been reduced by 88%.
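Cumulative latency reduction is the per-invocation time saved multiplied by the number of invocations, summed across functions. A minimal sketch of that arithmetic, using hypothetical invocation counts and durations (not fabric’s data):

```python
MS_PER_DAY = 1000 * 60 * 60 * 24


def percent_reduction(baseline_ms: float, optimized_ms: float) -> float:
    """Relative latency improvement for a single function."""
    return 100 * (baseline_ms - optimized_ms) / baseline_ms


def cumulative_days_saved(functions: list[tuple[int, float, float]]) -> float:
    """Sum of (invocations x per-invocation savings) across functions, in days."""
    saved_ms = sum(n * (before - after) for n, before, after in functions)
    return saved_ms / MS_PER_DAY


# Hypothetical: (invocations, baseline ms, optimized ms) per function.
functions = [
    (10_000_000, 100.0, 12.0),  # e.g. an add-to-cart function, 88% faster
    (2_000_000, 250.0, 200.0),
]
print(percent_reduction(100.0, 12.0))              # 88.0
print(round(cumulative_days_saved(functions), 2))  # 11.34
```

Small per-call savings add up quickly at high invocation volumes, which is how a modest spend can translate into days of cumulative latency saved.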

The latency reductions achieved by fabric are passed through to its customers as faster response times, and for fabric’s customers this means more revenue through reduced cart abandonment. According to a 2022 Portent study of ecommerce as a whole, a site that loads in 1 second has an ecommerce conversion rate 2.5x higher than a site that loads in 5 seconds, and 45.4% of consumers say they are less likely to make a purchase if a site appears slow.

Improving SRE Productivity by Reducing Toil

fabric was also able to grow the customer base on its platform without linearly scaling up the SRE team. “Sedai helped us scale from having only one SRE supporting 2 to 3 customers on our commerce platform to now supporting 20 customers,” said Prakash Muppirala, Executive Vice President. Avinash noted, “If we didn’t have Sedai, I would need at least 3 more SREs just to monitor the infrastructure”. Sedai’s autonomous approach has helped fabric implement key SRE principles, including automation and toil elimination.
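The 6.7x customers-per-SRE improvement in the results appears consistent with comparing the upper end of the earlier ratio (3 customers per SRE) with the current 20 customers per SRE; this is an interpretation, since the calculation itself isn’t spelled out. Measured against the lower end of 2 customers per SRE, the same change would be 10x.

$$\frac{20\ \text{customers per SRE}}{3\ \text{customers per SRE}} = 6.\overline{6} \approx 6.7\times$$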

“You don’t want to be concerned about what is happening in the infrastructure.  Sedai is living one third of my life that I am not bothered about now.  It takes care of my tasks without my input”

Avinash Gupta 

Senior Site Reliability Engineer, fabric

“Had Sedai not been there we would need dedicated resources to monitor why latency problems are occurring and to understand why errors are popping up, and resources would be needed to cover each serverless function. Checking these manually would mean we’d need SREs sitting around all day triaging issues. Sedai is now doing this for us autonomously. Our SRE can instead focus on customer reporting, new deployments and site enhancements,” notes Avinash.

Cost Improvements

fabric has also seen cost improvements from optimization. “Sedai is giving fabric a cost benefit. When the Lambdas are not extensively used, Sedai scales down memory based on the traffic pattern. So when customers are less active during the night, Sedai scales down,” said Avinash. In addition to memory, Sedai also cost-optimizes provisioned concurrency to avoid over-provisioning, and optimizes function warm-up.
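AWS describes a simple way to estimate Lambda concurrency: average requests per second multiplied by average request duration in seconds (an application of Little’s law). The sketch below applies that estimate per hour of the day so provisioned concurrency can follow the traffic pattern down overnight. The hourly traffic figures and the 1.25 headroom factor are hypothetical assumptions, and this illustrates the idea rather than how Sedai computes its settings.

```python
import math


def required_concurrency(rps: float, avg_duration_s: float, headroom: float = 1.25) -> int:
    """Estimate concurrency via Little's law: arrival rate x time in system.

    `headroom` is an assumed safety margin for bursts above the average.
    """
    return math.ceil(rps * avg_duration_s * headroom)


def hourly_schedule(
    hourly_rps: dict[int, float], avg_duration_s: float, headroom: float = 1.25
) -> dict[int, int]:
    """Provisioned concurrency per hour of day, scaled to expected traffic."""
    return {
        hour: required_concurrency(rps, avg_duration_s, headroom)
        for hour, rps in sorted(hourly_rps.items())
    }


# Hypothetical traffic forecast: busy midday, quiet overnight.
forecast = {3: 4.0, 9: 40.0, 13: 80.0, 21: 24.0}
print(hourly_schedule(forecast, avg_duration_s=0.25))
# {3: 2, 9: 13, 13: 25, 21: 8}
```

A schedule like this could be applied through Application Auto Scaling scheduled actions, which Lambda supports for provisioned concurrency on an alias or version; shorter durations (for example, from better memory sizing) lower the concurrency needed at every hour.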

Release Intelligence

fabric now also receives early feedback on the performance of every serverless release through Sedai’s Release Intelligence feature. Each release is rated on a scale of 1 to 10 based on how much it deviates from the prior release. These ratings signal to fabric where effort should go in optimizing the underlying application code.
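How Sedai computes its 1-10 rating isn’t described here. As a purely hypothetical illustration of rating a release by its deviation from the prior one, this sketch compares per-metric values (latency, error rate), takes the worst relative regression, and maps it linearly onto 1-10. It assumes 10 means no regression and that a regression of 50% or more scores 1; the metric names, threshold and direction of the scale are all assumptions.

```python
def relative_change(prior: float, current: float) -> float:
    """Fractional change from the prior release (positive = worse here)."""
    if prior == 0:
        # A metric appearing from zero (e.g. new errors) counts as a maximal regression.
        return float("inf") if current > 0 else 0.0
    return (current - prior) / prior


def release_score(
    prior: dict[str, float], current: dict[str, float], worst_tolerated: float = 0.5
) -> int:
    """Hypothetical 1-10 rating: 10 = no regression, 1 = >= worst_tolerated.

    Assumes both releases report the same metric names.
    """
    worst = max(
        (relative_change(prior[name], current[name]) for name in prior), default=0.0
    )
    # Clamp to [0, worst_tolerated] and scale to [0, 1]; improvements count as 0.
    severity = min(max(worst, 0.0), worst_tolerated) / worst_tolerated
    return round(10 - 9 * severity)


baseline = {"p95_latency_ms": 100.0, "errors_per_1k": 2.0}
print(release_score(baseline, {"p95_latency_ms": 90.0, "errors_per_1k": 2.0}))   # 10
print(release_score(baseline, {"p95_latency_ms": 110.0, "errors_per_1k": 2.0}))  # 8
print(release_score(baseline, {"p95_latency_ms": 100.0, "errors_per_1k": 4.0}))  # 1
```

Scoring on the worst metric means a release that speeds up latency but doubles errors still gets flagged, which matches the goal of pointing effort at the code that regressed.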

Increased Visibility into Performance and Focus Areas

Sedai has also been helpful in providing information on where to focus in terms of performance.  “The SRE team can now focus more on why APIs are throwing errors rather than sitting around monitoring and waiting for alerts” said fabric’s VP Engineering.  The SRE team is also able to communicate the progress of Sedai using weekly reports and graphs from the Sedai dashboard.

Avinash can also monitor the impact of Sedai’s optimizations in Datadog. Sedai’s Datadog integration pushes events into Datadog, which allows event overlays on fabric’s latency, memory, and concurrency data, so the SRE team can see exactly when Sedai optimizations are implemented and how they affect subsequent performance. “I have a single Datadog panel where I can see the impact of Sedai’s autonomous actions,” said Avinash.
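Sedai’s integration details aren’t described here, but Datadog’s public Events API (v1, `POST /api/v1/events`, authenticated with a `DD-API-KEY` header) is one way a tool can push change events that appear as overlays on dashboards. A minimal standard-library sketch, assuming the US1 site (`api.datadoghq.com`); the function name, tags and message format are hypothetical.

```python
import json
import time
import urllib.request
from typing import Optional

# US1 endpoint; other Datadog sites (EU, US3, ...) use a different host.
DD_EVENTS_URL = "https://api.datadoghq.com/api/v1/events"


def build_change_event(
    api_key: str,
    function_name: str,
    old_memory_mb: int,
    new_memory_mb: int,
    happened_at: Optional[float] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request recording a memory change."""
    body = {
        "title": f"Memory changed for {function_name}",
        "text": f"Memory {old_memory_mb} MB -> {new_memory_mb} MB",
        # Hypothetical tags; `functionname` mirrors the tag on Datadog's Lambda metrics.
        "tags": [f"functionname:{function_name}", "change_type:memory"],
        "alert_type": "info",
        "date_happened": int(happened_at if happened_at is not None else time.time()),
    }
    return urllib.request.Request(
        DD_EVENTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )


req = build_change_event("<your-api-key>", "add-to-cart", 512, 1024)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```

Keeping request construction separate from sending makes the payload easy to inspect; once sent, the event can be overlaid on latency and memory graphs to show exactly when a change landed.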

Embracing Autonomous to Reinforce fabric’s Competitive Advantage

fabric recognized the potential to embrace the autonomous systems paradigm as a way to immediately gain competitive advantage.

“The first time we saw Sedai we realized the traditional approach to site reliability was outdated. We have embraced autonomous management as the path to the future. Sedai allows us to innovate faster and cut toil”

Prakash Muppirala

Executive Vice President, Platform Solutions, fabric
