What is the Operational Excellence Pillar for Well Architected?

Reading Time: 6 minutes

The Serverless Edge Team talk through their experiences of working through the Operational Excellence Pillar on their well architected journeys. This is the first of a series of conversations on well architected.

Dave Anderson  

Hi folks! Welcome to the next edition of Serverless Craic from The Serverless Edge. Just a couple of quick introductions: I am Dave Anderson, Author and Contributor for The Serverless Edge and Technical Fellow at Bazaarvoice.

Mark McCann  

Mark McCann, Author/Contributor at The Serverless Edge, Stay At Home Dad getting back on the fitness wagon, bringing my 5k times down again after the Christmas excess!

Mike O’Reilly  

Cheers Mark! Michael O’Reilly, Software Architect with Globalisation Partners and trying to get back on the wagon. I just can’t catch up with it when it comes to exercise post Christmas!

Dave Anderson  

The secret is to never get off the wagon!

Mike O’Reilly  

Remain on the wagon.

Well Architected

Dave Anderson  

So we figured there’s been a lot of good conversation about well architected and the well architected framework. And we’ve written about this in the blog: well architected and SCORP or SCORPS, which stands for the five and now six pillars of well architected. Well architected is really interesting because not only AWS but also Google and Azure have their own versions of it, and they’re all quite similar. But we have found massive success from working through these pillars. So we figured we’d hit each pillar and have a quick chat about it, starting with operational excellence. Is there anything else you would like to say at a high level about well architected?

Mark McCann  

It’s something we have found to be incredibly useful. It gives a frame of reference and a structure for asking better questions of our teams, systems, structures, processes and practices. So it has been hugely useful for trying to evolve engineering practices and companies. It’s hardened and approved, and it’s been battle tested in thousands of companies, which gives it a lot of credibility. It’s not just Dave, Mark and Mike’s opinions. It’s good practice that has been proven to work.

Mike O’Reilly  

That’s a major strength, isn’t it? I like the ubiquity. Whether you’re an architect, an engineer or a manager in one organisation, when you go to another, it’ll make sense.

Dave Anderson  

Well architected is not a once-a-year compliance exercise. It should be part of continuous architecture. The reason why I always encourage people to get certification is not for a bit of paper or a free water bottle; it’s because you have to learn well architected as part of certification. So starting with operational excellence, the AWS pillar breaks down into three areas. Each area has five or six questions. So the three areas (in the operational excellence pillar) are prepare, operate and evolve.

Bhutan's iconic Tiger's Nest Monastery on The Serverless Edge
Photo by Ameya Sawant on Unsplash

Operational Excellence Pillar

Dave Anderson  

Operational excellence means a lot of things to a lot of people, but let’s chat about prepare. What have you found to be in the prepare part of this?

Mark McCann  

It’s great to go into new areas and teams and ask these questions:

Do you know who your users are? Do you know what the purpose of your team is? Do you know what your highest priority thing is?

Some are very simple, basic questions.

Are you set up to meet the challenges that you’re faced with, the business requirements that you’re going to pursue or the needs you’re trying to meet?

So simple questions like ‘how do you determine what your priorities are?’ can be very revealing. If you are in a safe space with the whole team involved, you can get a really good conversation. You might hear: ‘We know our priorities for this week and for next week, but we’re not quite sure what we’re doing for the month after.’ It’s a good conversation to tease out whether you are aligned with the strategic direction. Do you have a prioritisation framework or are you making it up ‘on the hoof’?

Mike O’Reilly  

This is a pillar that needs the whole team involved in the conversation. Some questions require management to be involved; some require the tech lead or the engineer to understand the big picture and operations. We talk about consistency. In this section there are recommendations for playbooks, runbooks and standards for preparing your operation: prepare for failure, because everything fails all the time. You have got to prepare to move on to post implementation and hand off to a different team, or to a place where you’re bringing on new engineers. Do you have the runbooks for the operations in a particular workload? Do you have the playbooks that are linked to observability in your dashboard, so that when things go wrong, there’s a solid set of instructions to deal with that problem and people don’t have to go in and unpack what you’ve built? So there’s a lot of good, solid foundational guidance. From an architecture perspective (we’re all architects), it’s table stakes for consistency across teams.
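One way to picture playbooks that are linked to observability is a simple lookup from alarm name to a written set of steps, plus a check for alarms that have no playbook yet. This is only a minimal sketch: the alarm names, owners and steps are hypothetical, and a real team would more likely keep this in a wiki or in its monitoring tool than in code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Playbook:
    """Written-down instructions for responding to one alarm."""
    title: str
    owner: str
    steps: Tuple[str, ...]


# Hypothetical alarm names, owners and steps, for illustration only.
PLAYBOOKS: Dict[str, Playbook] = {
    "orders-api-5xx-rate": Playbook(
        title="Orders API returning server errors",
        owner="orders-team",
        steps=(
            "Check whether the spike lines up with the most recent deployment.",
            "Roll back if it does, then confirm the error rate recovers.",
            "Record what was learned in the post-incident review.",
        ),
    ),
    "queue-depth-high": Playbook(
        title="Order queue is backing up",
        owner="orders-team",
        steps=(
            "Check consumer health on the team dashboard.",
            "Scale consumers if they are healthy but saturated.",
        ),
    ),
}


def playbook_for(alarm_name: str) -> Optional[Playbook]:
    """Return the playbook linked to an alarm, or None if nobody wrote one."""
    return PLAYBOOKS.get(alarm_name)


def alarms_missing_playbooks(alarm_names: Iterable[str]) -> List[str]:
    """List the alarms that would leave an on-call engineer without instructions."""
    return sorted(name for name in alarm_names if name not in PLAYBOOKS)
```

Running `alarms_missing_playbooks` against the full list of alarms on a dashboard gives a quick answer to the "do you have the playbooks?" question for a workload.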

Prepare

Dave Anderson  

‘Prepare’ looks at tribal knowledge, like when you ask a question and the response is ‘Fred says’. In other words: ‘I don’t know why we do that, but Fred says we do that’. Or the response is: ‘ask my manager’. But what happens when your manager isn’t there? We need leadership and empowerment within the team, and we need that knowledge written down for everyone. So ‘Prepare’ checks team culture.

Mark McCann  

It also checks simple stuff like: do you have enough people to meet the challenges? Do you have assigned owners who are going to be responsible for processes, practices and operations? If you can get these foundations in place early, you can evolve, go down through the lifecycle and start applying the other well architected pillars. Your chance of success greatly improves because your operational excellence pillar has set the foundation.

Operate

Dave Anderson  

The next area is operate. So you start with prepare and then move to operate. I like operate because there’s a lot of observability. I like thinking of a workload as an asset: how to understand the health of that asset and how to monitor it to make sure it’s working well.

Mike O’Reilly  

It’s about getting the team ready for production. A particular bugbear of mine is when teams aren’t thinking about how to validate in production and how to spot regression. What are the key performance indicators of the workload? When things go wrong, are they able to spot it, and have they thought about how to remediate or correct those sorts of things? You go back to prepare again. There’s always something that is going to go wrong, something you haven’t predicted or an alternate path that has been missed. So when those things happen, have you got the correct procedures for learning what that defect teaches, so you can bake it in and toughen up your operation going forward? It’s a holistic way of thinking, and you need those mechanisms to show you how your workload is performing by product.
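To make ‘spotting regression’ concrete, here is a minimal sketch that compares a key performance indicator after a release against a baseline taken before it. The function name, the 10% tolerance and the sample figures are all assumptions for illustration; real workloads would usually lean on their monitoring platform’s own alarms rather than hand-rolled checks.

```python
from statistics import mean
from typing import Sequence


def has_regressed(
    baseline: Sequence[float],
    current: Sequence[float],
    tolerance: float = 0.10,  # assumed threshold: a 10% relative change
    higher_is_better: bool = True,
) -> bool:
    """Return True if the KPI moved in the wrong direction by more than `tolerance`."""
    if not baseline or not current:
        raise ValueError("both baseline and current need at least one sample")
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean is zero, relative change is undefined")
    change = (mean(current) - base) / abs(base)
    # For success rates a drop is bad; for latency or error counts a rise is bad.
    return change < -tolerance if higher_is_better else change > tolerance


# Hypothetical samples: checkout success rate before and after a release.
success_before = [0.98, 0.99, 0.97]
success_after = [0.85, 0.86, 0.84]

# Hypothetical samples: p95 latency in milliseconds before and after a release.
latency_before = [200.0, 210.0, 190.0]
latency_after = [205.0, 215.0, 195.0]

checkout_regressed = has_regressed(success_before, success_after)
latency_regressed = has_regressed(latency_before, latency_after, higher_is_better=False)
```

In this made-up data the checkout success rate falls by roughly 13%, so it is flagged, while the latency rise of 2.5% stays inside the tolerance.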

Mark McCann  

It’s critical to have those information radiators and dashboards available, and not just for the team. If you have proper observability you can show the C-suite the team working on a particular capability, feature or value stream and how it relates to our vision and strategy. That’s proper operational observability across everything, including not only the health of your workload, but the health of your team. DORA key metrics should be part of how you operate with a sustainable pace for the team.
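As a rough illustration of what could feed such a dashboard, the four DORA key metrics (deployment frequency, lead time for changes, change failure rate and time to restore service) can be summarised from a list of deployment records. This is a sketch under assumptions: the `Deployment` record and its field names are hypothetical, and the use of medians is a choice made here, not something from the conversation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # only set when a failure was fixed


def dora_summary(deployments: List[Deployment], period_days: int) -> Dict[str, Optional[float]]:
    """Summarise the four DORA key metrics over a reporting period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    restores = [
        d.restored_at - d.deployed_at for d in failures if d.restored_at is not None
    ]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time_hours": median(t.total_seconds() for t in lead_times) / 3600,
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore_hours": (
            median(t.total_seconds() for t in restores) / 3600 if restores else None
        ),
    }


# Made-up data: four deployments over two days, one of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0, t0 + timedelta(hours=4), failed=True,
               restored_at=t0 + timedelta(hours=5)),
    Deployment(t0, t0 + timedelta(hours=6)),
    Deployment(t0, t0 + timedelta(hours=8)),
]
summary = dora_summary(sample, period_days=2)
```

Tracked over time, numbers like these give a view of team health alongside workload health.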

Evolve

Dave Anderson  

The last one is evolve. You go through prepare, operate and then evolve. And it’s quite simply about how you evolve operations, which doesn’t mean cutting costs and reducing the budget!

Mark McCann  

It’s what Mike said earlier. It’s about having a continuous improvement mindset with feedback loops in place. We’re big into mapping, and evolution is a cornerstone of Wardley mapping. If you don’t take these signals from your systems and your workloads on board and use them to evolve, improve and get better, then there’s no point having observability and dashboards.

Mike O’Reilly  

That’s the key point. We’ve written about the SCORPS process as a driver of continuous improvement. Your operations are going to generate a lot of data and useful information that you, as an engineer, manager or architect, can use to evolve your current setup. You should always be looking to learn.

Dave Anderson  

The operational excellence pillar sets us up nicely because once you think through evolve and operations, you’re evolving the other pillars of cost, security, reliability, performance and sustainability. You can always save more money, make the thing faster, make it more reliable, make it more secure. People think operations are done because it’s rolling and it’s fine. But there are always things you can improve.

Mark McCann  

You set up for success and you put the foundational building blocks in place to increase your chances of a successful development cycle.

Dave Anderson  

So that’s the operational excellence pillar from well architected. That’s the craic. We’ll be talking some more about the pillars. There are posts on this on TheServerlessEdge.com, on Twitter @ServerlessEdge, LinkedIn and Medium. So thanks very much. 

Transcribed by https://otter.ai
