Speaker:

What are service level agreements and why are they absolutely

Speaker:

essential to managing complex, multi-service

Speaker:

modern applications? Today I continue my discussion on

Speaker:

modern ops with Beth Long. Are you ready? Let's

Speaker:

go. This is the Modern

Speaker:

Digital Business podcast, the technical leader's guide to

Speaker:

modernizing your applications and digital business. Whether you're a

Speaker:

business technology leader or a small business innovator, keeping

Speaker:

up with the digital business revolution is a must. Here to help make

Speaker:

it easier with actionable insights and recommendations, as well as

Speaker:

thoughtful interviews with industry experts: Lee Atchison.

Speaker:

In this episode of Modern Digital Business, I continue my conversation on

Speaker:

modern operations with my good friend, SRE

Speaker:

engineer and operations manager Beth Long. So, Beth,

Speaker:

great to see you again today. And today we wanted to

Speaker:

talk about SRE terminology and

Speaker:

measurements, and it's fantastic that we have an

Speaker:

SRE in our midst in order to do that. So I'm glad you're

Speaker:

here. Great. Let's get started on

Speaker:

this. So SRE, site reliability engineering,

Speaker:

is tied very closely to the concept of DevOps,

Speaker:

but they're really not the same thing. Can you start out

Speaker:

by telling us what's the difference between DevOps and SREs?

Speaker:

I love this question. I've talked about this a number of times. And I'm going

Speaker:

to get back at you for asking me this by flipping it around and asking

Speaker:

you the same thing in a minute, but I'll take a stab at it. So

Speaker:

SRE, site reliability engineering, originated out of

Speaker:

Google, gee, almost 20 years ago now, I

Speaker:

guess. Yeah. And it was

Speaker:

really a discipline that was a

Speaker:

response to the

Speaker:

pressures of managing technology

Speaker:

at Google scale. And so

Speaker:

a lot of the practices that are associated with site reliability

Speaker:

engineering are the things that Google

Speaker:

developed internally to help them manage their scale as

Speaker:

they grew and then began to evangelize out to the wider

Speaker:

community. And so now a lot of those practices have been adopted more

Speaker:

widely and have been iterated upon. But

Speaker:

that's the origins of site reliability engineering

Speaker:

and the origins of DevOps are

Speaker:

a little bit more cross

Speaker:

cutting, a little bit more democratic, I

Speaker:

guess, and came out of

Speaker:

people around the same time

Speaker:

realizing that the

Speaker:

siloing of development and operations

Speaker:

was leading to unhealthy

Speaker:

patterns in the software engineering.

Speaker:

So people like John Allspaw, who we'll talk about a

Speaker:

little bit later probably, if we touch on incidents at all, were

Speaker:

prominent in kind of saying, let's rethink how we're doing

Speaker:

the software engineering practice. So DevOps really focused

Speaker:

on integrating development and operations so

Speaker:

that those functions were

Speaker:

shared more as opposed to completely siloed. And

Speaker:

site reliability engineering was a set of practices

Speaker:

around maintaining stability and reliability

Speaker:

of large scale web operations. And so there

Speaker:

are some foundational topics that I'd like to ask you about actually,

Speaker:

around things like service level indicators, objectives and

Speaker:

agreements and a wide number of other

Speaker:

practices. So this is a very wandering answer to say

Speaker:

that the major difference between the two I think is really kind

Speaker:

of one of ancestry and how they started and

Speaker:

SRE sort of being a set of

Speaker:

practices and DevOps being more of a

Speaker:

philosophy and an approach to the development

Speaker:

environment. Yeah, it's almost like the

Speaker:

SRE is a practice that occurs within a DevOps model, but

Speaker:

it exists independently as well too. But it's a role

Speaker:

within DevOps. But not the only role within DevOps. Right?

Speaker:

Yeah. Now what's interesting is you

Speaker:

hear both DevOps and SRE talked about as

Speaker:

practices but also you'll hear about SREs talked

Speaker:

about as a profession, but yet

Speaker:

you don't talk about DevOps as a profession.

Speaker:

And in fact people do, but usually it's considered a

Speaker:

negative. I'm a DevOps engineer. No, there's no such thing as a

Speaker:

DevOps engineer. So does that

Speaker:

also partly come from the historical

Speaker:

nature of where it came from, or is there really a difference there that

Speaker:

matters? This is a great question and something I hoped we

Speaker:

would touch on because I still kind of cringe a little bit when I see

Speaker:

DevOps engineer, but I've come to understand why

Speaker:

that job title has meaning. Because

Speaker:

there are organizations that for a number of reasons,

Speaker:

including the size of the organization, the history

Speaker:

of it, its composition, sometimes it does make

Speaker:

sense to have people focus

Speaker:

on the kinds of things that happen at the

Speaker:

boundary of development and operations.

Speaker:

And so you'll get DevOps engineers who focus on internal

Speaker:

tooling, build and deploy pipelines,

Speaker:

that kind of

Speaker:

activity. Yeah, I always hate the word DevOps engineer applying

Speaker:

to that as opposed to like infrastructure engineer or tooling

Speaker:

engineer. But you're right, you do hear that, you hear that

Speaker:

apply there. What it almost seems like though is you

Speaker:

hear a large organization say DevOps is good,

Speaker:

we need to go to DevOps. Okay, you and you are now DevOps

Speaker:

engineers. Exactly. And that's not the way it's

Speaker:

done, of course. And often they become the ones then that

Speaker:

focus on the tooling and kind of become those tooling engineers and keep

Speaker:

the DevOps title. And it's not always a good

Speaker:

history that brings you to that situation.

Speaker:

Yeah. And to answer your original question, I

Speaker:

think

Speaker:

there's a little bit of a crisper definition around

Speaker:

what a site reliability engineer does,

Speaker:

but there's still a lot of fuzz in the definition and there's a lot

Speaker:

of range in if someone says they're an SRE, what they actually do. It's

Speaker:

still going to be quite a wide range of options. But

Speaker:

the origins of site reliability engineering go back to

Speaker:

bringing the software engineering discipline

Speaker:

into the operations realm. And so again you see this sort

Speaker:

of both SRE and DevOps are really about

Speaker:

crossing that boundary, but it's

Speaker:

almost in the opposite direction. Exactly. Yeah, exactly.

Speaker:

DevOps is more about bringing Ops into dev and

Speaker:

SRE is more about bringing the processes

Speaker:

of development into operations. Right. And so you are much more

Speaker:

likely to end up with an SRE group

Speaker:

that is sort of helping the whole organization level up with those

Speaker:

things. Whereas a DevOps organization,

Speaker:

at least in the way that I tend to use DevOps and I think you

Speaker:

and I are similar in this a DevOps organization is going to

Speaker:

be: you're on call for your own services rather than having

Speaker:

an operations center and some of those things that are more at the

Speaker:

organizational scale as opposed to SRE tending

Speaker:

to be more likely that you're going to have a group of

Speaker:

people that are bringing those things to the organization. Yeah,

Speaker:

in that manner, SRE group or SRE

Speaker:

engineers is more akin to like an architecture group and

Speaker:

architects, they're assigned to individual parts of the

Speaker:

project, but they also have some global responsibilities as

Speaker:

well and shared knowledge. And whether they're in

Speaker:

one group or distributed is a

Speaker:

much more fluid question. That depends on the

Speaker:

organization versus a clear cut who should be in which group.

Speaker:

Sort of a model that is more akin to what

Speaker:

happens in DevOps. I like that distinction. So now

Speaker:

we know SRE is not the same as DevOps and we understand the difference

Speaker:

between them. That's great. So you were going to get me back on something,

Speaker:

you had said. I'm not really looking

Speaker:

forward to that, whatever that is.

Speaker:

If there's one thing that's iconically associated with

Speaker:

SRE, I think it's fair to say that it's service level indicators

Speaker:

and service level objectives and service level agreements.

Speaker:

SLIs, SLOs, SLAs. The acronyms

Speaker:

confuse everybody, even those who have been using them for years.

Speaker:

And I know that you have a very pragmatic approach

Speaker:

to kind of tackling some of these questions. So I'd love first for folks that

Speaker:

aren't deeply aware of those, maybe kind of set the scene and then I'd

Speaker:

love to kind of hear your take on how you can implement those.

Speaker:

Well, sure. I even confuse SLIs and

Speaker:

SLOs. And so I'm going to need help with the definition if we're going

Speaker:

to define what the three are. But I'd almost prefer to avoid the

Speaker:

definitions and talk about what the problem is that's going on there. What the

Speaker:

problem is, is what all of them are trying to indicate

Speaker:

is the health of something, the health of a code base,

Speaker:

the health of a service, the health of an application.

Speaker:

Now historically the word SLA, service

Speaker:

level...

Speaker:

Agreement. Agreement? Yeah.

Speaker:

SLA, service level agreement, comes

Speaker:

from inter-customer connections.

Speaker:

So you have a provider of a service, of an

Speaker:

application that has a customer, and that customer says, we'll buy

Speaker:

your service, but I need an SLA, a service level

Speaker:

agreement that specifies how well or

Speaker:

what your service is going to do for me. And often those

Speaker:

agreements are around things like uptime,

Speaker:

latency, how fast the application will work,

Speaker:

how many users can be connected to it. There could be a

Speaker:

thousand different dimensions on how it's measured, but it's usually some form of

Speaker:

measurement of a guarantee to the customer

Speaker:

of what the provider of that service or

Speaker:

application will guarantee in

Speaker:

exchange for usually money in the case of a customer

Speaker:

relationship. So an SLA has a very long history.
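
As a rough illustration of the kind of agreement described here, the following Python sketch models an SLA with two of the dimensions mentioned (uptime and latency) and checks one period of measurements against it. The class names, field names, and thresholds are all hypothetical and are not taken from the conversation; a real SLA would define its own dimensions and measurement windows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevelAgreement:
    """Hypothetical SLA terms; names and thresholds are illustrative only."""
    min_uptime_pct: float      # promised availability, e.g. 99.9
    max_p95_latency_ms: float  # promised ceiling on 95th-percentile latency


@dataclass(frozen=True)
class PeriodMeasurement:
    """What was actually observed over one reporting period."""
    total_minutes: int
    downtime_minutes: int
    p95_latency_ms: float


def uptime_pct(m: PeriodMeasurement) -> float:
    """Share of the period during which the service was up, as a percentage."""
    return 100.0 * (m.total_minutes - m.downtime_minutes) / m.total_minutes


def violations(sla: ServiceLevelAgreement, m: PeriodMeasurement) -> list[str]:
    """Return one message per SLA dimension that was missed (empty if all met)."""
    missed = []
    if uptime_pct(m) < sla.min_uptime_pct:
        missed.append(f"uptime {uptime_pct(m):.3f}% is below {sla.min_uptime_pct}%")
    if m.p95_latency_ms > sla.max_p95_latency_ms:
        missed.append(
            f"p95 latency {m.p95_latency_ms} ms exceeds {sla.max_p95_latency_ms} ms"
        )
    return missed


# Assumed example terms: 99.9% uptime and a 200 ms p95 latency ceiling.
sla = ServiceLevelAgreement(min_uptime_pct=99.9, max_p95_latency_ms=200.0)

# A 30-day period has 43,200 minutes, so 99.9% uptime allows 43.2 minutes of
# downtime; 50 minutes of downtime therefore misses the uptime term.
period = PeriodMeasurement(total_minutes=43_200, downtime_minutes=50, p95_latency_ms=180.0)
print(violations(sla, period))
```

The same check applies whether the consumer of the service is a paying customer or another team in the same company; only the values in the agreement would differ.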

Speaker:

It's been around for a long time. The word SLA probably goes back

Speaker:

long before either one of us were born, because it applies

Speaker:

to contract work in general, not just software or

Speaker:

computer work. And so it's been around for a long time. But

Speaker:

what's happened is, I believe it was Google

Speaker:

that started the SLO or the SLI model. I believe they're

Speaker:

the ones that did, as part of the SRE revolution. It started with

Speaker:

them. But what was decided was

Speaker:

we need some way to at a smaller scale as we

Speaker:

take this large application and now internally

Speaker:

divide it into services and into microservices and into

Speaker:

its various components. And especially in DevOps

Speaker:

models, we needed a way to say this part of the

Speaker:

service has requirements that it must

Speaker:

perform to. It has obligations it

Speaker:

needs to be able to handle in order to serve the needs

Speaker:

of the other services around it.

Speaker:

And so Google created new terms called SLIs and SLOs

Speaker:

in order to distinguish them from

Speaker:

SLAs for how you

Speaker:

measure those parts of the application. And the

Speaker:

idea is SLIs and SLOs are internal

Speaker:

measurements for internal customers, and SLAs were

Speaker:

external measures for external customers. That's where I have

Speaker:

my problem, because in my mind, in a service oriented

Speaker:

architecture, in a service oriented

Speaker:

team model, if you own a service

Speaker:

and other services depend on you, those other teams

Speaker:

are your customers. The fact that they sit down the hall from

Speaker:

you or right next to you, or on another floor, but in the same

Speaker:

company is irrelevant. They're still your customers.

Speaker:

Whether they're an internal customer or an external customer doesn't matter.

Speaker:

They're your customers. You need to keep them happy for

Speaker:

your application to perform as expected. So when

Speaker:

you provide a service that someone else is depending

Speaker:

on, and you specify what the requirements are for running that

Speaker:

service, those are service level agreements. Those are the

Speaker:

agreements that you have with the other service owners

Speaker:

of how your service will behave. There's no

Speaker:

difference between those SLAs and the external ones. So

Speaker:

don't call them something different, because that implies there's something less.

Speaker:

Right? An SLO implies an

Speaker:

internal agreement, which of course, internal agreements aren't

Speaker:

official agreements. Well, they're not as important, right?

Speaker:

SLA implies an external agreement, which is important because we're

Speaker:

talking about customers here. They're all customers. They're all

Speaker:

external. They're all SLAs. When you make an

Speaker:

agreement that your application performs a certain way, there's

Speaker:

no difference in whether or not that agreement is made with another team within your

Speaker:

organization or to an external customer. They're just as important

Speaker:

because guess what? If you break your agreement

Speaker:

for how your service performs with another team. That's not going to

Speaker:

just affect the other team. That's going to affect all the teams that depend

Speaker:

on them, and ultimately it's going to affect the customer. So it all

Speaker:

matters. They're all just as important. So let's not invent

Speaker:

new terms to describe them. In my mind, they're all

Speaker:

SLAs. So if you have 100 service teams

Speaker:

within your organization and they have their

Speaker:

criteria for how they are expected to perform

Speaker:

to support the other service owners, those

Speaker:

are SLAs. Those expectations are

Speaker:

service level agreements. They need to be treated at the same level of

Speaker:

importance as the customer level service level

Speaker:

agreements. I find that really interesting because you're getting

Speaker:

at the fact that words matter and what we

Speaker:

call things matter, because I think there are a lot of

Speaker:

really interesting organizational challenges with implementing

Speaker:

SLOs and SLAs effectively. And one of them sort of on the

Speaker:

flip side, is that when teams

Speaker:

talk about service level objectives,

Speaker:

they often sort of set them arbitrarily based

Speaker:

on, okay, these are the things that I can measure and these are

Speaker:

my objectives as the owner. And

Speaker:

what you're getting at is the fact that these really need to be

Speaker:

agreements. They need to be hashed out with product

Speaker:

owners and technical leads and people who are

Speaker:

deeply familiar with the customer, whether that customer is internal or

Speaker:

external. Stay tuned for our next Modern Ops segment

Speaker:

when Beth and I continue our discussion on modern application

Speaker:

operations by talking about ownership in a modern

Speaker:

operations world. Thank you for

Speaker:

tuning in to Modern Digital Business. This podcast exists

Speaker:

because of the support of you, my listeners. If you enjoy what you

Speaker:

hear, will you please leave a review on Apple Podcasts or

Speaker:

directly on our website at mdb.fm/reviews?

Speaker:

If you'd like to suggest a topic for an episode or

Speaker:

you're interested in becoming a guest, please contact me directly by

Speaker:

sending me a message at mdb.fm/contact.

Speaker:

And if you'd like to record a quick question or comment, click the

Speaker:

microphone icon in the lower right hand corner of our website.

Speaker:

Your recording might be featured on a future episode. To

Speaker:

make sure you get every new episode when they become available, click

Speaker:

subscribe in your favorite podcast player, or check out our website at

Speaker:

mdb.fm. If you want to learn more from me,

Speaker:

then check out one of my books, courses or articles by going to

Speaker:

leeatchison.com, and all of these links are included in the show

Speaker:

notes. Thank you for listening and welcome to the world of the

Speaker:

Modern Digital Business.