What are service level agreements and why are they absolutely
Speaker:essential to managing complex, multi-service
Speaker:modern applications? Today I continue my discussion on
Speaker:modern ops with Beth Long. Are you ready? Let's
Speaker:go. This is the Modern
Speaker:Digital Business podcast, the technical leader's guide to
Speaker:modernizing your applications and digital business. Whether you're a
Speaker:business technology leader or a small business innovator, keeping
Speaker:up with the digital business revolution is a must. Here to help make
Speaker:it easier with actionable insights and recommendations, as well as
Speaker:thoughtful interviews with industry experts, Lee Atchison.
Speaker:In this episode of Modern Digital Business, I continue my conversation on
Speaker:modern operations with my good friend, SRE
Speaker:engineer and operations manager Beth Long. So, Beth,
Speaker:great to see you again today. And today we wanted to
Speaker:talk about SRE terminology and
Speaker:measurements, and it's fantastic that we have an
Speaker:SRE in our midst in order to do that. So I'm glad you're
Speaker:here. Great. Let's get started on
Speaker:this. So SRE, site reliability engineering,
Speaker:is tied very closely to the concept of DevOps,
Speaker:but they're really not the same thing. Can you start out
Speaker:by telling us what's the difference between DevOps and SREs?
Speaker:I love this question. I've talked about this a number of times. And I'm going
Speaker:to get back at you for asking me this by flipping it around and asking
Speaker:you the same thing in a minute, but I'll take a stab at it. So
Speaker:SRE, site reliability engineering, originated out of
Speaker:Google, gee, almost 20 years ago now, I
Speaker:guess. Yeah. And it was
Speaker:really a discipline that was a
Speaker:response to the
Speaker:pressures of managing technology
Speaker:at Google scale. And so
Speaker:a lot of the practices that are associated with site reliability
Speaker:engineering are the things that Google
Speaker:developed internally to help them manage their scale as
Speaker:they grew and then began to evangelize out to the wider
Speaker:community. And so now a lot of those practices have been adopted more
Speaker:widely and have been iterated upon. But
Speaker:that's the origins of site reliability engineering
Speaker:and the origins of DevOps are
Speaker:a little bit more cross
Speaker:cutting, a little bit more democratic, I
Speaker:guess, and came out of
Speaker:people around the same time
Speaker:realizing that the
Speaker:siloing of development and operations
Speaker:was leading to unhealthy
Speaker:patterns in the software engineering.
Speaker:So people like John Allspaw, who we'll talk about a
Speaker:little bit later probably, if we touch on incidents at all, were
Speaker:prominent in kind of saying, let's rethink how we're doing
Speaker:the software engineering practice. So DevOps really focused
Speaker:on integrating development and operations so
Speaker:that those functions were
Speaker:shared more as opposed to completely siloed. And
Speaker:site reliability engineering was a set of practices
Speaker:around maintaining stability and reliability
Speaker:of large scale web operations. And so there
Speaker:are some foundational topics that I'd like to ask you about actually,
Speaker:around things like service level indicators, objectives and
Speaker:agreements and a wide number of other
Speaker:practices. So this is a very wandering answer to say
Speaker:that the major difference between the two I think is really kind
Speaker:of one of ancestry and how they started and
Speaker:SRE sort of being a set of
Speaker:practices and DevOps being more of a
Speaker:philosophy and an approach to the development
Speaker:environment. Yeah, it's almost like the
Speaker:SRE is a practice that occurs within a DevOps model, but
Speaker:it exists independently as well too. But it's a role
Speaker:within DevOps. But not the only role within DevOps. Right?
Speaker:Yeah. Now what's interesting is you
Speaker:hear both DevOps and SRE talked about as
Speaker:practices but also you'll hear about SREs talked
Speaker:about as a profession, but yet
Speaker:you don't talk about DevOps as a profession.
Speaker:And in fact people do, but usually it's considered a
Speaker:negative. I'm a DevOps engineer. No, there's no such thing as a
Speaker:DevOps engineer. So is that
Speaker:also partly a result of the historical
Speaker:nature of where it came from, or is there really a difference there that
Speaker:matters? This is a great question and something I hoped we
Speaker:would touch on because I still kind of cringe a little bit when I see
Speaker:DevOps engineer, but I've come to understand why
Speaker:that job title has meaning. Because
Speaker:there are organizations that for a number of reasons,
Speaker:including the size of the organization, the history
Speaker:of it, its composition, sometimes it does make
Speaker:sense to have people focus
Speaker:on the kinds of things that happen at the
Speaker:boundary of development and operations.
Speaker:And so you'll get DevOps engineers who focus on internal
Speaker:tooling, build and deploy pipelines,
Speaker:that kind of
Speaker:activity. Yeah, I always hate the term DevOps engineer applying
Speaker:to that as opposed to, like, infrastructure engineer or tooling
Speaker:engineer. But you're right, you do hear that, you hear that
Speaker:apply there. What it almost seems like though is you
Speaker:hear a large organization say DevOps is good,
Speaker:we need to go to DevOps. Okay, you and you are now DevOps
Speaker:engineers. Exactly. And that's not the way it's
Speaker:done, of course. And often they become the ones then that
Speaker:focus on the tooling and kind of become those tooling engineers and keep
Speaker:the DevOps title. And it's not always a good
Speaker:history that brings you to that situation.
Speaker:Yeah. And to answer your original question, I
Speaker:think
Speaker:there's a little bit of a crisper definition around
Speaker:what a site reliability engineer does,
Speaker:but there's still a lot of fuzz in the definition and there's a lot
Speaker:of range in if someone says they're an SRE, what they actually do. It's
Speaker:still going to be quite a wide range of options. But
Speaker:the origins of site reliability engineering go back to
Speaker:bringing the software engineering discipline
Speaker:into the operations realm. And so again you see this sort
Speaker:of both SRE and DevOps are really about
Speaker:crossing that boundary, but it's
Speaker:almost in the opposite direction. Exactly. Yeah, exactly.
Speaker:DevOps is more about bringing Ops into dev and
Speaker:SRE is more about bringing the processes
Speaker:of development into operations. Right. And so you are much more
Speaker:likely to end up with an SRE group
Speaker:that is sort of helping the whole organization level up with those
Speaker:things. Whereas a DevOps organization,
Speaker:at least in the way that I tend to use DevOps and I think you
Speaker:and I are similar in this a DevOps organization is going to
Speaker:be you're on call for your own services, rather than having
Speaker:an operations center and some of those things that are more at the
Speaker:organizational scale as opposed to SRE tending
Speaker:to be more likely that you're going to have a group of
Speaker:people that are bringing those things to the organization. Yeah,
Speaker:in that manner, an SRE group, or SRE
Speaker:engineers, is more akin to an architecture group and
Speaker:architects: they're assigned to individual parts of the
Speaker:project, but they also have some global responsibilities as
Speaker:well and shared knowledge. And whether they're in
Speaker:one group or distributed is a
Speaker:much more fluid question. That depends on the
Speaker:organization, versus a clear-cut "who should be in which group"
Speaker:sort of model that is more akin to what
Speaker:happens in DevOps. I like that distinction. So now
Speaker:we know SRE is not the same as DevOps and we understand the difference
Speaker:between them. That's great. So you were going to get me back on something now,
Speaker:you had said. I'm not really looking
Speaker:forward to that, whatever that is.
Speaker:If there's one thing that's iconically associated with
Speaker:SRE, I think it's fair to say that it's service level indicators
Speaker:and service level objectives and service level agreements.
Speaker:SLIs, SLOs, SLAs. The acronyms
Speaker:confuse everybody, even those who have been using them for years.
Speaker:And I know that you have a very pragmatic approach
Speaker:to kind of tackling some of these questions. So I'd love first for folks that
Speaker:aren't deeply aware of those, maybe kind of set the scene and then I'd
Speaker:love to kind of hear your take on how you can implement those.
Speaker:Well, sure. I even confuse SLIs and
Speaker:SLOs. And so I'm going to need help with the definition if we're going
Speaker:to define what the three are. But I'd almost prefer to avoid the
Speaker:definitions and talk about what the problem is that's going on there. What the
Speaker:problem is, is what all of them are trying to indicate
Speaker:is the health of something, the health of a code base,
Speaker:the health of a service, the health of an application.
Speaker:Now historically the term SLA, service
Speaker:level...
Speaker:Agreement. Agreement? Yeah.
Speaker:SLA, service level agreement, comes
Speaker:from inter-customer connections.
Speaker:So you have a provider of a service, of an
Speaker:application that has a customer, and that customer says, we'll buy
Speaker:your service, but I need an SLA, a service level
Speaker:agreement, that specifies how well or
Speaker:what your service is going to do for me. And often those
Speaker:agreements are around things like uptime,
Speaker:latency, how fast the application will work,
Speaker:how many users can be connected to it. There could be a
Speaker:thousand different dimensions on how it's measured, but it's usually some form of
Speaker:measurable guarantee to the customer
Speaker:of what the provider's service or application
Speaker:will do, usually in
Speaker:exchange for money in the case of a customer
Speaker:relationship. So an SLA has a very long history.
Speaker:It's been around for a long time. The word SLA probably goes back
Speaker:long before either one of us was born, because it applies
Speaker:to contract work in general, not just software or
Speaker:computer work. And so it's been around for a long time. But
Speaker:what happened is, I believe it was Google
Speaker:that started the SLO and SLI model. I believe they're
Speaker:the ones that did it, as part of the SRE revolution that began with
Speaker:them. But what was decided was,
Speaker:we need some way to do this at a smaller scale as we
Speaker:take this large application and now internally
Speaker:divide it into services and into microservices and into
Speaker:its various components. And especially in DevOps
Speaker:models, we needed a way to say this part of the
Speaker:service has requirements that it must
Speaker:perform to. It has obligations it
Speaker:needs to be able to handle in order to serve the needs
Speaker:of the other services around it.
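To make the idea of per-service obligations concrete, here is a minimal Python sketch. It is not something from the conversation. It assumes an agreement can be written down as two targets, an availability floor and a 99th-percentile latency ceiling, and every name and number in it (`ServiceLevelAgreement`, `violations`, the "inventory" and "checkout" services, the thresholds) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevelAgreement:
    """Obligations one service owes to a consumer (hypothetical model)."""
    provider: str
    consumer: str
    min_availability: float     # required fraction of successful requests
    max_p99_latency_ms: float   # ceiling on 99th-percentile latency


def availability(successes: int, total: int) -> float:
    """Fraction of requests that succeeded; no traffic counts as available."""
    return 1.0 if total == 0 else successes / total


def p99(latencies_ms: list[float]) -> float:
    """Nearest-rank 99th percentile of the observed latencies."""
    if not latencies_ms:
        return 0.0
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


def violations(sla: ServiceLevelAgreement, successes: int, total: int,
               latencies_ms: list[float]) -> list[str]:
    """Return a description of each obligation the measurements break."""
    broken = []
    if availability(successes, total) < sla.min_availability:
        broken.append("availability below agreed minimum")
    if p99(latencies_ms) > sla.max_p99_latency_ms:
        broken.append("p99 latency above agreed maximum")
    return broken


# Assumed example: the inventory service's agreement with the checkout service.
sla = ServiceLevelAgreement("inventory", "checkout", 0.999, 250.0)
print(violations(sla, 990, 1000, [300.0] * 10))
```

A consuming team could run a check like this against each reporting window and raise any non-empty result with the providing team.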
Speaker:And so Google created new terms called SLIs and SLOs
Speaker:in order to distinguish them from
Speaker:SLAs for how you
Speaker:measure those parts of the application. And the
Speaker:idea is SLIs and SLOs are internal
Speaker:measurements for internal customers, and SLAs are
Speaker:external measures for external customers. That's where I have
Speaker:my problem, because in my mind, in a service oriented
Speaker:architecture, in a service oriented
Speaker:team model, if you own a service
Speaker:and other services depend on you, those other teams
Speaker:are your customers. The fact that they sit down the hall from
Speaker:you or right next to you, or on another floor, but in the same
Speaker:company is irrelevant. They're still your customers.
Speaker:Whether they're an internal customer or an external customer doesn't matter.
Speaker:They're your customers. You need to keep them happy for
Speaker:your application to perform as expected. So when
Speaker:you provide a service that someone else is depending
Speaker:on, and you specify what the requirements are for running that
Speaker:service, those are service level agreements. Those are the
Speaker:agreements that you have with the other service owners
Speaker:of how your service will behave. There's no
Speaker:difference between those SLAs and the external ones. So
Speaker:don't call them something different, because that implies there's something less.
Speaker:Right? An SLO implies an
Speaker:internal agreement, which of course, internal agreements aren't
Speaker:official agreements. Well, they're not as important, right?
Speaker:SLA implies an external agreement, which is important because we're
Speaker:talking about customers here. They're all customers. They're all
Speaker:external. They're all SLAs. When you make an
Speaker:agreement that your application performs a certain way, there's
Speaker:no difference in whether or not that agreement is made with another team within your
Speaker:organization or to an external customer. They're just as important
Speaker:because guess what? If you break your agreement
Speaker:for how your service performs with another team. That's not going to
Speaker:just affect the other team. That's going to affect all the teams that depend
Speaker:on them, and ultimately it's going to affect the customer. So it all
Speaker:matters. They're all just as important. So let's not invent
Speaker:new terms to describe them. In my mind, they're all
Speaker:SLAs. So if you have 100 service teams
Speaker:within your organization and they have their
Speaker:criteria for how they are expected to perform
Speaker:to support the other service owners, those
Speaker:are SLAs. Those expectations are
Speaker:service level agreements. They need to be treated at the same level of
Speaker:importance as the customer level service level
Speaker:agreements. I find that really interesting because you're getting
Speaker:at the fact that words matter and what we
Speaker:call things matters, because I think there are a lot of
Speaker:really interesting organizational challenges with implementing
Speaker:SLOs and SLAs effectively. And one of them sort of on the
Speaker:flip side, is that when teams
Speaker:talk about service level objectives,
Speaker:they often sort of set them arbitrarily based
Speaker:on, okay, these are the things that I can measure and these are
Speaker:my objectives as the owner. And
Speaker:what you're getting at is the fact that these really need to be
Speaker:agreements. They need to be hashed out with product
Speaker:owners and technical leads and people who are
Speaker:deeply familiar with the customer, whether that customer is internal or
Speaker:external. Stay tuned for our next Modern Ops segment
Speaker:when Beth and I continue our discussion on modern application
Speaker:operations by talking about ownership in a modern
Speaker:operations world. Thank you for
Speaker:tuning in to Modern Digital Business. This podcast exists
Speaker:because of the support of you, my listeners. If you enjoy what you
Speaker:hear, will you please leave a review on Apple Podcasts or
Speaker:directly on our website at mdb.fm/reviews.
Speaker:If you'd like to suggest a topic for an episode or
Speaker:you're interested in becoming a guest, please contact me directly by
Speaker:sending me a message at mdb.fm/contact.
Speaker:And if you'd like to record a quick question or comment, click the
Speaker:microphone icon in the lower right hand corner of our website.
Speaker:Your recording might be featured on a future episode. To
Speaker:make sure you get every new episode when it becomes available, click
Speaker:subscribe in your favorite podcast player, or check out our website at
Speaker:mdb.fm. If you want to learn more from me,
Speaker:then check out one of my books, courses or articles by going to
Speaker:leeatchison.com. All of these links are included in the show
Speaker:notes. Thank you for listening and welcome to the world of the
Speaker:Modern Digital Business.