By David Challoner, Joanna Wijntjes, David Huska, Matthew Sartwell, Chris Coykendall, Chris Schrier, with Betsy Beyer, Max Luebbe, Alex Perry, and Murali Suriar

Google SREs spend much of their time optimizing, squeezing every bit of performance from a system through project work and developer collaboration. But the scope of optimization isn’t limited to compute resources: it’s also important that SREs optimize how they spend their time. Primarily, we want to avoid performing tasks classified as toil. For a comprehensive discussion of toil, see Chapter 5 in Site Reliability Engineering. For the purposes of this chapter, we’ll define toil as the repetitive, predictable, constant stream of tasks related to maintaining a service.

Toil is seemingly unavoidable for any team that manages a production service. System maintenance inevitably demands a certain amount of rollouts, upgrades, restarts, alert triaging, and so forth. These activities can quickly consume a team if left unchecked and unaccounted for. Google caps the time SRE teams spend on operational work (including both toil- and non-toil-intensive work) at 50% (for more context on why, see Chapter 5 in our first book). While this target may not be appropriate for your organization, there’s still an advantage to placing an upper bound on toil, as identifying and quantifying toil is the first step toward optimizing your team’s time.

Toil tends to fall on a spectrum measured by the following characteristics, which are described in our first book.