Another aspect of resilience is how the company distributes responsibility among its employees. The company depends heavily on its developers to build the Simian Army and its cloud services. Whenever a developer builds something, that developer is also responsible for maintaining it. While this may sound like a "devops" model (the idea that developers provision their own infrastructure resources), Netflix instead follows what Tseitlin calls a distributed ops model. Each developer is responsible for the entire lifecycle of the code and applications he or she has created: developers write programs, run them, and are responsible for updating them.

Service Level Agreements (SLAs) may receive more attention, but service level objectives (SLOs) and service level indicators (SLIs) matter most for service control, availability, reliability and accessibility. A chapter published in Site Reliability Engineering: How Google Runs Production Systems (2016) highlights the need to define the expectations of users and teams with SLOs and SLIs. In particular, SLOs, which reflect the interests of both service providers and consumers, help prioritize work between SREs and developers, who may decide either to improve availability from 99.99% to 99.999% or to focus on shipping new features while holding the 99.99% availability target. Business-grade services backed by SLAs, such as fibre-optic or fixed wireless Internet, aim for 99.99% service connectivity or better. This means that the Internet service tolerates an average downtime of about 4 minutes or less per month. By comparison, a service with 99.9% uptime, which seems reliable until you do the math, can be down for about 44 minutes per month.
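The downtime figures above follow from simple arithmetic, which can be sketched in a few lines of Python. This is only an illustration: the function name `allowed_downtime_minutes` and the default 30-day month are assumptions made here, not something taken from the SRE book or from any provider's SLA.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 30.0) -> float:
    """Return the downtime budget in minutes for an availability target.

    Hypothetical helper for illustration; assumes a 30-day month by default.
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60  # minutes in the measurement period
    return total_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Compare the availability targets mentioned in the text.
    for target in (99.9, 99.99, 99.999):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% -> {budget:.2f} minutes of downtime per 30-day month")
```

For a 30-day month this gives roughly 43.2 minutes at 99.9% and 4.3 minutes at 99.99%. Using an average month of 365.25 / 12 days instead, the 99.9% figure comes to about 43.8 minutes, which matches the rounded 44 minutes quoted above.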
Now back to our example, the great Amazon outage. Most providers work with so-called SLA credits, which are granted as compensation in the event of an outage. These credits are similar to airline miles and can only be used as a discount on your monthly bill. SLA credits are already poor compensation for major failures (damages can easily exceed the amount paid for the cloud service), and situations in which cloud service providers actually decide to grant them seem rare.