Statefulness
Move state as far back in the stack as possible to improve scalability. Replicating and sharding state on the backend improves stability and reliability.
Use load balancers at two layers: first to distribute incoming load across the frontend servers, and again to map that load onto the backend systems hosting the replicated state.
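The two balancing layers can be sketched roughly as follows. This is a minimal illustration, not a production load balancer: the server names, the round-robin choice for frontends, and the hash-based shard mapping are all assumptions made for the example.

```python
import hashlib
from itertools import cycle

# Hypothetical stateless frontends; any of them can serve any request.
FRONTENDS = ["fe-1", "fe-2", "fe-3"]

# Hypothetical backend shards; each inner list is the replica set of one shard.
SHARDS = [
    ["db-0a", "db-0b"],
    ["db-1a", "db-1b"],
]

_frontend_cycle = cycle(FRONTENDS)


def pick_frontend() -> str:
    """First layer: round-robin, possible because frontends hold no state."""
    return next(_frontend_cycle)


def shard_for(key: str) -> int:
    """Second layer: a stable hash, so a given key always maps to the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % len(SHARDS)


def replicas_for(key: str) -> list:
    """Return the replicas that hold the state for this key."""
    return SHARDS[shard_for(key)]
```

Because the frontends keep no state, adding or removing one only changes the round-robin pool. The state itself stays on the shard chosen by the key.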
Measurements
Service Level Objectives (SLOs) are technical targets describing the quality of service you want to achieve.
Performance Indicators are the metrics used to measure the objectives. They show how close you are to meeting each objective.
Service Level Agreements are the codification of the SLOs into legal documents.
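As a rough illustration of how an indicator relates to an objective, the sketch below computes an availability indicator from request counts and reports how far it sits from the objective. The function names, the choice of availability as the indicator, and the numbers are assumptions made for the example.

```python
def availability_indicator(good_requests: int, total_requests: int) -> float:
    """Fraction of requests served successfully in the measurement window."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return good_requests / total_requests


def gap_to_objective(indicator: float, objective: float) -> float:
    """Positive when the objective is met, negative when it is missed."""
    return indicator - objective


# Hypothetical window: 99,950 good requests out of 100,000, objective of 99.9%.
measured = availability_indicator(99_950, 100_000)
gap = gap_to_objective(measured, objective=0.999)
print(f"indicator={measured:.4%} gap={gap:+.4%}")
```

A negative gap would mean the service is missing its objective for that window.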
Objectives should be used to guide the design. They can start off as estimates or as a range and get more specific and refined as the system evolves.
Objectives should be relevant to the user experience. Indicators measure that experience, and alerts should be generated when a degradation is noticeable and causing the user pain.
Alerts are defined through the use of:
- Policies – define the conditions under which the service is considered unhealthy.
- Conditions – determine when the policy triggers. They should be tuned to filter out false positives.
- Notifications – channels that inform someone that an action must be taken.
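The three parts of an alert can be sketched as below. This is an illustrative model only, not the API of any monitoring product: the class names, the "N consecutive unhealthy samples" rule used to filter false positives, and the callable notification channels are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Condition:
    threshold: float   # indicator value below which a sample counts as unhealthy
    consecutive: int   # unhealthy samples required in a row before triggering

    def __post_init__(self) -> None:
        if self.consecutive < 1:
            raise ValueError("consecutive must be at least 1")

    def triggered(self, samples: List[float]) -> bool:
        """True only if the most recent `consecutive` samples are all unhealthy."""
        recent = samples[-self.consecutive:]
        return (len(recent) == self.consecutive
                and all(s < self.threshold for s in recent))


@dataclass
class AlertPolicy:
    name: str
    condition: Condition
    # Notification channels, modelled here as plain callables taking a message.
    channels: List[Callable[[str], None]] = field(default_factory=list)

    def evaluate(self, samples: List[float]) -> bool:
        """Notify every channel if the condition triggers; report whether it did."""
        if not self.condition.triggered(samples):
            return False
        message = (f"{self.name}: indicator below {self.condition.threshold} "
                   f"for {self.condition.consecutive} consecutive samples")
        for notify in self.channels:
            notify(message)
        return True
```

Requiring several consecutive unhealthy samples is one simple way to tune a condition: a single brief dip does not page anyone, while a sustained drop does.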