Achieving Rapid Response Times in Large Online Services
Video: https://www.youtube.com/watch?v=1-3Ahy7Fxsc
Large Fanout Services
Why Does Fanout Make Things Harder?
Overall latency ≥ latency of slowest component
–small blips on individual machines cause delays
–touching more machines increases likelihood of delays
• Server with 1 ms avg. but 1 sec 99%ile latency
–touch 1 of these: 1% of requests take ≥1 sec
–touch 100 of these: 63% of requests take ≥1 sec
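The 1% and 63% figures follow directly from treating each server's slow responses as independent events; a quick check of the arithmetic:

```python
# Probability that a request touching n servers hits at least one slow
# response, when each server independently exceeds 1 sec with
# probability p (its 99th-percentile latency).
def p_slow(n: int, p: float = 0.01) -> float:
    return 1.0 - (1.0 - p) ** n

print(f"{p_slow(1):.2%}")    # → 1.00%
print(f"{p_slow(100):.2%}")  # → 63.40%
```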
One Approach: Squash All Variability
Careful engineering of all components of the system
• Possible at small scale
–dedicated resources
–complete control over whole system
–careful understanding of all background activities
–less likely to have hardware fail in bizarre ways
System changes are difficult
–software or hardware changes affect delicate balance
One Approach: Squash All Variability
Not tenable at large scale: need to share resources
Shared Environment
Huge benefit: greatly increased utilization
• ... but hard-to-predict effects increase variability
–network congestion
–background activities
–bursts of foreground activity
–not just your jobs, but everyone else’s jobs, too
• Exacerbated by large fanout systems
Basic Latency Reduction Techniques
Differentiated service classes
–prioritized request queues in servers
–prioritized network traffic
• Reduce head-of-line blocking
–break large requests into sequence of small requests
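A minimal sketch of the chunking idea, assuming a hypothetical `process_chunk` handler and an illustrative chunk size (neither is from the talk):

```python
# Sketch: break one large request into a sequence of small chunks so
# other requests can interleave between chunks in the server's queue,
# instead of waiting behind the entire large request.
CHUNK_SIZE = 64  # illustrative; tuned per service in practice

def handle_large_request(items, process_chunk):
    results = []
    for start in range(0, len(items), CHUNK_SIZE):
        # Each chunk is a small unit of work; between chunks the server
        # is free to serve queued short requests first.
        results.extend(process_chunk(items[start:start + CHUNK_SIZE]))
    return results
```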
Manage expensive background activities
–e.g. log compaction in distributed storage systems
–rate limit activity
–defer expensive activity until load is lower
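One common way to rate-limit background activity such as compaction is a token bucket; a sketch under that assumption (the class and its parameters are illustrative, not from the talk):

```python
import time

# Hypothetical token-bucket limiter for background work such as log
# compaction: at most `rate` units of work per second, with bursts up
# to `burst`, so foreground requests are not starved by a compaction
# spike. Work that cannot acquire tokens is deferred.
class TokenBucket:
    def __init__(self, rate: float, burst: float):
        self.rate, self.capacity = rate, burst
        self.tokens = burst
        self.last = time.monotonic()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller defers the work until load is lower
```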
Synchronized Disruption
Large systems often have background daemons
–various monitoring and system maintenance tasks
• Initial intuition: randomize when each machine performs these tasks

–actually a very bad idea for high fanout services
• at any given moment, at least one or a few machines are slow
• Better to actually synchronize the disruptions
–run every five minutes “on the dot”
–one synchronized blip better than unsynchronized
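The "on the dot" schedule can be sketched by having every machine sleep until the next wall-clock multiple of the period (the five-minute period is from the slide; the function names are mine):

```python
import time

# Every machine computes the same next wall-clock boundary, so all
# disruptions line up in one synchronized blip rather than leaving a
# few machines slow at any given moment.
PERIOD = 300  # five minutes, in seconds

def seconds_until_next_slot(now: float, period: float = PERIOD) -> float:
    return period - (now % period)

def run_maintenance_forever(task):
    while True:
        time.sleep(seconds_until_next_slot(time.time()))
        task()  # fires at the same wall-clock instant on every machine
```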
Tolerating Faults vs. Tolerating Variability
Tolerating faults:
– rely on extra resources
• RAIDed disks, ECC memory, dist. system components, etc.
– make a reliable whole out of unreliable parts
• Tolerating variability:
– use these same extra resources
– make a predictable whole out of unpredictable parts
Time scales are very different:
– variability: 1000s of disruptions/sec, scale of milliseconds
– faults: 10s of failures per day, scale of tens of seconds
Latency Tolerating Techniques
Cross-request adaptation
–examine recent behavior
–take action to improve latency of future requests
–typically relates to balancing load across a set of servers
–time scale: 10s of seconds to minutes
• Within-request adaptation
–cope with slow subsystems in the context of a higher-level request
–time scale: right now, while user is waiting
Load Balancing
Can shed load in few-percent increments
–prioritize shifting load when imbalance is more severe
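One way to realize few-percent shedding is to divide each server's load into many small shards and move shards between servers; a sketch of a single rebalancing step (data layout and names are illustrative, not from the talk):

```python
# Illustrative fine-grained rebalancer: each server's load is a list of
# small shard loads, so moving one shard shifts only a few percent of
# its total. A severe imbalance moves several shards in one step.
def rebalance_step(load_by_server: dict[str, list[float]]) -> None:
    hot = max(load_by_server, key=lambda s: sum(load_by_server[s]))
    cold = min(load_by_server, key=lambda s: sum(load_by_server[s]))
    imbalance = sum(load_by_server[hot]) - sum(load_by_server[cold])
    moved = 0.0
    # Shift small shards first, stopping before overshooting half the
    # imbalance (moving half equalizes the pair).
    for shard in sorted(load_by_server[hot]):
        if moved + shard > imbalance / 2:
            break
        load_by_server[hot].remove(shard)
        load_by_server[cold].append(shard)
        moved += shard
```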
Backup Requests w/ Cross-Server Cancellation