Theory
Aspects of Performance Tuning
Why does performance matter?
Gallino, Karacaoglu, & Moreno (2018) found that "a 10 percent decrease in website performance leads to a 2.6 percent decrease in retailers' revenue and a decrease of 0.05 percentage points in conversion, after controlling for traffic and a battery of fixed effects. [...] Delays of 100 milliseconds have a significant impact on customer abandonment."
A typical performance exercise can yield a throughput improvement of about 200% relative to default tuning parameters.
Indirect benefits of improved performance include reduced hardware needs and costs, reduced maintenance, reduced power consumption, knowledge of your breaking points, more accurate system sizing, etc.
Increased performance may require sacrificing some features or functions in the application or the application server. The tradeoff between performance and features must be weighed carefully when evaluating performance tuning changes.
Basic Definitions
In general, the goal of performance tuning is to increase throughput, reduce response times, and/or increase the capacity for concurrent requests, all balanced against costs.
- A response time is the time taken to complete a unit of work. For example, the time taken to complete an HTTP response.
- The number of concurrent requests is the number of requests being processed at the same time, measured over some fixed time interval (e.g. per second). For example, the number of HTTP requests concurrently being processed per second. A single user may send multiple concurrent requests.
- Throughput is the number of responses over some fixed time interval (e.g. per second). For example, successful HTTP responses per second. These three measures are related; see the sketch after this list.
- A hypothesis is a testable idea that is not yet believed to be either true or false.
- A theory is a hypothesis that has been tested against evidence with a positive result. It is believed to be true.
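As a rough illustration of how these metrics relate, the following sketch derives throughput, average response time, and average concurrency from a handful of request timings. It is a minimal sketch with made-up numbers and a hypothetical Request record, not the output of any particular load-testing tool.

```java
import java.util.List;

/** Minimal sketch: deriving basic performance metrics from request timings.
    The Request record and the sample numbers are hypothetical. */
public class MetricsSketch {
    // A completed request with start and end times in milliseconds.
    record Request(long startMillis, long endMillis) {}

    public static void main(String[] args) {
        List<Request> requests = List.of(
            new Request(0, 120),
            new Request(50, 200),
            new Request(400, 450),
            new Request(900, 1000));

        long windowMillis = 1000; // fixed one-second measurement interval

        // Throughput: completed responses per second.
        double throughput = requests.size() / (windowMillis / 1000.0);

        // Response time: average time taken to complete a unit of work.
        double avgResponseMillis = requests.stream()
            .mapToLong(r -> r.endMillis() - r.startMillis())
            .average().orElse(0);

        // Average concurrency over the interval: total busy time / interval length
        // (an application of Little's Law, N = X * R).
        double avgConcurrency = requests.stream()
            .mapToLong(r -> r.endMillis() - r.startMillis())
            .sum() / (double) windowMillis;

        System.out.printf("Throughput: %.1f responses/sec%n", throughput);
        System.out.printf("Avg response time: %.1f ms%n", avgResponseMillis);
        System.out.printf("Avg concurrency: %.2f requests%n", avgConcurrency);
    }
}
```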
Common Throughput Curve
A common throughput curve includes a saturation point and may include a buckle zone:
In the light load zone, or Section A, throughput increases almost linearly as the concurrent client load increases, because no resource is yet saturated. In the heavy load zone, or Section B, as the concurrent client load increases further, throughput remains relatively constant; however, the response time increases in proportion to the user load. That is, if the user load is doubled in the heavy load zone, the response time doubles. At some point, represented by Section C, the buckle zone, one of the system components becomes exhausted and throughput starts to degrade. For example, the system might enter the buckle zone when the network connections at the web server exhaust the limits of the network adapter or when requests exceed operating system limits for file handles.
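The proportional relationship in the heavy load zone can be made concrete with Little's Law (N = X * R): once throughput X has plateaued at its saturation value, any increase in the number of concurrent users N must show up as a proportional increase in response time R. A minimal sketch with made-up numbers:

```java
/** Minimal sketch of Little's Law (N = X * R) in the heavy load zone.
    The saturation throughput and user counts are illustrative values only. */
public class HeavyLoadZoneSketch {
    public static void main(String[] args) {
        double saturatedThroughput = 500.0; // responses/sec; roughly constant in the heavy load zone

        for (int users : new int[] {100, 200, 400}) {
            // Rearranging Little's Law: R = N / X.
            double responseTimeSec = users / saturatedThroughput;
            System.out.printf("%d concurrent users -> ~%.0f ms response time%n",
                users, responseTimeSec * 1000);
        }
    }
}
```

Doubling the users from 100 to 200, and again to 400, doubles the computed response time each time, matching the behavior described above.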
Response Time vs. Latency
Some define latency as a synonym for response time (the time between a stimulus and a response), or as a subset or superset of it. Others define latency according to its stricter, classical meaning of "concealed or inactive"; i.e., time external to queue processing, most commonly understood as transit or network time. This book prefers the latter definition (as detailed in practical queuing theory), although in general we try to avoid the word latency because of this ambiguity.
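One way to make the distinction concrete is to decompose what a client observes: the client-side response time includes the server's queueing and processing time plus the network transit time, which is what this book calls latency. The sketch below uses hypothetical measurements; the variable names are illustrative only.

```java
/** Minimal sketch separating network latency (transit time) from server
    processing time within a client-observed response time.
    All timing values are hypothetical. */
public class LatencySketch {
    public static void main(String[] args) {
        // Measured on the client: time from sending the request to receiving the response.
        double clientResponseTimeMillis = 250.0;

        // Measured on the server: time spent queued and processing the request
        // (e.g. taken from an access log's response-time field).
        double serverProcessingMillis = 180.0;

        // Under the stricter definition preferred here, latency is the time
        // external to queue/processing: round-trip network transit.
        double networkLatencyMillis = clientResponseTimeMillis - serverProcessingMillis;

        System.out.printf("Client response time: %.0f ms%n", clientResponseTimeMillis);
        System.out.printf("Server processing:    %.0f ms%n", serverProcessingMillis);
        System.out.printf("Network latency:      %.0f ms%n", networkLatencyMillis);
    }
}
```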
Architecture/Clustering
It is always important to consider what happens when some part of a cluster crashes. Will the rest of the cluster handle the failure gracefully? Does the heap size have enough head room? Is there enough CPU to handle the extra load? If there is more traffic than the cluster can handle, will it queue and time out gracefully?
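A back-of-the-envelope way to reason about head room is to estimate per-node utilization after a node failure, assuming the surviving nodes absorb the load evenly. The node count and CPU figures below are made-up examples:

```java
/** Minimal sketch: estimating per-node utilization after a node failure,
    assuming the surviving nodes absorb the load evenly. Numbers are illustrative. */
public class ClusterHeadroomSketch {
    public static void main(String[] args) {
        int nodes = 4;
        double perNodeCpuUtilization = 0.60; // 60% CPU per node under normal load

        // Total work expressed in node-CPUs, then spread over the survivors.
        double totalWork = nodes * perNodeCpuUtilization;
        double afterOneFailure = totalWork / (nodes - 1);

        System.out.printf("Normal load: %.0f%% CPU per node%n", perNodeCpuUtilization * 100);
        System.out.printf("After losing one of %d nodes: %.0f%% CPU per node%n",
            nodes, afterOneFailure * 100);
        // If this result approaches or exceeds 100%, the cluster cannot absorb
        // the failure gracefully and requests will queue or time out.
    }
}
```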