Latency (execution time): time to finish a fixed task
Throughput (bandwidth): number of tasks in fixed time
Different: exploit parallelism for throughput, not latency
Example: move people 10 miles
Car: capacity = 5, speed = 60 miles/hour
Bus: capacity = 60, speed = 20 miles/hour
Latency: car = 10 min, bus = 30 min
Throughput: car = 15 PPH (count return trip), bus = 60 PPH
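The arithmetic behind these numbers can be checked with a short sketch. The function names are illustrative only, and the throughput figure assumes, as the notes do, that each vehicle must make the empty return trip before carrying its next load.

```python
def one_way_latency_min(distance_miles, speed_mph):
    """Minutes for one passenger to cover the distance (the latency)."""
    return distance_miles * 60 / speed_mph


def throughput_pph(capacity, distance_miles, speed_mph):
    """People delivered per hour, counting the return trip (assumption from the notes)."""
    round_trip_miles = 2 * distance_miles
    return capacity * speed_mph / round_trip_miles


DISTANCE_MILES = 10
vehicles = {
    "car": {"capacity": 5, "speed_mph": 60},
    "bus": {"capacity": 60, "speed_mph": 20},
}

for name, v in vehicles.items():
    latency = one_way_latency_min(DISTANCE_MILES, v["speed_mph"])
    throughput = throughput_pph(v["capacity"], DISTANCE_MILES, v["speed_mph"])
    print(f"{name}: latency = {latency:.0f} min, throughput = {throughput:.0f} PPH")
```

The car wins on latency and the bus wins on throughput, which is the point of the example: the two metrics can rank the same options in opposite order.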
Consider the following example.
Low latency increases page/app speed, improves user experience, and improves SEO.
Google has reported that users leave a web page if it does not load in under 3 seconds.
Ecommerce sites have found that improving load speed increases the conversion rate.
This means an architect should work on reducing latency. However, reducing latency usually means freeing up hardware and running fewer queries/jobs at a time, which leads to underutilization and higher cost. A good architect should reduce latency while also keeping cost optimal.
To make the trade-off between cost and latency, one considers the latency and how many queries can be run. However, it is a mistake to treat latency as a single number. Latency differs from query to query, and even for the same query when it is run by different users.
Example: suppose we run 5 queries and they take 99, 98, 100, 101, and 1000 ms. The average latency is around 280 ms, yet most of the queries run in around 100 ms. Instead of using the average latency, one should consider the distribution of the latency numbers. One way to do this is with percentiles or quartiles.
Instead of the average, the median may give more insight. Here the median is 100 ms.
An architect can pay attention to the median, the 90th percentile, and so on, and determine which queries are outliers.
Example: the architect may determine that the 1000 ms query was due to a failure, a network issue, or a once-in-a-while event. The architect can then optimize latency/cost using 100 ms as the typical latency.
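A minimal sketch of this analysis, using the five latencies from the example. The percentile helper uses the nearest-rank method, which is one of several common definitions, and the outlier rule (more than 3 times the median) is an arbitrary threshold chosen only for illustration.

```python
import math
import statistics


def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


latencies_ms = [99, 98, 100, 101, 1000]

mean_ms = statistics.mean(latencies_ms)      # pulled up by the single slow query
median_ms = statistics.median(latencies_ms)  # the typical query
p75_ms = percentile(latencies_ms, 75)
p90_ms = percentile(latencies_ms, 90)

# Hypothetical rule: treat anything slower than 3x the median as an outlier.
OUTLIER_FACTOR = 3
outliers = [ms for ms in latencies_ms if ms > OUTLIER_FACTOR * median_ms]

print(f"mean={mean_ms} median={median_ms} p75={p75_ms} p90={p90_ms}")
print("outliers:", outliers)
```

With only five samples the 90th percentile lands on the slow query itself, so in practice percentiles are computed over a much larger window of measurements.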
Apply business context
Add the following business context: 2-5% of queries take longer to execute. This happens because you have a few large customers with large purchase histories. Will you ignore these queries and treat them as outliers? From a business perspective, these are the customers who generate the most revenue. In this scenario the architect should not ignore improving the experience of such customers. The architect needs to choose an option that improves their experience, and the longer-running queries should be given preference.
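One way to keep such customers visible is to compute percentiles per customer segment rather than over all queries together. The sketch below uses made-up data: the segment names, the latency values, and the 2-in-40 split (5% of queries from large customers) are all assumptions for illustration.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical sample of (customer_segment, latency_ms) pairs.
samples = [("standard", 98 + i % 5) for i in range(38)]
samples += [("large", 950), ("large", 1050)]

# Group latencies by segment.
by_segment = defaultdict(list)
for segment, latency_ms in samples:
    by_segment[segment].append(latency_ms)

overall_p90 = percentile([ms for _, ms in samples], 90)
segment_p90 = {seg: percentile(vals, 90) for seg, vals in by_segment.items()}

print("overall p90:", overall_p90)
for seg, p90 in segment_p90.items():
    print(f"{seg} p90:", p90)
```

In this sample the overall 90th percentile stays near 100 ms and hides the large customers completely, while the per-segment view shows that their queries take about ten times longer.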