Understanding Disk I/O – when should you be worried?


Monitoring and analyzing performance is an important task for any system administrator.

Disk I/O bottlenecks can bring applications to a crawl. Some common questions for anyone embarking on a disk I/O analysis are:

  • What are IOPS?
  • Should I use SATA, SAS, or SSD?
  • What RAID level should I use?
  • Is my system read or write-heavy?

 

Disclaimer: I do not consider myself an expert in storage or anything for that matter. This is just how I have done I/O analysis in the past. I welcome additions and corrections.

 

What are IOPS?

IOPS are input/output (I/O) operations per second. IOPS matter for applications that access the disk frequently, such as databases, version control systems, mail stores, and many similar applications.

 

How do I calculate IOPS?

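IOPS are a function of rotational speed (aka spindle speed), average latency and average seek time. Average rotational latency is the time the platter takes to make half a revolution, so it follows directly from the rotational speed. The equation is simple: IOPS = 1 / (seek + latency), with both times in seconds.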

Sample drive:

Model: 2.5″ SATA hard drive
Rotational speed: 10,000 RPM
Average latency: 3 ms (0.003 seconds)
Average seek time: 4.2 ms (read) / 4.7 ms (write), averaging 4.45 ms (0.00445 seconds)
Calculated IOPS for this disk: 1 / (0.003 + 0.00445) = about 134 IOPS
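The same calculation as a short Python sketch. It assumes average rotational latency equals the time for half a revolution (60,000 ms ÷ RPM ÷ 2), which reproduces the 3 ms figure for the 10,000 RPM sample drive. The function names are illustrative and do not come from any library.

```python
# Estimate random IOPS for a single spinning disk from its spec sheet.
# Assumption: average rotational latency = time for half a revolution.

def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds (half of one revolution)."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


def estimated_iops(rpm: float, avg_seek_ms: float) -> float:
    """IOPS = 1 / (seek + latency), with both terms converted to seconds."""
    service_time_s = (avg_seek_ms + rotational_latency_ms(rpm)) / 1000
    return 1 / service_time_s


if __name__ == "__main__":
    # Sample drive from the text: 10,000 RPM, 4.2 ms read / 4.7 ms write seek.
    seek = (4.2 + 4.7) / 2  # about 4.45 ms
    print(f"latency: {rotational_latency_ms(10_000):.1f} ms")  # latency: 3.0 ms
    print(f"IOPS: {estimated_iops(10_000, seek):.0f}")          # IOPS: 134
```

Treat the result as a ballpark figure derived from the spec sheet, not a measured value.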

It’s great to know how to calculate a disk's IOPS, but for the most part you can get by with commonly accepted averages:

 

Rotational speed (RPM)   Typical IOPS
5,400                    50-80
7,200                    75-100
10,000                   125-150
15,000                   175-210

 

Should I use SATA, SAS or SSD?

That is a loaded question. As with most things, the answer is “it depends.”

 

Approximate IOPS and throughput values for different disk drive types:

Drive (type / RPM)   IOPS, 4KB random   IOPS, 64KB random   MB/s, 64KB random   IOPS, 512KB random   MB/s, 512KB random   MB/s, large-block sequential
FC / 15K             163 – 178          151 – 169           9.7 – 10.8          97 – 123             49.7 – 63.1          73.5 – 127.5
SAS / 15K            188 – 203          175 – 192           11.2 – 12.3         115 – 135            58.9 – 68.9          91.5 – 126.3
FC / 10K             142 – 151          130 – 143           8.3 – 9.2           80 – 104             40.9 – 53.1          58.1 – 107.2
SAS / 10K            142 – 151          130 – 143           8.3 – 9.2           80 – 104             40.9 – 53.1          58.1 – 107.2
SAS/SATA / 7200      73 – 79            69 – 76             4.4 – 4.9           47 – 63              24.3 – 32.1          43.4 – 97.8
SATA / 5400          57                 55                  3.5                 44                   22.6                 n/a
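The MB/s columns follow, approximately, from multiplying IOPS by the block size. Here is a minimal sketch that assumes 1 MB = 1,000 KB. That assumption roughly reproduces the table's figures. `throughput_mb_s` is a hypothetical helper name.

```python
# Relate random IOPS to throughput: MB/s is roughly IOPS x block size.
# Assumption: 1 MB = 1,000 KB, which roughly matches the table's MB/s columns.

def throughput_mb_s(iops: float, block_kb: float) -> float:
    """Approximate throughput in MB/s for a given IOPS rate and block size."""
    return iops * block_kb / 1000


if __name__ == "__main__":
    # Low end of the FC / 15K row.
    print(f"{throughput_mb_s(151, 64):.1f} MB/s at 64KB")   # 9.7 MB/s at 64KB
    print(f"{throughput_mb_s(97, 512):.1f} MB/s at 512KB")  # 49.7 MB/s at 512KB
```

This is why the same drive looks slow in MB/s on small random blocks and much faster on large blocks. IOPS fall far less than the amount of data moved per operation grows.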

 

IOPS and throughput values for some SSD drives:

Model / type                      Capacity (GB)   Max read IOPS, 4KB random   Max write IOPS, 4KB random   Max read MB/s, sequential   Max write MB/s, sequential
Lightning Read-Intensive (MLC)    1600            78000                       8000                         410                         140
Lightning Write-Intensive (SLC)   400             118000                      33000                        450                         250
Lightning Mixed-Use               800             100000                      16000                        450                         220

 

What RAID level should I use?

Once you know what IOPS are, how to calculate them, and what kind of drives to use, the next logical question is commonly RAID 5 vs. RAID 10.

Feature                        RAID 0          RAID 1                 RAID 5                 RAID 6              RAID 10
Minimum # of drives            2               2                      3                      4                   4
Data protection                No protection   Single-drive failure   Single-drive failure   Two-drive failure   Up to one disk failure in each sub-array
Read performance               High            High                   High                   High                High
Write performance              High            Medium                 Low                    Low                 Medium
Read performance (degraded)    N/A             Medium                 Low                    Low                 High
Write performance (degraded)   N/A             High                   Low                    Low                 High
Capacity utilization           100%            50%                    67% – 94%              50% – 88%           50%

Typical applications:

  • RAID 0 – High-end workstations, data logging, real-time rendering, very transitory data
  • RAID 1 – Operating system, transaction databases
  • RAID 5 – Data warehousing, web serving, archiving
  • RAID 6 – Data archive, backup to disk, high availability solutions, servers with large capacity requirements
  • RAID 10 – Fast databases, application servers
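The capacity utilization row follows from how much space each level gives up to parity or mirroring. This sketch assumes n identical drives and models RAID 1 only as a two-drive mirror. RAID 5 spends one drive's worth of space on parity and RAID 6 spends two. That gives the 67% – 94% and 50% – 88% ranges for 3 to 16 and 4 to 16 drives respectively. The 16-drive upper bound is inferred from the ranges and is not stated in the table.

```python
# Usable capacity as a fraction of raw capacity, assuming n identical drives.
# Assumption: RAID 1 is modeled only as a two-drive mirror.
MIN_DRIVES = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}


def capacity_utilization(level: str, drives: int) -> float:
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == "1" and drives != 2:
        raise ValueError("this sketch only models two-drive RAID 1 mirrors")
    if level == "10" and drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    if level == "0":
        return 1.0                      # striping only, nothing reserved
    if level in ("1", "10"):
        return 0.5                      # every block is stored twice
    if level == "5":
        return (drives - 1) / drives    # one drive's worth of parity
    return (drives - 2) / drives        # RAID 6: two drives' worth of parity


if __name__ == "__main__":
    for level, n in [("5", 3), ("5", 16), ("6", 4), ("6", 16)]:
        print(f"RAID {level}, {n} drives: {capacity_utilization(level, n):.0%}")
    # RAID 5, 3 drives: 67%   RAID 5, 16 drives: 94%
    # RAID 6, 4 drives: 50%   RAID 6, 16 drives: 88%
```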

 

Do you have an I/O bottleneck?

Your I/O wait measurement is the canary for an I/O bottleneck. I/O wait is the percentage of time your processors sit idle while waiting on outstanding disk I/O.

 

For example, let us say it takes 1 second to grab 10,000 rows from MySQL and perform some operations on those rows. The disk is being accessed while the rows are retrieved. During this time, the processor is idle. It is waiting on the disk.

 

In the example above, disk access took 700 ms, so I/O wait is 70%.

You can check your I/O wait percentage with top, which is available on virtually every Linux distribution. It appears as the wa value in the CPU summary line.
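On Linux, top derives its wa figure from the aggregate CPU counters in /proc/stat. If you would rather collect I/O wait from a script, this minimal sketch follows the same idea. It samples those counters twice and divides the iowait delta by the total delta. It assumes a Linux kernel whose cpu line starts with user, nice, system, idle, iowait (as documented for /proc/stat). The helper names are hypothetical.

```python
# Compute I/O wait % from the aggregate "cpu" line of /proc/stat (Linux only).
import os
import time


def parse_cpu_line(line: str) -> list[int]:
    """Return up to 8 jiffy counters from the aggregate 'cpu' line:
    user, nice, system, idle, iowait, irq, softirq, steal."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:9]]


def iowait_percent(before: list[int], after: list[int]) -> float:
    """Share of elapsed CPU time spent in iowait between two samples."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0  # index 4 = iowait


def sample_iowait(interval_s: float = 1.0) -> float:
    """Read /proc/stat twice, interval_s apart, and return iowait %."""
    def read() -> list[int]:
        with open("/proc/stat") as f:
            return parse_cpu_line(f.readline())  # first line is "cpu ..."
    before = read()
    time.sleep(interval_s)
    return iowait_percent(before, read())


if __name__ == "__main__":
    if os.path.exists("/proc/stat"):
        print(f"iowait: {sample_iowait():.1f}%")
```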

If your I/O wait percentage is greater than 1 / (number of CPU cores), expressed as a percentage, then your CPUs are spending a significant amount of time waiting for the disk subsystem to catch up.

 

In one sample top output, I/O wait was 12.1%. That server has 8 cores (checked via cat /proc/cpuinfo), so the threshold is 1/8 = 0.125, or 12.5%, and 12.1% is very close to it. If I/O wait stays consistently around this threshold, disk access may be slowing the application down.
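Here is the rule of thumb and the worked example above as a small sketch. `os.cpu_count()` reports logical CPUs, the same count as the processor entries in /proc/cpuinfo. On machines with hyper-threading, that gives a lower threshold than a physical-core count would, so adjust for your hardware. The function names are hypothetical.

```python
# Rule of thumb from the text: I/O wait above 1 / (number of cores) is a warning sign.
import os


def iowait_threshold_pct(cores: int) -> float:
    """Threshold as a percentage: 100 / number of CPU cores."""
    return 100.0 / cores


def iowait_is_concerning(iowait_pct: float, cores: int) -> bool:
    """True if a single I/O wait reading exceeds the threshold."""
    return iowait_pct > iowait_threshold_pct(cores)


if __name__ == "__main__":
    # Worked example from the text: 12.1% I/O wait on an 8-core server.
    print(iowait_threshold_pct(8))        # 12.5
    print(iowait_is_concerning(12.1, 8))  # False, but close to the line
    cores = os.cpu_count() or 1           # logical CPUs on this machine
    print(f"threshold here: {iowait_threshold_pct(cores):.1f}%")
```

The text's concern is I/O wait that is consistently near the threshold. Compare several readings taken over time instead of acting on a single sample.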

What impacts I/O performance?

For random disk access (a database, mail server, file server, etc.), you should focus on how many input/output operations can be performed per second (IOPS).

Four primary factors impact IOPS:

  • Multi-disk arrays – More disks in the array mean greater IOPS. If one disk can perform 150 IOPS, two disks can perform roughly 300 IOPS.
  • Average IOPS per drive – The more IOPS each drive can handle, the greater the total IOPS capacity. This is largely determined by the drive's rotational speed.
  • RAID factor – Your application is likely using a RAID configuration for storage, which means you are using multiple disks for reliability and redundancy. The RAID level determines how many physical disk operations each write request costs.
  • Read and write workload – If you have a high percentage of write operations and a RAID setup that performs many disk operations for each write request (like RAID 5 or RAID 6), your effective IOPS will be significantly lower.
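The four factors can be combined in a rough estimate. Suppose each logical write costs P physical disk operations (the RAID write penalty) and a fraction r of requests are reads. Then each request costs on average r + (1 − r)·P disk operations, so the array saturates at about raw IOPS / (r + (1 − r)·P) logical IOPS. This sketch assumes random I/O spread evenly across all drives. It also uses the commonly quoted rule-of-thumb penalties (2 for RAID 1/10, 4 for RAID 5, 6 for RAID 6); both the values and the 8-drive example array are assumptions.

```python
# Effective logical IOPS of a RAID array for a given read/write mix.
# WRITE_PENALTY holds commonly quoted rule-of-thumb values (an assumption):
# the number of physical disk I/Os one logical write costs at each RAID level.
WRITE_PENALTY = {"0": 1, "1": 2, "10": 2, "5": 4, "6": 6}


def effective_iops(drives: int, iops_per_drive: float,
                   raid_level: str, read_pct: float) -> float:
    """Logical IOPS the array can sustain before its disks are saturated.

    Each logical read costs 1 disk I/O and each logical write costs
    WRITE_PENALTY[raid_level] disk I/Os, so an average request costs
    (read% + write% * penalty) / 100 disk I/Os.
    """
    raw_iops = drives * iops_per_drive          # multi-disk array factor
    penalty = WRITE_PENALTY[raid_level]         # RAID factor
    cost_x100 = read_pct + (100 - read_pct) * penalty
    return raw_iops * 100 / cost_x100


if __name__ == "__main__":
    # Hypothetical array: 8 drives at 150 IOPS each, 70% reads / 30% writes.
    for level in ("0", "10", "5", "6"):
        print(f"RAID {level:>2}: {effective_iops(8, 150, level, 70):.0f} IOPS")
    # RAID  0: 1200, RAID 10: 923, RAID  5: 632, RAID  6: 480
```

The same eight drives drop from 1,200 to 480 IOPS because of the RAID factor and the write-heavy share of the workload.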

Three takeaways

  • Disk access is slow – Disk access speeds do not come close to RAM speeds.
  • Optimize your apps first – Tuning your disk hardware is neither trivial nor likely to be a quick fix. Try to have your I/O-heavy services read more data from a RAM cache first.
  • Measure – Modifications to your application can have a big impact on disk I/O. Record the key I/O metrics over time.
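In a Python service, one way to act on the RAM-cache advice is the standard library's functools.lru_cache. In this sketch, `read_record` is a hypothetical stand-in for an I/O-heavy lookup. Cached results go stale if the underlying file changes, so this approach only suits data that is read far more often than it is written.

```python
# Serve repeated reads from RAM instead of disk with functools.lru_cache.
# read_record is a hypothetical stand-in for an I/O-heavy lookup.
import os
import tempfile
from functools import lru_cache


@lru_cache(maxsize=1024)           # keep up to 1,024 results in memory
def read_record(path: str) -> bytes:
    with open(path, "rb") as f:    # only reached on a cache miss
        return f.read()


if __name__ == "__main__":
    # Create a throwaway file so the example is self-contained.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
    read_record(tmp.name)             # miss: reads from disk
    read_record(tmp.name)             # hit: served from RAM
    print(read_record.cache_info())   # CacheInfo(hits=1, misses=1, maxsize=1024, currsize=1)
    os.remove(tmp.name)
```

`cache_info()` is also a cheap way to measure how much disk traffic the cache is actually saving.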