Monitoring and analyzing performance is an important task for any system administrator.
Disk I/O bottlenecks can bring applications to a crawl. Some common questions for anyone embarking on a disk I/O analysis are:
- What are IOPS?
- Should I use SATA, SAS, or SSD?
- What RAID level should I use?
- Is my system read or write-heavy?
Disclaimer: I do not consider myself an expert in storage, or anything else for that matter. This is just how I have done I/O analysis in the past. I welcome additions and corrections.
What are IOPS?
IOPS are input/output (I/O) operations per second. They are important for applications that require frequent access to the disk: databases, version control systems, mail stores, and many similar applications.
How do I calculate IOPS?
IOPS are a function of rotational speed (aka spindle speed), latency, and seek time. The equation is simple: IOPS = 1 / (seek + latency), with both times expressed in seconds.
Sample drive:
- Model: 2.5″ SATA hard drive
- Rotational speed: 10,000 RPM
- Average latency: 3 ms (0.003 seconds)
- Average seek time: (4.2 ms read + 4.7 ms write) / 2 = 4.45 ms (0.00445 seconds)
- Calculated IOPS for this disk: 1 / (0.003 + 0.00445) = about 134 IOPS
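The calculation above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the latency helper assumes the usual rule of thumb that average rotational latency is the time for half a revolution.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in ms, assumed to be half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def estimate_iops(avg_seek_ms: float, avg_latency_ms: float) -> float:
    """IOPS = 1 / (seek + latency), with both times converted to seconds."""
    service_time_s = (avg_seek_ms + avg_latency_ms) / 1000.0
    return 1.0 / service_time_s


# Sample drive from the text: 10,000 RPM, 4.2 ms read seek, 4.7 ms write seek.
latency_ms = avg_rotational_latency_ms(10_000)
seek_ms = (4.2 + 4.7) / 2  # simple average of read and write seek times
print(f"latency: {latency_ms:.1f} ms, seek: {seek_ms:.2f} ms")
print(f"estimated IOPS: {estimate_iops(seek_ms, latency_ms):.0f}")
```

Treat the result as a ceiling for small random I/O on a single spindle. Caches, queueing, and sequential access patterns all move real numbers away from it.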
It’s great to know how to calculate a disk's IOPS, but for the most part you can get by with commonly accepted averages.
Rotational Speed (rpm) | IOPS |
5400 | 50-80 |
7200 | 75-100 |
10k | 125-150 |
15k | 175-210 |
Should I use SATA, SAS or SSD?
That is a loaded question. As with most things, the answer is “it depends”.
Approximate IOPS and throughput values for different disk drive types:
Drive (Type / RPM) | IOPS (4KB block, random) | IOPS (64KB block, random) | MB/s (64KB block, random) | IOPS (512KB block, random) | MB/s (512KB block, random) | MB/s (large block, sequential) |
FC / 15K | 163 – 178 | 151 – 169 | 9.7 – 10.8 | 97 – 123 | 49.7 – 63.1 | 73.5 – 127.5 |
SAS / 15K | 188 – 203 | 175 – 192 | 11.2 – 12.3 | 115 – 135 | 58.9 – 68.9 | 91.5 – 126.3 |
FC / 10K | 142 – 151 | 130 – 143 | 8.3 – 9.2 | 80 – 104 | 40.9 – 53.1 | 58.1 – 107.2 |
SAS / 10K | 142 – 151 | 130 – 143 | 8.3 – 9.2 | 80 – 104 | 40.9 – 53.1 | 58.1 – 107.2 |
SAS/SATA / 7200 | 73 – 79 | 69 – 76 | 4.4 – 4.9 | 47 – 63 | 24.3 – 32.1 | 43.4 – 97.8 |
SATA / 5400 | 57 | 55 | 3.5 | 44 | 22.6 |
IOPS and throughput values for some SSD drives:
Model / Type | Capacity (GB) | Max Read IOPS (4KB block, random) | Max Write IOPS (4KB block, random) | Max Read Throughput, MB/s (sequential) | Max Write Throughput, MB/s (sequential) |
Lightning Read-Intensive (MLC) | 1600 | 78000 | 8000 | 410 | 140 |
Lightning Write-Intensive (SLC) | 400 | 118000 | 33000 | 450 | 250 |
Lightning Mixed-Use | 800 | 100000 | 16000 | 450 | 220 |
What RAID level should I use?
Once you know how to calculate IOPS and have determined what kind of drives to use, the next logical question is which RAID level to choose, commonly RAID 5 vs. RAID 10.
Features | RAID 0 | RAID 1 | RAID 5 | RAID 6 | RAID 10 |
Minimum # Drives | 2 | 2 | 3 | 4 | 4 |
Data Protection | No Protection | Single-drive failure | Single-drive failure | Two-drive failure | Up to one disk failure in each sub-array |
Read Performance | High | High | High | High | High |
Write Performance | High | Medium | Low | Low | Medium |
Read Performance (degraded) | N/A | Medium | Low | Low | High |
Write Performance (degraded) | N/A | High | Low | Low | High |
Capacity Utilization | 100% | 50% | 67% – 94% | 50% – 88% | 50% |
Typical Applications | High-End Workstations, data logging, real-time rendering, very transitory data | Operating System, transaction databases | Data warehousing, web serving, archiving | Data archive, backup to disk, high availability solutions, servers with large capacity requirements | Fast databases, application servers |
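The capacity utilization row follows from simple arithmetic, sketched below. The function name is hypothetical. The sketch assumes RAID 1 is a two-way mirror and RAID 10 is built from two-way mirrors. The 16-drive upper bound in the usage loop is an assumption that happens to reproduce the ranges in the table.

```python
def capacity_utilization(level: str, drives: int) -> float:
    """Fraction of raw capacity that is usable for a given RAID level."""
    minimum = {"raid0": 2, "raid1": 2, "raid5": 3, "raid6": 4, "raid10": 4}
    if level not in minimum:
        raise ValueError(f"unknown RAID level: {level}")
    if drives < minimum[level]:
        raise ValueError(f"{level} needs at least {minimum[level]} drives")
    if level == "raid0":
        return 1.0  # striping only, no redundancy
    if level in ("raid1", "raid10"):
        return 0.5  # assumes two-way mirroring
    parity_drives = 1 if level == "raid5" else 2
    return (drives - parity_drives) / drives


for level, smallest in (("raid5", 3), ("raid6", 4)):
    low = capacity_utilization(level, smallest)
    high = capacity_utilization(level, 16)  # 16 drives is an assumed upper bound
    print(f"{level}: {low:.1%} to {high:.1%}")
```

Parity-based levels get more space-efficient as the array grows, which is why RAID 5 and RAID 6 show a range rather than a single figure.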
Do you have an I/O bottleneck?
Your I/O wait measurement is the canary for an I/O bottleneck. I/O wait is the percentage of time your processors are waiting on the disk.
For example, let us say it takes 1 second to grab 10,000 rows from MySQL and perform some operations on those rows. The disk is being accessed while the rows are retrieved. During this time, the processor is idle, waiting on the disk.
If disk access accounts for 700 ms of that second, I/O wait is 70%.
You can check your I/O wait percentage via top, a command available on every flavor of Linux (look for the wa value in the CPU summary line):
If your I/O wait percentage is greater than (1 / number of CPU cores), then your CPUs are waiting a significant amount of time for the disk subsystem to catch up.
In the output above, I/O wait is 12.1%. This server has 8 cores (via cat /proc/cpuinfo). That is very close to the threshold of 1/8 = 0.125, or 12.5%. Disk access may be slowing the application down if I/O wait is consistently around this threshold.
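If you would rather script this check than watch top, here is a minimal sketch. It assumes a Linux system where the first line of /proc/stat holds the aggregate CPU counters, with iowait as the fifth value. All function names are made up for illustration.

```python
import os
import time


def parse_cpu_times(cpu_line: str) -> list:
    """Return user, nice, system, idle, iowait, irq, softirq, steal counters."""
    fields = cpu_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:9]]


def iowait_percent(before_line: str, after_line: str) -> float:
    """Percentage of CPU time spent in iowait between two samples."""
    before = parse_cpu_times(before_line)
    after = parse_cpu_times(after_line)
    deltas = [new - old for old, new in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def exceeds_threshold(iowait_pct: float, cores: int) -> bool:
    """Rule of thumb from the text: iowait greater than 1 / number of cores."""
    return iowait_pct > 100.0 / cores


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the aggregate CPU line (Linux only)."""
    with open(path) as stat_file:
        return stat_file.readline()


def check_now(interval_s: float = 1.0) -> bool:
    """Sample twice, interval_s apart, and apply the threshold to this machine."""
    first = read_cpu_line()
    time.sleep(interval_s)
    second = read_cpu_line()
    return exceeds_threshold(iowait_percent(first, second), os.cpu_count() or 1)
```

Calling check_now() on the server from the example would compare the measured value against 12.5%. A single sample says little, so look for values that stay near the threshold across many samples.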
What impacts I/O performance?
For random disk access (a database, mail server, file server, etc.), you should focus on how many input/output operations can be performed per second (IOPS).
Four primary factors impact IOPS:
- Multi-disk Arrays – More disks in the array mean greater IOPS. If one disk can perform 150 IOPS, two disks can perform 300 IOPS.
- Average IOPS per drive – The greater the number of IOPS each drive can handle, the greater the total IOPS capacity. This is largely determined by the rotational speed of the drive.
- RAID Factor – Your application is likely using a RAID configuration for storage, which means you are using multiple disks for reliability and redundancy. Depending on the RAID level, a single logical write can cost several physical disk operations.
- Read and Write Workload – If you have a high percentage of write operations and a RAID setup that performs many operations for each write request (like RAID 5 or RAID 6), your IOPS will be significantly lower.
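The four factors can be combined into a rough estimate, sketched below. The write-penalty values are commonly cited rules of thumb and are an assumption here, not something taken from this article. The function name is likewise hypothetical, and real arrays with caches will behave differently.

```python
# Commonly cited write penalties (assumed): physical disk operations
# needed to complete one logical write at each RAID level.
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def effective_iops(drives: int, iops_per_drive: float,
                   read_fraction: float, level: str) -> float:
    """Estimate host-visible IOPS for an array under a given read/write mix."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw_iops = drives * iops_per_drive  # more disks, more raw IOPS
    write_fraction = 1.0 - read_fraction
    # Each read costs one disk operation, each write costs `penalty` of them.
    return raw_iops / (read_fraction + write_fraction * WRITE_PENALTY[level])


# Eight 150-IOPS drives with an even read/write mix.
for level in ("raid10", "raid5", "raid6"):
    print(level, round(effective_iops(8, 150, 0.5, level)))
```

With a read-only workload the penalty disappears and every level delivers the raw figure. The gap between levels widens as the write share grows.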
Three takeaways
- Disk access is slow – Disk access speeds do not come close to RAM speeds.
- Optimize your apps first – Tuning your disk hardware is not trivial or likely to be a quick fix. Try to have your I/O-heavy services read more data from a RAM cache first.
- Measure – Modifications to your application can have a big impact on disk I/O. Record the key I/O metrics over time.
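One way to record such metrics, and to see whether a system is read-heavy or write-heavy, is to sample /proc/diskstats at intervals. The sketch below assumes the Linux field layout: device name in the third column, reads completed in the fourth, writes completed in the eighth. The function names and sample lines are invented for illustration.

```python
def parse_diskstats_line(line: str) -> tuple:
    """Return (device, reads_completed, writes_completed) from one line."""
    fields = line.split()
    return fields[2], int(fields[3]), int(fields[7])


def iops_between(before_line: str, after_line: str, interval_s: float) -> dict:
    """Read and write IOPS for one device between two samples."""
    device, reads_before, writes_before = parse_diskstats_line(before_line)
    device_after, reads_after, writes_after = parse_diskstats_line(after_line)
    if device != device_after:
        raise ValueError("samples are for different devices")
    read_iops = (reads_after - reads_before) / interval_s
    write_iops = (writes_after - writes_before) / interval_s
    total = read_iops + write_iops
    return {
        "device": device,
        "read_iops": read_iops,
        "write_iops": write_iops,
        "write_fraction": write_iops / total if total else 0.0,
    }


# Synthetic samples taken 10 seconds apart (made-up numbers).
sample_1 = "   8       0 sda 1000 0 8000 500 2000 0 16000 900 0 700 1400"
sample_2 = "   8       0 sda 1300 0 10400 650 2900 0 23200 1300 0 900 1950"
print(iops_between(sample_1, sample_2, 10.0))
```

Logging these values on a schedule gives a baseline to compare against after each application change.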