Flash Storage Devices (and how slow they are)

There’s been a lot of press in the past year or two about SSDs, and how they are revolutionising hard disk performance for servers, laptops and desktops (at a price).  In the embedded world, however, we’re pretty much stuck with SDHC cards and USB thumb drives, which are both built down to a price rather than up to a benchmark.  After being bitten by ludicrously long write-a-rootfs times once too often, I thought I’d investigate the problem in detail.

The source of the problem is that SDHC cards and USB thumb drives are both designed to be as cheap as possible, while still offering good performance for typical FAT16 or FAT32 implementations when writing “large” files – i.e. photos at 1MB each or videos somewhere above that.  But when developing for Linux, we don’t use FAT; we use ext2 or ext3 (or, when we can get it, ext4).  These have a very different layout and access pattern from FAT.  FAT puts all its metadata together at the start of the disk, and then writes files in (usually) 32KB-aligned clusters, which in a typical camera implementation are expected to be contiguous.  Ext2 (from which ext3 and ext4 are very closely derived) puts metadata in groups spread across the disk, close to the data – which means that a large file will be broken up periodically by the gaps for the metadata, and the metadata itself is broken up into small pieces.  Furthermore, it turns out that the default allocation algorithm for ext2 leaves small gaps between files – ext4 improves on this greatly.
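To put a number on that metadata spacing: an ext2/3/4 block group by default covers as many blocks as a single block-sized bitmap can describe, so the interval between metadata groups follows directly from the block size.  A quick sketch of the arithmetic (my own illustration, not part of the benchmark):

```c
/* Back-of-the-envelope sketch: blocks per group = bits in one block-sized
 * bitmap = 8 * block size, so group spacing = blocks_per_group * block size. */
#include <stdio.h>

int main(void)
{
    const unsigned long block_sizes[] = { 1024, 4096 };

    for (int i = 0; i < 2; i++) {
        unsigned long bs = block_sizes[i];
        unsigned long blocks_per_group = 8 * bs;           /* one bitmap bit per block */
        unsigned long group_bytes = blocks_per_group * bs; /* span covered by one group */

        printf("%4lu-byte blocks: %5lu blocks/group, metadata every %3lu MB\n",
               bs, blocks_per_group, group_bytes >> 20);
    }
    return 0;
}
```

That works out to metadata every 8MB with 1K blocks and every 128MB with 4K blocks – the two intervals that come up again below.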

SSDs can cope with ext2’s idiosyncrasies, because they are also designed to cope with NTFS and Windows’ foibles, and their manufacturers have the budget to do so.  (Windows, or at least XP, is extremely bad at keeping files contiguous under normal desktop use.)  Basic SDHC cards and thumb drives, though, do not.  Ext2’s access pattern heavily exercises mechanisms which, in these cheaper devices, are designed to cope with corner cases rather than daily use.

I should point out that for most of these cheap devices, read performance is not a problem.  Both random and sequential read performance is as good as it should be.  They are simply not optimised for good write performance, except when it conforms to a very specific pattern.

The first step was to identify whether any of the many filesystems available for Linux performed better.  The best performer turned out to be ext4 with the default 4K block size, which writes most files contiguously and without gaps between them, and has a relatively large interval (128MB) between metadata groups.  LogFS came reasonably close in speed, but was too unstable to consider actually using.  ReiserFS turned in one of the worst results – but I think it is designed for read performance, not write performance (and is therefore practically obsolete with SSDs).

You might wonder why I keep mentioning gaps between files as a problem.  This is because the data already in such a gap has to be preserved by the drive, because it has not been told otherwise.  This complicates the write process, and the best optimisations for the write process also complicate the read process.  A magnetic hard disk can simply overwrite the physical data in place, but a Flash drive must erase and write separately, so it puts the new data in a fresh eraseblock and, once all the valid data has been recovered from the old block, erases it for reuse.  These eraseblocks are typically large, on the order of a megabyte.
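In code terms, the cost of one small update looks roughly like this – an illustrative sketch with made-up structures, not anything a real controller runs:

```c
/* Why a small update is expensive on flash: the controller cannot overwrite
 * in place, so it copies every still-valid sector of the old eraseblock into
 * a fresh one, merges in the new data, and only then erases the old block. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ERASEBLOCK_SIZE (1024 * 1024)   /* ~1MB eraseblock, typical of cheap drives */
#define SECTOR_SIZE     512

/* Rewriting one 512-byte sector costs a ~1MB copy, a 512-byte patch and a ~1MB erase. */
static void rewrite_sector(unsigned char *old_blk, unsigned char *fresh_blk,
                           unsigned int sector, const unsigned char *payload)
{
    memcpy(fresh_blk, old_blk, ERASEBLOCK_SIZE);                    /* preserve data the host never invalidated */
    memcpy(fresh_blk + sector * SECTOR_SIZE, payload, SECTOR_SIZE); /* merge in the new sector */
    memset(old_blk, 0xff, ERASEBLOCK_SIZE);                         /* "erase" the old block for reuse */
}

int main(void)
{
    unsigned char *old_blk = calloc(1, ERASEBLOCK_SIZE);
    unsigned char *fresh_blk = calloc(1, ERASEBLOCK_SIZE);
    unsigned char payload[SECTOR_SIZE] = { 0 };

    if (!old_blk || !fresh_blk)
        return 1;

    rewrite_sector(old_blk, fresh_blk, 3, payload);
    printf("moved %d bytes to update %d bytes\n", 2 * ERASEBLOCK_SIZE, SECTOR_SIZE);

    free(old_blk);
    free(fresh_blk);
    return 0;
}
```

A gap between files means the stale data in that gap still counts as valid to the drive, so every nearby write pays this copy-and-erase tax.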

So I wrote a small benchmark in C, which attempts to exercise a device in a number of different and realistic ways, the idea being to get more reliable and repeatable results than simply formatting and untarring onto a drive.  First the entire drive is filled sequentially with zeroes, and then again with random data (from a high-speed, high-quality RNG), with the speed of both being recorded (significant differences here are possible for drives that use compression or de-duplication, like the Sandforce controller).  Then a random-access write benchmark is run, using single 512-byte sectors of random data, in an attempt to stress the worst-case performance.  Then I simulate formatting and then filling an ext2 filesystem with a 1K block size and an approximation of the default allocator (putting metadata in the same group as the start of the data, and leaving 1 block free between files) and a bias towards very small files (as is usually found in a Linux rootfs).
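The random-access phase is the simplest to show.  Below is a minimal sketch of it (my own illustration, not the original benchmark source): it writes single 512-byte sectors of pseudo-random data at random offsets, synchronously, and reports the mean latency.  The pass count is arbitrary, and the real benchmark uses a much faster and higher-quality RNG than rand().

```c
#define _FILE_OFFSET_BITS 64        /* keep off_t 64-bit on 32-bit hosts */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>               /* BLKGETSIZE64 */

#define SECTOR 512
#define PASSES 256                  /* arbitrary; more passes gives a steadier average */

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s /dev/sdX\n", argv[0]);
        return 1;
    }

    /* O_SYNC so each write reaches the device before the next one is timed */
    int fd = open(argv[1], O_WRONLY | O_SYNC);
    if (fd < 0) { perror("open"); return 1; }

    unsigned long long bytes = 0;
    if (ioctl(fd, BLKGETSIZE64, &bytes) < 0) { perror("BLKGETSIZE64"); return 1; }
    unsigned long long sectors = bytes / SECTOR;

    unsigned char buf[SECTOR];
    struct timespec t0, t1;

    srand(time(NULL));
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (int i = 0; i < PASSES; i++) {
        for (int j = 0; j < SECTOR; j++)
            buf[j] = rand();        /* incompressible-ish payload */
        off_t where = (off_t)((unsigned long long)rand() % sectors) * SECTOR;
        if (pwrite(fd, buf, SECTOR, where) != SECTOR) { perror("pwrite"); return 1; }
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ms = (t1.tv_sec - t0.tv_sec) * 1000.0 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    printf("random access: %.1f ms per 512-byte write\n", ms / PASSES);

    close(fd);
    return 0;
}
```

Run it against the raw device node rather than a partition, and note that it will overwrite whatever is on the drive.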

The results from a sample of devices are shown below.

Drive                            | Performance claims | Sequential (zero) MB/s | Sequential (random) MB/s | Random access ms | Filesystem format groups/s | Filesystem fill MB/s
---------------------------------|--------------------|------------------------|--------------------------|------------------|----------------------------|---------------------
USB Thumb Drives                 |                    |                        |                          |                  |                            |
Kingston 4GB 100 (orig.)         | –                  | 3.84                   | 3.85                     | 537              | 1.42                       | 3.31
Kingston 16GB 100 (orig.)        | –                  | 6.70                   | 6.05                     | 491              | 1.82                       | 2.66
Kingston 16GB 100 G2             | 5MB/s write        | 14.15                  | 14.12                    | 489              | 2.16                       | 3.10
Kingston 4GB 410                 | 8MB/s write        | 13.29                  | 13.36                    | 211              | 3.84                       | 8.23
Corsair Voyager 16GB (blue)      | –                  | 9.26                   | 9.26                     | 826              | 1.08                       | 3.16
Sandisk Cruzer 16GB              | –                  | 5.00                   | 5.01                     | 2.6              | 1.03                       | 2.37
Verbatim 8GB                     | –                  | 9.73                   | 9.80                     | 447              | 1.97                       | 3.60
USB-adapted SATA drives          |                    |                        |                          |                  |                            |
Seagate Momentus 160GB 7200rpm   | –                  | 27.58                  | 27.42                    | 21.5             | 141                        | 5.07
USB-adapted SDHC cards           |                    |                        |                          |                  |                            |
Transcend “Class 6” 4GB          | 6MB/s write        | 12.36                  | 11.77                    | 107              | 8.38                       | 7.01
Transcend “Class 10” 4GB         | 10MB/s write       | 16.65                  | 8.99                     | 368              | 2.48                       | 4.65
Transcend “Class 6” 32GB         | 6MB/s write        | 18.93                  | 18.83                    | 695              | 1.25                       | 3.12
Sandisk “Class 2” 4GB            | 2MB/s write        | 6.14                   | 6.12                     | 203              | 4.55                       | 5.58

As you can see, there is usually a wild difference between the performance claims on the packaging and what is actually achievable, and there is also a great distinction between the sequential performance you can measure with dd and the performance you get when actually using the drive.  Given purely sequential writes, a thumb drive or SDHC card will usually substantially outperform its claim; with a real filesystem, though, it will almost always fall short (honourable exceptions being the 4GB Transcend Class 6 and the Kingston 410).

There are also some strange anomalies in the data.  The mechanical hard disk performed considerably worse in the filesystem-fill test than I had any right to expect, especially since writing to it using a real filesystem gives at least twice the performance shown here (though the formatting process takes several times as long as my simulated version).  Meanwhile, the Sandisk drive showed a very high random-access performance (cheers!) but a very poor filesystem-format performance (boo!), the latter involving 64KB chunks of data at regular 8MB intervals.  The filesystem-fill performance of this drive fits the latter metric better than the former, suggesting that Sandisk has optimised for a benchmark rather than real performance.

Another oddity I noticed during testing is that newer Linux kernels perform much better than even moderately old ones with these drives.  The numbers shown above are from a bang-up-to-date 2.6.36.2 kernel, but the 2.6.27 kernel on my Ubuntu workstation did considerably worse, especially with the otherwise excellent Kingston 410.  If you need an excuse to upgrade your kernel…  here it is.

EDIT: I updated the table with three more devices that I tested. Two of these demonstrate conclusively that the “class number” on an SD card should be taken with a large dose of salt – a Class 2 device manages to outperform a newer Class 10 device on realistic access patterns.

The Class 10 device is also interesting because it appears to have data-dependent performance – the sequential write speed on random data is consistently around 9MB/s, which is less than the performance claim implied by the class number. JPEG and MJPEG data (as produced by typical cameras) is usually not easily compressible, so this is a very odd feature for an SD card.

