Earlier this afternoon I read an article about the demise of single-instance storage in Exchange 2010’s underlying database. SIS was originally implemented by Microsoft to help reduce message delivery latency and the amount of expensive disk space needed on a mail server. Times have changed: Exchange adopted SMTP for inter-server mail transfer along with new delivery methods, while the cost of storage has plummeted (price per GB at least, although I suspect price per mailbox hasn’t, as users now expect larger mailboxes than they did in 2001). This meant that Microsoft’s SIS model was effectively redundant by Exchange 2010; worse, it had begun to hinder Exchange database design.
In 2001, when we were deploying large amounts of storage on Windows servers for the first time, the only credible storage available was copper- or fibre-connected SCSI hard drives. These had faster throughput than the hard drives in our desktop PCs, but were still better at high IO/s than high MB/s. Striping drives together boosted those numbers, and those of us DBAs who cared about performance got the kind we generally liked. Exchange Server was forced to fit into this world too: the Exchange database engine was designed to support a high number of small transactions rather than a smaller number of long transactions. Fast forward to 2010 and we’re now seeing SATA filling up our server drive bays. Reliability is now sufficient to justify what are being called “cheap disks”, and their throughput rates make them good for everything but high-performance database systems. On top of this, we’re now seeing individual drive capacities that a few years ago would have needed a whole drive shelf; 1TB on a single disk is now common in the server world.
So back to Exchange: this shift in drive buying habits means Exchange 2010 is most likely going to get installed on large local SATA drives rather than centralised fibre channel SANs, so it’s no surprise that Exchange 2010 has a range of new features designed to take advantage of this. Multiple databases per server increase the amount of email you can store on each instance, database availability groups mean that storage resilience comes from the application rather than the hardware, and the retirement of SIS is part of a set of internal database changes that have reduced Exchange’s IO by 70%.
However, the changes made to get faster IO from the Exchange database actually caused it to use more storage, which is not a good incentive to make people want to upgrade! To get around this, Microsoft incorporated some in-database compression of mail header data, yielding around a 20% space saving in return for some CPU cycles. This got me thinking about what compression functionality SQL Server offers.
SQL Server 2008 introduced a range of what I call “compression-within-the-table” features, and I began to wonder when you’d want to use them. The answer became very clear when I looked at a) the licensing behind the features and b) the list of pre-deployment considerations: the data warehouse community. Storing huge amounts of rarely accessed data as cheaply as possible makes obvious sense for that type of workload.
However, like any Microsoft enterprise database technology, SQL Server’s compression has options, and lots of them. Row or page compression can be enabled per table, per index, or even per partition; there are compression-specific performance counters and plenty of new options on CREATE and ALTER statements. This is great news for bespoke deployments: if compression is going to be a tool to aid your application’s performance then you’ll be able to configure it to do exactly what you want, which is far nicer than a “compression on/off” tick box.
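To give a flavour of that granularity, here is a quick sketch of the SQL Server 2008 syntax (the table and index names are made up for illustration):

```sql
-- Create a table with page-level compression from the start
CREATE TABLE dbo.SalesArchive (
    SaleID   INT      NOT NULL,
    SaleDate DATETIME NOT NULL,
    Amount   MONEY    NOT NULL
) WITH (DATA_COMPRESSION = PAGE);

-- Switch an existing table to the lighter row compression via a rebuild
ALTER TABLE dbo.SalesArchive REBUILD WITH (DATA_COMPRESSION = ROW);

-- Indexes, and even individual partitions of an index, can be
-- compressed independently of the base table
ALTER INDEX IX_SalesArchive_SaleDate ON dbo.SalesArchive
    REBUILD PARTITION = 3 WITH (DATA_COMPRESSION = PAGE);
```

Note that row and page compression are two different trade-offs: row compression stores fixed-length types more compactly, while page compression adds prefix and dictionary compression on top for a bigger space saving at a higher CPU cost.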
To conclude: if your workload is I/O bound then compression may help (fewer trips to the disk to get data and more pages fitting in your cache); if your workload is CPU bound then you probably don’t want to spend CPU cycles on anything but number crunching.
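Helpfully, you don’t have to guess at the space saving before committing to a rebuild: SQL Server 2008 ships a stored procedure that samples your data and estimates the effect of each compression type (again, the object name here is hypothetical):

```sql
-- Estimate how much space page compression would save on a table
-- before actually rebuilding it
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'SalesArchive',
    @index_id         = NULL,   -- NULL = all indexes on the table
    @partition_number = NULL,   -- NULL = all partitions
    @data_compression = 'PAGE';
```

Comparing the current and estimated sizes it returns is a cheap way to decide whether the CPU trade-off is worth it for a given table.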