When most applications are first deployed their databases start off at a moderate, even boring, size – somewhere between 20MB and 5GB is not unusual. Some large systems such as Oracle’s E-Business Suite and SAP have larger starting points (100GB+), but new applications or web sites normally start off with relatively small databases which hold application config data and sit waiting for user data to begin pouring in post go-live.
These small and agile databases are great for DBAs: not only do they require little disk space (even after the transaction log and backup dumps are accounted for), but backups are fast, copies to other servers are easy and housekeeping tasks are relatively painless. We also see efficient IO access by SQL Server, and indexing isn’t much of a problem – a pain-free life!
Roll forward two years and these small and agile databases have become large and bloated. The web application database has stored every quote a client requested for the last two years, the application audit logs detailing every change go back three years, and so on. I’ve often heard tales from clients of their 2GB database becoming a 150GB database after a couple of years of “successful” use: backups now take hours, database copies to dev systems can’t be done overnight any more, and the thought of re-indexing the complete database has become daunting.
The diagram below shows a strategy which some may want to consider when reviewing their data retention policies.
Every company now has to keep every piece of data it generates for one reason or another – or people just like keeping it in case they need it. But that doesn’t mean it has to be kept in the same physical data files, or even the same database, as the operational OLTP data. Splitting the data out can have operational benefits for the DBA: it reduces the amount of data which needs to be backed up every night, the amount of data which needs to be transferred when dev systems are refreshed, and the amount of time needed to re-index.
There are two strategies for physically splitting these large amounts of data up: splitting at the data level and splitting at the storage level.
Splitting the data at the data level can involve archiving old data off into a second database, distancing it from the operational OLTP tables. Application code will need to know where to find this archived data, but there is now both a logical and a physical separation between the active and inactive data – it’s no longer all stored in a single multi-million-row table. The second database can keep the data in the same table format as the original, but benefit from its own indexing strategy and backup schedule.
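As a minimal T-SQL sketch of the data-level approach – the database and table names (Sales, SalesArchive, dbo.Quotes) and the two-year cut-off are illustrative assumptions, not taken from any real schema:

```sql
-- Hypothetical example: move quotes older than two years out of the
-- operational database into an archive database with the same table shape.
-- All object names here are illustrative.
BEGIN TRANSACTION;

INSERT INTO SalesArchive.dbo.Quotes (QuoteID, CustomerID, QuoteDate, Amount)
SELECT QuoteID, CustomerID, QuoteDate, Amount
FROM   Sales.dbo.Quotes
WHERE  QuoteDate < DATEADD(YEAR, -2, GETDATE());

DELETE FROM Sales.dbo.Quotes
WHERE  QuoteDate < DATEADD(YEAR, -2, GETDATE());

COMMIT TRANSACTION;
```

For a genuinely large table you would normally run this in batches rather than in one transaction, to keep transaction log growth and lock durations under control.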
Splitting the data at the storage level relies on functionality which is specific to your database platform. SQL Server has the concept of filegroups, which allow you to place specific tables on specific physical files. Again, some logical redistribution of data and application redirection may be required, but you can now physically separate your operational and archived data within the same database. Filegroup-level backups again help keep database backups and dev system refreshes swift. Oracle has similar functionality in tablespaces and data files – the ability to logically dictate where data is physically held.
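A hedged T-SQL sketch of the filegroup approach – the database name, file path, table definition and filegroup name are all illustrative assumptions:

```sql
-- Hypothetical example: add an ARCHIVE filegroup backed by its own
-- physical file, place an archive table on it, then back up only the
-- operational (PRIMARY) filegroup. Names and paths are illustrative.
ALTER DATABASE Sales ADD FILEGROUP ARCHIVE;

ALTER DATABASE Sales
ADD FILE (
    NAME = 'Sales_Archive',
    FILENAME = 'E:\Data\Sales_Archive.ndf',
    SIZE = 10GB
) TO FILEGROUP ARCHIVE;

-- Archived rows live in the same database, but on the archive file.
CREATE TABLE dbo.QuotesArchive (
    QuoteID    INT      NOT NULL,
    CustomerID INT      NOT NULL,
    QuoteDate  DATETIME NOT NULL,
    Amount     MONEY    NOT NULL
) ON ARCHIVE;

-- The nightly backup can now cover just the operational filegroup:
BACKUP DATABASE Sales
FILEGROUP = 'PRIMARY'
TO DISK = 'E:\Backup\Sales_Primary.bak';
```

The archive filegroup can then be backed up on its own, less frequent schedule, since its contents change far more slowly than the operational tables.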
This strategy may not be suitable for every application, but if you built the application you should at least consider it. Your DBA will love you: backup times remain low, copying databases remains swift and housekeeping remains straightforward.