@BrentO (http://bit.ly/azHNly) posted a link earlier about the “Top 13 SQL Server Mistakes” which made me think that in my role as a solution architect I often see the same fundamental infrastructure issues causing clients the same problems.
While not directly related to SQL Server, I suspect people in SQL Server environments will often come across these issues. If not, they can at least understand some of the challenges at the other end of their TDS connection.
Some of these issues happen because “they never thought the application would get that big/important”, “because that’s how the application was originally written 10 years ago” or worse “we never thought that could be a problem so we didn’t bother with that”. The key takeaway is that they continue to happen but could be so easy to plan against if only people remembered.
Firewall throughput limitations
Secure application infrastructures will almost always use a firewall to control network traffic between the different tiers, e.g. web servers to database server. Fortunately most firewalls are either so fast or have so little workload that we never consider their performance limitations. In most busy environments the first limit you’re likely to hit is the one on concurrent connections or connections per second. However, modern firewalls often have UTM services such as IPS or deep inspection, and enabling these can have a huge impact on the device’s throughput. For example, enabling IPS on a Juniper SRX reduces its throughput to a sixth of its native capability. If your environment starts off with one of the smaller models in the range, remember there’s a reason why they also sell larger versions.
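As a rough illustration of the arithmetic, here is a minimal Python sketch. The rated throughput, the peak traffic figures and the 70% safety margin are all made-up assumptions for the example; only the "one sixth" slowdown comes from the paragraph above, and real figures should come from your vendor’s datasheet and your own monitoring.

```python
def effective_throughput_mbps(rated_mbps: float, slowdown: float) -> float:
    """Throughput left once inspection services divide the rated figure by `slowdown`."""
    if slowdown < 1:
        raise ValueError("slowdown must be >= 1")
    return rated_mbps / slowdown


def has_headroom(peak_mbps: float, rated_mbps: float, slowdown: float,
                 safety_margin: float = 0.7) -> bool:
    """True if peak traffic fits within `safety_margin` of the effective throughput."""
    return peak_mbps <= effective_throughput_mbps(rated_mbps, slowdown) * safety_margin


# Hypothetical figures: a firewall rated at 1200 Mbps, IPS cutting it to a sixth.
RATED_MBPS = 1200.0
IPS_SLOWDOWN = 6.0

print(effective_throughput_mbps(RATED_MBPS, IPS_SLOWDOWN))  # 200.0
print(has_headroom(100.0, RATED_MBPS, IPS_SLOWDOWN))         # True
print(has_headroom(150.0, RATED_MBPS, IPS_SLOWDOWN))         # False
```

The point is simply that a box which looks six times oversized on paper can already be short of headroom once inspection is switched on.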
Non-Load Balancer friendly applications
At some point in the life of your business critical application you’re likely to need load balancing to scale out web or application servers, either to increase performance or to protect against planned and unplanned downtime. If you need to deploy additional load balanced servers to give you more performance, it could well be in a hurry. That is not the time to find out your application is not load balancer friendly. Moving to a multi-server environment can often break session persistence and cookie handling for applications that always thought there’d only ever be one of them. Test and plan for load balanced applications even if you don’t intend to deploy them for a long time. Your Christmas peak period is not the time to re-write your session handling!
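To show why session handling breaks, here is a small Python simulation. It is a sketch only: `AppServer` and `round_robin_requests` are hypothetical names invented for this example, and a plain dictionary stands in for whatever external session store (database, cache, state server) a real application would use.

```python
import itertools


class AppServer:
    """A toy web server that keeps sessions in whatever store it is given."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def whoami(self, session_id):
        # Returns None when this server has never heard of the session.
        return self.sessions.get(session_id)


def round_robin_requests(servers, session_id, user, count):
    """Log in on the first server, then send `count` requests round-robin."""
    pool = itertools.cycle(servers)
    next(pool).login(session_id, user)
    return [next(pool).whoami(session_id) for _ in range(count)]


# Each server holds sessions in its own memory: half the requests lose the user.
private = [AppServer("web1", {}), AppServer("web2", {})]
private_results = round_robin_requests(private, "abc", "alice", 4)
print(private_results)  # [None, 'alice', None, 'alice']

# Both servers share one store: every request sees the session.
shared_store = {}
shared = [AppServer("web1", shared_store), AppServer("web2", shared_store)]
shared_results = round_robin_requests(shared, "abc", "alice", 4)
print(shared_results)  # ['alice', 'alice', 'alice', 'alice']
```

Sticky sessions on the load balancer are another way round the first case, but they only help while the server holding the session stays up.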
Dev and Test not being technically representative of Production
Having dev and test environments is crucial in order to test code updates and patches for your critical applications. However, somewhere in your environments you should also have replicas of the infrastructure your platform will use. It’s likely Prod will have other services which could affect your application’s functionality: load balancing, clusters, web application firewalls, intrusion prevention, SAN storage instead of local disk etc. These components might be expensive to replicate, but the potential cost of Prod downtime due to an untested component conflict could be far more.
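Even a simple inventory comparison makes the gaps visible. The Python sketch below uses the components named above for Prod; the contents of the test environment are an invented example, and `missing_components` is a hypothetical helper, not part of any tool.

```python
def missing_components(prod, test):
    """Return the components present in prod but absent from test, sorted by name."""
    return sorted(set(prod) - set(test))


PROD = {
    "load balancer",
    "cluster",
    "web application firewall",
    "intrusion prevention",
    "SAN storage",
}
# Assumed for illustration: a test environment with only some of the pieces.
TEST = {"load balancer", "SAN storage"}

untested = missing_components(PROD, TEST)
print(untested)  # ['cluster', 'intrusion prevention', 'web application firewall']
```

Anything on that list is something your application will meet for the first time in production.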
Insufficient disk space for a restore of the Prod database
There are often times when a restore of the production database needs to be done in a hurry to find a missing database object or compare tables from a point in time. While it’s best to do this outside of the production environment, sometimes needs must. Having sufficient disk space available for an emergency restore can be invaluable at 1am when everything else has been tried. Some may say that their Prod databases are too large to have spare space for a restore, but I suspect for most of us that’s not the case.
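Checking this ahead of time is cheap. The Python sketch below uses the standard library’s `shutil.disk_usage`; the function names, the 20% headroom and the sizes are assumptions for illustration. Note that the size to check against is that of the database’s data and log files, not the (possibly compressed) backup file, because a restore recreates the files at their original size.

```python
import shutil


def restore_fits(free_bytes, database_bytes, headroom=1.2):
    """True if the free space covers the database files plus some headroom."""
    return free_bytes >= database_bytes * headroom


def can_restore(path, database_bytes, headroom=1.2):
    """Check the volume that holds `path` for enough free space to restore into."""
    return restore_fits(shutil.disk_usage(path).free, database_bytes, headroom)


GIB = 1024 ** 3

# Hypothetical sizes: a 500 GiB database against 700 GiB and 550 GiB of free space.
print(restore_fits(700 * GIB, 500 * GIB))  # True
print(restore_fits(550 * GIB, 500 * GIB))  # False

# Check the current volume against the same hypothetical 500 GiB database.
print(can_restore(".", 500 * GIB))
```

Run as a scheduled check, something like this can warn you long before the 1am emergency.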
Hitting the virtualisation ceiling sooner than expected
Virtualising servers can be great: more flexible resource management across your estate, along with built-in HA features, can be a real benefit. However, virtualisation technologies can have limits. This could be the amount of CPU power assigned to the VM, the amount of SAN IOPS available to the virtual platform or the network configuration being incompatible with your hardware load balancers etc. In situations like this your option might be to revert to traditional physical servers. These might give you the options you need, but do you have the rack space for them? The build images for them? Again, performance issues often happen at peak periods with little time to think of a plan. Plan for panic, test to avoid it!
Part two will cover:
- Synchronous dependencies on remote applications
- No central authentication services
- Hard-coded IP addresses
- Security policies created from an allow-all rather than a deny-all
- No load testing