This is something I’ve had a lot of first-hand experience with, and I’ve also spent quite a bit of time looking into it. The answer to this question is the basis of this write-up, and if you don’t want to read much further, here’s the short version: it really depends on your environment. Each option has its pros and cons, and we’ll hit each of them.
First off, if you are considering upgrading your main storage array, chances are you are also looking at an entirely new storage networking infrastructure. That’s because things are evolving fast in the storage area network world. Seemingly affordable 10Gb Ethernet is making a dent, 16Gb Fibre Channel has hit the market (though it isn’t exactly affordable for a business our size), and storage arrays supporting either one are an option.
My first major suggestion: do some heavy monitoring of your current environment. See where your peaks are and where your low times are. See how much storage bandwidth you are currently using. Watch your disk queues and see if there are reads or writes just sitting in the pipe waiting to get served up.
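Once you have collected monitoring data, you still have to boil it down to peaks, quiet periods, and queue buildup. The sketch below assumes you have exported samples as hypothetical `(hour_of_day, throughput_mb_s, disk_queue_depth)` tuples from whatever tool you use; the sample values and the queue-depth threshold of 2 are made up for illustration, not recommendations.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical samples exported from a monitoring tool:
# (hour_of_day, throughput in MB/s, average disk queue depth)
samples = [
    (2, 12.0, 0.4), (2, 15.0, 0.5),
    (9, 85.0, 3.1), (9, 92.0, 4.0),
    (14, 60.0, 1.8), (14, 55.0, 1.5),
    (22, 110.0, 6.2), (22, 105.0, 5.7),
]

QUEUE_THRESHOLD = 2.0  # assumption: tune against your own baseline


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def hourly_profile(rows):
    """Average throughput and queue depth for each hour that has samples."""
    by_hour = defaultdict(list)
    for hour, mbps, queue in rows:
        by_hour[hour].append((mbps, queue))
    return {
        hour: (mean(m for m, _ in vals), mean(q for _, q in vals))
        for hour, vals in by_hour.items()
    }


profile = hourly_profile(samples)
peak_hour = max(profile, key=lambda h: profile[h][0])
quiet_hour = min(profile, key=lambda h: profile[h][0])
p95_mbps = percentile([mbps for _, mbps, _ in samples], 95)
# Hours where IO is sitting in the queue waiting to be served.
queued_hours = sorted(h for h, (_, q) in profile.items() if q > QUEUE_THRESHOLD)

print(peak_hour, quiet_hour, p95_mbps, queued_hours)
```

With this made-up data, the busiest hour is also the one with the deepest queues, which is exactly the kind of pattern worth looking for before you talk to a vendor.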
Let’s talk about bottlenecks for a second. Bottlenecks can happen in three main places: the server, the switch, or the array, and most of them happen at either the network or the storage array. Sometimes they are caused by network misconfiguration, and sometimes by the disks in the array not being able to keep up with how fast you are requesting reads and writes. Monitoring can also make a bottleneck look like it is somewhere it isn’t.
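One rough way to triage those three places is to compare what the server, the switch port, and the array each report at the same moment. The sketch below is a hypothetical heuristic, not a vendor method: the `Snapshot` fields and all three thresholds are assumptions you would replace with your own baseline. It checks array-reported latency before link utilization because a slow array can make the storage network look busy.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    host_cpu_pct: float       # server CPU utilization, percent
    link_utilization: float   # storage-facing switch port, 0.0 to 1.0
    array_latency_ms: float   # latency the array reports for its own disks
    host_latency_ms: float    # end-to-end latency the server sees


# Hypothetical thresholds; tune them against your own environment.
CPU_BUSY_PCT = 90.0
LINK_BUSY = 0.80
SLOW_MS = 20.0


def likely_bottleneck(s: Snapshot) -> str:
    """Very rough guess at where a storage bottleneck sits."""
    if s.host_cpu_pct >= CPU_BUSY_PCT:
        return "server"
    if s.array_latency_ms >= SLOW_MS:
        # The array itself is slow: disks can't keep up, whatever the link shows.
        return "array"
    if s.link_utilization >= LINK_BUSY:
        return "network"
    if s.host_latency_ms >= SLOW_MS:
        # Host sees slow IO, array reports fast IO, link isn't saturated:
        # suspect the path in between (misconfiguration, errors, queue settings).
        return "network or host configuration"
    return "no obvious bottleneck"


print(likely_bottleneck(Snapshot(30.0, 0.55, 35.0, 40.0)))  # array
```

Real troubleshooting needs more signals than this, but the ordering captures the point above: look at every layer before blaming one.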
I’ll give you a quick inside tip: getting relatively high IOPS does NOT depend on the raw speed of your storage area network. Of course, you do need bandwidth for sustained transfer speeds (if you are doing large reads and writes), but if your traffic is bursty and needs relatively high IO in short bursts (SQL Server comes to mind), you don’t need a lot of bandwidth. What you need is low response time (latency) and fast IO. Now, that being said, how do you get high IO over… let’s say 1GbE? It’s all in how the array handles your IO.
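The arithmetic behind this is simple: bandwidth is IOPS times IO size, and the IOPS you can reach at a given concurrency is roughly outstanding IOs divided by response time (Little’s law). The workloads below are hypothetical, and the 1GbE figure is nominal line rate with no iSCSI/TCP overhead subtracted, so treat this as a back-of-the-envelope sketch.

```python
def required_mbit_per_s(iops: int, io_size_bytes: int) -> float:
    """Wire bandwidth an IO workload needs, ignoring protocol overhead."""
    return iops * io_size_bytes * 8 / 1_000_000


def max_iops(outstanding_ios: int, latency_ms: float) -> float:
    """Little's law: throughput = concurrency / response time."""
    return outstanding_ios * 1000 / latency_ms


LINK_1GBE_MBIT = 1000.0  # nominal line rate, before protocol overhead

# Hypothetical bursty OLTP workload: 10,000 IOPS of 8 KiB pages.
burst = required_mbit_per_s(10_000, 8 * 1024)
# Hypothetical backup-style workload: 200 IOPS of 1 MiB blocks.
stream = required_mbit_per_s(200, 1024 * 1024)

print(f"burst:  {burst:.2f} Mb/s, fits 1GbE: {burst < LINK_1GBE_MBIT}")
print(f"stream: {stream:.2f} Mb/s, fits 1GbE: {stream < LINK_1GBE_MBIT}")

# Same queue depth, different array response times.
print(max_iops(32, 1.0), max_iops(32, 10.0))
```

Small random IO at high IOPS still fits on a single 1GbE link, while large sequential IO at modest IOPS does not. And at a fixed queue depth, cutting response time from 10 ms to 1 ms multiplies achievable IOPS by ten without touching the network.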
Let’s say, for example, that you are disk bound. This means the disks in your array just can’t keep up with the reads and writes, which is a fairly common issue with spinning-disk arrays unless you have enough spindles or a fat read/write cache. Your servers are pushing out writes and requesting reads faster than the disks can service them. In that case your storage bandwidth is not the issue; it’s the array itself that is having trouble filling that bandwidth. Monitoring may still interpret this as bandwidth lag, though, because you’ll see your storage network being hit fairly hard while it waits on those reads and writes.
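To check whether you are likely to be disk bound, you can estimate how many spindles a workload needs. This sketch uses common rules of thumb rather than vendor figures: one disk’s random IOPS is about 1 / (average seek + half a rotation), and each front-end write costs about 2 back-end IOs on RAID 10 and 4 on RAID 5. The 3.5 ms seek time and the workload numbers are assumptions, and cache is deliberately ignored.

```python
import math


def disk_iops(avg_seek_ms: float, rpm: int) -> float:
    """Rough random IOPS for one spinning disk: 1 / (seek + half a rotation)."""
    rotational_latency_ms = 0.5 * 60_000 / rpm
    return 1000 / (avg_seek_ms + rotational_latency_ms)


def backend_iops(read_iops: float, write_iops: float, write_penalty: int) -> float:
    """IOs the disks actually see once the RAID write penalty is applied."""
    return read_iops + write_iops * write_penalty


def spindles_needed(read_iops, write_iops, write_penalty, per_disk_iops) -> int:
    """Disks required to serve the workload without relying on cache."""
    return math.ceil(backend_iops(read_iops, write_iops, write_penalty) / per_disk_iops)


# Assumed 15k RPM disk with a 3.5 ms average seek: roughly 180 IOPS each.
per_disk = disk_iops(3.5, 15_000)

# Hypothetical workload: 2,500 read IOPS and 1,000 write IOPS.
raid10 = spindles_needed(2_500, 1_000, 2, per_disk)  # write penalty 2
raid5 = spindles_needed(2_500, 1_000, 4, per_disk)   # write penalty 4

print(round(per_disk, 1), raid10, raid5)
```

If your array has fewer spindles than this kind of estimate suggests (and not enough cache to absorb the difference), a faster network won’t help, because the disks are the limit.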
Monitoring is essential, but it’s also VERY important to know how to interpret it. You need to monitor multiple places (your network, your servers, and your storage) and be able to make sense of all that data together. Most Ethernet and Fibre Channel switches support SNMP, which can be used to monitor individual ports. For ESXi there are many monitoring tools available, from VMTurbo to Operations Manager. Using SNMP and a graphing program like LogicMonitor or Observium, you can really drill down to the port level and see which servers are using a lot of bandwidth and/or storage.
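Under the hood, SNMP port graphs are usually built from interface octet counters such as `ifHCInOctets` (a 64-bit counter in IF-MIB) and the port speed from `ifHighSpeed` (in Mb/s). The sketch below assumes you have already polled two counter readings some seconds apart with your tool of choice; it only shows the utilization math, including a single counter wrap, and the readings are hypothetical.

```python
COUNTER64_MODULUS = 2 ** 64  # ifHCInOctets / ifHCOutOctets wrap at 2^64


def octet_delta(previous: int, current: int, modulus: int = COUNTER64_MODULUS) -> int:
    """Octets transferred between two counter readings, allowing for one wrap."""
    if current >= previous:
        return current - previous
    return current + modulus - previous


def port_utilization(prev_octets: int, curr_octets: int,
                     interval_s: int, if_high_speed_mbps: int) -> float:
    """Fraction of port capacity used over the polling interval."""
    bits = octet_delta(prev_octets, curr_octets) * 8
    capacity_bits = interval_s * if_high_speed_mbps * 1_000_000
    return bits / capacity_bits


# Hypothetical readings 300 seconds apart on a 1 Gb/s port.
util = port_utilization(1_000_000_000, 19_750_000_000, 300, 1000)
print(f"{util:.0%}")
```

Keep in mind that a 5-minute average can hide short bursts, which matters for the bursty workloads described earlier; a shorter polling interval gives a truer picture.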
When you get a nice array on the back end, you’ll be shocked at how little bandwidth it actually takes to get pretty high IO. But when you are selecting a storage area network, monitoring is your best friend. You need to know your current environment rather than just listening to the salesperson. The salesperson is trying to sell you something; you are trying to make the best purchase for your environment. The more you know about that environment, the less likely you are to spend a ton of money on things you will underutilize.
There does need to be a happy medium, though, between growth potential and your current bandwidth. If you are planning on growing to more servers and more IO, size the new storage network for that projected load, not just for what you use today.
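A simple way to find that happy medium is to project your measured peak forward and see how long a candidate design lasts before it runs out of headroom. This sketch assumes compound annual growth and a 70% headroom target; both, along with the example numbers, are hypothetical and should come from your own monitoring and business plans.

```python
import math


def projected_peak(current_peak: float, annual_growth: float, years: int) -> float:
    """Peak load after a number of years of compound growth."""
    return current_peak * (1 + annual_growth) ** years


def years_of_headroom(current_peak: float, annual_growth: float,
                      capacity: float, headroom: float = 0.7) -> float:
    """Whole years the projected peak stays below headroom * capacity."""
    limit = capacity * headroom
    if current_peak >= limit:
        return 0
    if annual_growth <= 0:
        return math.inf  # load never grows into the limit
    years = 0
    while projected_peak(current_peak, annual_growth, years + 1) < limit:
        years += 1
    return years


# Hypothetical: 6,000 IOPS peak today, 25% growth per year,
# candidate array rated for 20,000 IOPS.
print(years_of_headroom(6_000, 0.25, 20_000))
```

With these made-up numbers the design stays under 70% for three more years and crosses it during the fourth, which is the kind of answer you can hold up against a vendor’s sizing.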