8Gb FC, 16Gb FC, 10GbE iSCSI or 1GbE iSCSI. Which is right for your storage area network?

This is something that I’ve had a lot of first-hand experience with, and it’s something that I’ve taken quite a bit of time to look into as well. The answer to this question is the basis of this write-up, and if you don’t really want to read much further, I’m just going to say this: it really depends on your environment. There are pros and cons to each of these, and we’ll hit each of them.

First off, if you are considering upgrading your main storage array, chances are that you are also going to be looking at an entirely new storage networking infrastructure. The reason is that things are evolving pretty fast in the storage area network world. Seemingly affordable 10Gb Ethernet is making a dent, 16Gb Fibre Channel has hit the market (though it is not strictly affordable for a business our size), and storage arrays featuring each of these are a possibility.

My first major suggestion: do some heavy monitoring of your current environment. See where your peaks are and where your low times are. See how much storage bandwidth you are currently using. Watch your disk queues and see if there are reads or writes just sitting in the pipe waiting to get served up.

Let’s talk about bottlenecks for a second. Bottlenecks can happen in three main places: the server, the switch, or the array. Most of them happen at either the network or the storage array. Sometimes they are caused by network misconfiguration, and sometimes by the actual disks in the storage array not being able to keep up with how fast you are requesting reads and writes. Monitoring can also misinterpret these bottlenecks.

Storage

I’ll give you a quick inside tip: getting relatively high IOPS does NOT depend on the speed of your storage area network. Of course, you do need bandwidth for sustained transfer speeds (if you are doing large reads and writes), but if your traffic is bursty and requires relatively high IO in short bursts (SQL Server comes to mind), you don’t need a lot of bandwidth. What you need is fast response time and fast IO. Now, that being said, how do you get high IO over… let’s say 1GbE? It’s all in how the array handles your IO.
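To put rough numbers on that, here is a minimal sketch of the arithmetic: throughput is just IOPS multiplied by the I/O size. The function name and the sample workload (5,000 IOPS at 8 KB each) are made up for illustration and are not measurements from any particular array.

```shell
# Rough bandwidth needed for a given IOPS rate and I/O size.
# Usage: iops_to_mbit <iops> <io_size_in_KB>
# Prints throughput in megabits per second (1 KB = 1024 bytes,
# 1 Mbit = 1,000,000 bits). Protocol overhead is ignored.
iops_to_mbit() {
    awk -v iops="$1" -v kb="$2" \
        'BEGIN { printf "%.1f\n", iops * kb * 1024 * 8 / 1000000 }'
}

# Hypothetical workload: 5,000 IOPS of 8 KB each.
iops_to_mbit 5000 8
```

That sample workload comes out to 327.7 Mbit/s, which is roughly a third of a single 1GbE link before overhead.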

Let’s say, for example, that you are disk bound. That means the disks in your array just can’t keep up with the reads and writes, which is a fairly common issue with spinning-drive arrays unless you have enough spindles or a fat read / write cache. Your servers are pushing out writes and trying to get reads, and the disks can’t keep up with how fast you’re trying to push / pull the data. In that case your storage bandwidth is not the issue; it’s the storage array itself that is having trouble filling that bandwidth. Monitoring may still interpret this as bandwidth lag, because the network side looks like it is being hit kind of hard while it is really just waiting on reads and writes.

Monitoring is essential, but it’s also VERY important to know how to interpret it. You need to monitor multiple places (your network, your servers, and your storage) and be able to interpret all of that data. Most Ethernet and Fibre Channel switches support SNMP, which can be used to monitor specific ports. ESXi has many monitoring options you can use, from things like VMTurbo to Operations Manager. Using SNMP and a graphing program like LogicMonitor or Observium, you can really drill down to the port level and see which servers are using a lot of bandwidth and / or storage.
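As a sketch of what those graphing tools are working out for you, port utilization can be calculated from two samples of an interface’s octet counter. This assumes you have already collected the two readings yourself (for example from the standard IF-MIB ifHCInOctets counter) and that the counter did not wrap between samples. The function name and the sample values are hypothetical.

```shell
# Percent utilization of a switch port from two octet-counter samples.
# Usage: port_utilization_pct <octets_first> <octets_second> <seconds> <link_mbit>
# Assumes the counter did not wrap or reset between the two samples.
port_utilization_pct() {
    awk -v a="$1" -v b="$2" -v secs="$3" -v link="$4" 'BEGIN {
        bits_per_sec = (b - a) * 8 / secs    # octets to bits per second
        printf "%.1f\n", bits_per_sec / (link * 1000000) * 100
    }'
}

# Hypothetical samples taken 60 seconds apart on a 1GbE port.
port_utilization_pct 0 3750000000 60 1000
```

With those made-up samples the port averaged 50.0% over the minute. Graphing the same figure over days is what shows you the peaks and the low times.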

When you get a nice array on the back end, you’ll be shocked at how little bandwidth it actually takes to get pretty high IO. Still, monitoring is your best friend when selecting a storage area network. You need to know your current environment and not just listen to the salesperson. The salesperson is trying to sell you something, and you are trying to make the best purchase for your environment. You need to know as much about your environment as possible so that you don’t spend a ton of money on things that you will underutilize.

There does need to be a happy medium, though, between growth potential and your current bandwidth. If you are planning on growing to more servers and more IO, you need to plan accordingly.
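One way to sanity-check that happy medium is to project today’s peak forward at an assumed growth rate and compare it to the link speed you are considering. The 300 Mbit/s starting peak and the 20% yearly growth below are placeholders, not recommendations; plug in what your own monitoring shows.

```shell
# Project peak bandwidth forward with compound yearly growth.
# Usage: project_peak_mbit <current_peak_mbit> <growth_percent_per_year> <years>
project_peak_mbit() {
    awk -v cur="$1" -v pct="$2" -v yrs="$3" \
        'BEGIN { printf "%.1f\n", cur * (1 + pct / 100) ^ yrs }'
}

# Hypothetical: 300 Mbit/s peak today, growing 20% a year, three years out.
project_peak_mbit 300 20 3
```

In this made-up case the peak lands at 518.4 Mbit/s after three years, which still fits inside a single 1GbE link but leaves a lot less headroom than you have today.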

Opsview – Installing Opsview Server on CentOS

If you are looking for a decent server hardware and software monitor – I have to recommend Opsview. The reason I suggest this is because of a few huge factors – and I’ll name those here. One is that they have a “free” community edition (and yes, it is free, but you get zero support from them and you have to host it yourself… oh and there is an annoying ad on top of the whole web based dashboard). If you have a halfway decent network and systems infrastructure, this is no problem. I’ll work through how to set it up here in a few.

In any event, it can monitor a huge set of information about each server, and it supports Linux (though mostly the RPM-based and APT-based distros), Windows, and OS X. You can choose what you want to monitor (different hard drives, partition sizes and space, RAM, Windows services, Linux services, system performance, SQL Server, MySQL, CPU load, etc…). The best part is the mobile app, which allows you to check the status of your systems while out and about. That doesn’t exactly make it easier to fix any issues that arise, but at least you’ll be able to see them.

Notifications are also a handy part of this application – for our little enterprise, we get email notifications when our set services are failing on each specific server. It lets you schedule downtime as well so if you want to stop being spammed while you are rebooting a server, you can just schedule some downtime for that particular unit.

So anyway, let’s get to installing Opsview Core Server. After the server is installed I’ll show you how to install the clients / agents. If you are lazy and think setting it up manually is too much trouble, you can just get the VMware Virtual Appliance (Ubuntu 10.04 Server, x86). I will be running you through how to set it up on a stock, newly created CentOS VM (this will work pretty close to the same on RHEL, and it will be exactly the same on a physical machine). I don’t think there are any listed “system requirements” on their site, but here are the Dorkfolio.net system requirements:

CentOS 5.8 or 6.3

At least 2GB of RAM (I’ve tried this with 1 gig before and it didn’t end well. This box idles at about 1.1GB, so you should be good with 2).

At least 2 processors 

And at least 20GB of hard drive space.
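If you want to check a box against the RAM and CPU minimums before installing, something like the following sketch would do it. The function names are made up for this example, and the RAM threshold is set a little under 2048 MB because the kernel reports slightly less than the installed amount. Disk space is left out, since how you partition changes which number you should be looking at.

```shell
# Compare one measured value against a minimum and report the result.
# Usage: check_min <label> <have> <need>
check_min() {
    if [ "$2" -ge "$3" ]; then
        echo "OK   $1: $2 (need $3)"
    else
        echo "FAIL $1: $2 (need $3)"
        return 1
    fi
}

# Gather total RAM (MB) and online CPU count, then check each one.
preflight() {
    mem_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
    cpus=$(getconf _NPROCESSORS_ONLN)
    status=0
    check_min "RAM (MB)" "$mem_mb" 1800 || status=1
    check_min "CPUs" "$cpus" 2 || status=1
    return $status
}
```

Running preflight prints one line per check and returns non-zero if either minimum is missed.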

A few other notes on this: it can be either a dedicated server or a shared server. The management GUI is on port 3000, so it doesn’t interfere with web or MySQL ports. If you want this server to have some other function as well, by all means go ahead; just note that it uses about 1.1GB of RAM while idling, and the GUI is a bit of a CPU hog when you are using it.

Alright, let’s get started, shall we? I am going to assume here that you meet the specifications above and that you have a fully patched CentOS 5.8 or 6.3 box that you have root access to.

First we need to become root:

$ su

Enter the root password and now you are root. Next we need to add the Opsview repository.

# cd /etc/yum.repos.d
# nano opsview.repo

In the opsview.repo file, paste this:

[opsview]
name = Opsview
baseurl = http://downloads.opsview.com/opsview-core/latest/yum/centos/$releasever/$basearch
enabled = 1
protect = 0
gpgcheck = 0
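If you would rather not paste into an editor, the same file can be written non-interactively. This is a minimal sketch using a here-document; the function name and the optional target-directory argument are my own additions so it can be tried in a scratch directory first.

```shell
# Write the Opsview yum repo definition into a directory.
# Usage: write_opsview_repo [target_dir]   (defaults to /etc/yum.repos.d)
write_opsview_repo() {
    dir="${1:-/etc/yum.repos.d}"
    # The quoted 'EOF' keeps the shell from expanding $releasever and
    # $basearch; yum substitutes those itself when it reads the file.
    cat > "$dir/opsview.repo" <<'EOF'
[opsview]
name = Opsview
baseurl = http://downloads.opsview.com/opsview-core/latest/yum/centos/$releasever/$basearch
enabled = 1
protect = 0
gpgcheck = 0
EOF
}
```

Run write_opsview_repo as root with no arguments to drop the file into /etc/yum.repos.d.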

We now have the Opsview repo installed. Now we can run:

# yum list updates

It should come back empty, but yum will now have pulled the metadata for the new repo. Now we can go about actually installing the Opsview server.

# yum install opsview

It will find all the dependencies for you, which you’ll note include MySQL Server. With Opsview Core the database has to be hosted locally; you can’t point it at a MySQL server on another box. Opsview Pro lets you use a different MySQL server, but remember that this comes at a bit of an expense: if your MySQL server goes down, Opsview also goes down.

It will take a little bit to get up and running, depending on your hard drives / SAN / NAS / wherever you are installing it, how much CPU power it has, etc…

Once it is all installed, we need to set up MySQL. I prefer the “Secure Setup” way, so I’d run this:

# /usr/bin/mysql_secure_installation

It will prompt you for a MySQL root password – make sure you remember this – we’ll need it. After it’s finished, we are just about all good with MySQL.

Next, Nagios (the foundation that Opsview is built on) tries to create a suitable environment for the server, including a new user called “nagios”. We need to verify that it is set up correctly by running this:

# su - nagios
$ echo "test -f /usr/local/nagios/bin/profile && /usr/local/nagios/bin/profile" >> ~/.bash_profile
$ exit

If those top two commands work (which they should), we are all good. Next we need to edit a few config files.

# nano /usr/local/nagios/etc/opsview.conf

We need to change the two passwords that say “changeme” to the password we set as the MySQL root password.

Next we need to run a few Opsview scripts to bind OpsView to the MySQL Server and set up all the tables.

# /usr/local/nagios/bin/db_mysql -u root -p{MySQL root password}
# /usr/local/nagios/bin/db_opsview db_install && /usr/local/nagios/bin/db_runtime db_install

The last two scripts may take a few moments to run, and you may or may not get some warnings about UTF-8. You can ignore those.

The last thing we need to do is regenerate all the config files based on what those scripts ran. We can do that by running this:

# /usr/local/nagios/bin/rc.opsview gen_config

If this fails for some reason or other, it may be because the log is not writable by Nagios. I fixed this by doing this:

# chmod 777 /var/log/opsview/opsviewd.log

Once that is set up, we are basically done and we can now start the opsview-web service by running:

# service opsview-web start

We’ll also want to be sure opsview-web and mysqld start on boot:

# chkconfig --level 345 mysqld on
# chkconfig --level 345 opsview-web on

Now you are all set up. If you open a browser on your OpsView server and punch in http://localhost:3000, you should be presented with a shiny new login screen.

To log in for the first time, the username is “Admin” and the password is “initial”. You’ll obviously want to change these straight away. The rest is all done via the handy GUI.

If you want to be able to access this from different machines, the server will need either an FQDN or a static IP, with the firewall open for port 3000. You can then access it by pointing your browser to http://ip_of_opsview_server:3000, or http://FQDN:3000.
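On a stock CentOS 5 or 6 box, opening the port usually means adding an iptables rule, and a quick curl from another machine will confirm the GUI is answering. The sketch below assumes the default iptables service and that curl is installed. The function name and the example address are hypothetical.

```shell
# As root on the Opsview server: allow inbound TCP 3000 and persist the rule.
#   iptables -I INPUT -p tcp --dport 3000 -j ACCEPT
#   service iptables save

# From another machine: succeed if anything answers HTTP on the given port.
# Usage: opsview_reachable <host> [port] [timeout_seconds]
opsview_reachable() {
    curl -s -o /dev/null --max-time "${3:-5}" "http://$1:${2:-3000}/"
}

# Example (192.0.2.10 is a placeholder address):
#   opsview_reachable 192.0.2.10 && echo "Opsview GUI is up"
```

If the check fails from a remote machine but http://localhost:3000 works on the server itself, the firewall is the first place to look.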